ML Infra at an Early Stage: Feature Services
Nick Handel, Head of Data Science
March 2019
"Machine Learning is 99% Infrastructure"
- Many people
Unfortunately, the infra is really hard...
Where should you start?
It will look something like this
Big tech companies are building incredible infrastructure
Source: Hidden Technical Debt in Machine Learning Systems
Source: Meet Michelangelo: Uber's Machine Learning Platform
Source: Bighead - Airbnb's End-to-End Machine Learning Platform
What about the rest of us?
● Public solutions are lagging
○ Big Cloud providers aren’t providing end-to-end solutions
○ There is no enterprise solution that goes end-to-end
○ There is no widely-adopted open source solution
● The option set for the rest of us:
○ Buy pieces and combine
■ Requires engineering and money
■ Some pieces of infra didn’t have solutions: feature stores
○ Build
■ Requires engineering and may lead to tech debt with scale
Data is at the center of ML Infra

[Diagram: pipeline stages Extract Data → Build Features → Train Models → Serve Models → Monitor Models, annotated with the data needs below]

● Connect to a range of data sources
● Monitor raw and transformed data; monitor for feature drift
● Collect and transform features for testing new model ideas
● Share model outputs as features in other models
● Cache production features for training and validation of point-in-time correctness
● Transform data consistently between inference and training
● Backfill historical features to test new ideas offline (not easy)
● Validate raw and transformed data (types, ranges, etc.)
● Collect features for many subjects (users, devices, markets, etc.)
1. Start basic
2. Build (or buy) a Feature Service
3. Mature the pieces that are important to your business
The Feature Service
Simple definition: a service for computing and managing ML data
In order of importance…
1. Framework
○ Reusable code
○ Consistency
○ Ease of development
2. Computation Engine
○ Service that builds features
○ Backfills new features for old inferences
3. Cache
○ Stores derived features
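For concreteness, here is a minimal Python sketch of those three pieces; the names (Feature, FeatureCache, compute_features) are illustrative assumptions, not the actual Branch framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Feature(ABC):
    """Framework: reusable, consistently defined feature code."""

    name: str
    version: int

    @abstractmethod
    def transform(self, raw: Dict[str, Any]) -> Any:
        """Turn raw extracted data into a feature value."""


class FeatureCache:
    """Cache: stores derived features, keyed by subject and feature version."""

    def __init__(self) -> None:
        self._store: Dict[tuple, Any] = {}

    def put(self, subject_id: str, feature: Feature, value: Any) -> None:
        self._store[(subject_id, feature.name, feature.version)] = value

    def get(self, subject_id: str, feature: Feature) -> Any:
        return self._store.get((subject_id, feature.name, feature.version))


def compute_features(subject_id: str, raw: Dict[str, Any],
                     features: List[Feature], cache: FeatureCache) -> Dict[str, Any]:
    """Computation engine: builds features (and can re-run to backfill old inferences)."""
    out = {}
    for feature in features:
        value = feature.transform(raw)
        cache.put(subject_id, feature, value)
        out[feature.name] = value
    return out
```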
Defining a Feature Service

[Architecture diagram: the Feature Service (a Flask app) sits between a Feature Repository and a DynamoDB cache; features are written on the inference path and read back for development, with a second write/read path added for training]
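A rough sketch of that architecture, assuming a Flask app in front of a DynamoDB table named "features"; the endpoint paths and payload shape are made up for illustration, not the actual Branch service.

```python
import time

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
# Assumed table keyed by (subject_id, feature_set_version).
table = boto3.resource("dynamodb").Table("features")


@app.route("/features", methods=["POST"])
def write_features():
    """Inference path: accept computed features, cache them, and return them."""
    payload = request.get_json()
    item = {
        "subject_id": payload["subject_id"],
        "feature_set_version": payload["feature_set_version"],
        "computed_at": int(time.time()),
        # Already-transformed values; real DynamoDB writes need numbers as Decimal.
        "features": payload["features"],
    }
    table.put_item(Item=item)
    return jsonify(item), 201


@app.route("/features/<subject_id>/<version>", methods=["GET"])
def read_features(subject_id, version):
    """Training / development path: read back what was cached at inference time."""
    resp = table.get_item(Key={"subject_id": subject_id,
                               "feature_set_version": version})
    return jsonify(resp.get("Item", {}))
```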
Life of a Feature

[Diagram: inference, training, model iteration, and feature iteration paths through the Feature Repository and the DynamoDB cache]

● Inference: calculate and cache features in production
● Model iteration: use cached features for model development
● Feature iteration (and for testing new features): calculate features in production, train with the new features, and save them to the cache
● Validate point-in-time correctness by running the training path on previously computed features
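A hedged sketch of that point-in-time correctness check: re-run the training-path transforms on the stored raw data and compare against what the production path cached at inference time. Function and field names here are assumptions.

```python
from typing import Any, Callable, Dict, List


def validate_point_in_time(
    cached_records: List[Dict[str, Any]],                 # rows cached by the inference path
    training_path: Callable[[Dict[str, Any]], Dict[str, Any]],
    tolerance: float = 1e-9,
) -> List[tuple]:
    """Return the (subject, feature, cached, recomputed) tuples that disagree."""
    mismatches = []
    for record in cached_records:
        recomputed = training_path(record["raw"])          # same transforms, run offline
        for name, cached_value in record["features"].items():
            new_value = recomputed.get(name)
            if isinstance(cached_value, float) and isinstance(new_value, float):
                ok = abs(cached_value - new_value) <= tolerance
            else:
                ok = cached_value == new_value
            if not ok:
                mismatches.append((record["subject_id"], name,
                                   cached_value, new_value))
    return mismatches
```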
Feature Definition

[Diagram: anatomy of a feature definition]

● Features are built on versioned extracts and transforms
● Multiple features from a single extract
● Chain of transformations
● Flexible methods for merge, join, and concat
● Custom one-off transforms, as flexible as Python
● Everything is built on ABCs with automated testing
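One way this might look in Python, with everything defined on ABCs: a feature is one versioned extract plus an ordered chain of transforms. The class names (Extract, Transform, FeatureDefinition) are assumptions for illustration, not the real framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class Extract(ABC):
    version: int

    @abstractmethod
    def run(self, subject_id: str) -> Iterable[Any]:
        """Pull raw records for one subject from a data source."""


class Transform(ABC):
    version: int

    @abstractmethod
    def apply(self, records: Any) -> Any:
        """One step in the chain: a filter, map, or reduce."""


class FeatureDefinition:
    """A feature = one versioned extract + an ordered chain of transforms."""

    def __init__(self, name: str, extract: Extract, transforms: List[Transform]):
        self.name = name
        self.extract = extract
        self.transforms = transforms
        # The feature version is derived from its parts, so changing any
        # extract or transform yields a new, distinct feature version.
        self.version = (extract.version, tuple(t.version for t in transforms))

    def compute(self, subject_id: str) -> Any:
        value = self.extract.run(subject_id)
        for transform in self.transforms:
            value = transform.apply(value)
        return value
```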
Defining Features

Python
● Approachable and fast enough for our inference needs (<10s)
● Keeps it simple

Versions
● Easy to manage at our stage
● Consistent transforms
● Different versions for different models

Transforms
● Reusable!
● Organized: Filter, Map, Reduce

Testing
● Code works
● Production models don't break
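As an illustration of the Filter, Map, Reduce organization and the kind of automated test that keeps production models from breaking, here is a toy transform chain with a pinned-output test; the concrete transforms are invented for the example.

```python
def filter_recent(events, days=30):
    """Filter: keep only events from the last `days` days."""
    return [e for e in events if e["age_days"] <= days]


def map_amounts(events):
    """Map: project each event to its amount."""
    return [e["amount"] for e in events]


def reduce_total(amounts):
    """Reduce: aggregate to a single feature value."""
    return sum(amounts)


def total_recent_amount(events):
    """Chain: filter, then map, then reduce."""
    return reduce_total(map_amounts(filter_recent(events)))


def test_total_recent_amount_is_stable():
    # Pin the output on a fixed input so a behavior change in any transform
    # (which would silently shift features under a production model) fails CI.
    events = [
        {"age_days": 3, "amount": 10.0},
        {"age_days": 45, "amount": 99.0},  # too old, filtered out
        {"age_days": 12, "amount": 2.5},
    ]
    assert total_recent_amount(events) == 12.5
```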
Where we are today

[Diagram: the same pipeline stages - Extract Data → Build Features → Train Models → Serve Models → Monitor Models - annotated with what is in place today]

● Validate input and output data of features
● Store transformed features at the point of inference for records
● Track metrics on features and monitor for drift
● Common feature transformation code
● Features accessible by SQL
● Backfill historical features at specific points in time (100%!!)
● Enable training on much larger datasets with previously computed features
● Share model outputs as features in other models (learned features)
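A sketch of what backfilling historical features at specific points in time might look like: for each past inference, recompute the feature definition over only the raw data that existed at that moment, so offline tests of new features match what production would have seen. All names here are illustrative.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, List


def backfill(
    inferences: Iterable[Dict[str, Any]],              # e.g. {"subject_id", "made_at"}
    load_raw_as_of: Callable[[str, datetime], Any],    # raw data visible at that time
    feature_fn: Callable[[Any], Dict[str, Any]],       # the (new) feature definition
) -> List[Dict[str, Any]]:
    rows = []
    for inference in inferences:
        as_of = inference["made_at"]
        # Only use raw data with timestamps <= as_of (point-in-time correctness).
        raw = load_raw_as_of(inference["subject_id"], as_of)
        rows.append({
            "subject_id": inference["subject_id"],
            "as_of": as_of,
            "features": feature_fn(raw),
        })
    return rows
```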
Prediction:
Feature stores will be the centerpiece of everyone's ML infra in 3 years
The Team
Dave Bernthal
Dennis Van Der Staay
Spencer Barton
Ting Ting Liu
Thank You!
Nick Handel
nick@branch.co
@nick_handel
Appendix
Branch’s ML Problem
● Long Feedback Signals
○ Problem: We make loans and get the signal back between 28 days and 1 year later
○ Solution: Make it possible to reconstruct features as they were at the time of inference
● Feature Drift
○ Problem: The way people use their mobile phones in developing markets changes constantly
○ Solution: Store features and adjust for feature drift (see the sketch after this list)
● Many data sources and types
○ Problem: We collect data from a variety of sources and types (raw text, network data, event streams, location, etc.)
○ Solution: Build a system for feature construction that unifies pipelines from different sources and types of transformations
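A simple, assumed drift check (not necessarily Branch's approach): compare each feature's recent distribution against its training-time baseline and flag large shifts in the mean.

```python
import statistics
from typing import Dict, List


def drifted_features(
    baseline: Dict[str, List[float]],   # feature name -> values at training time
    recent: Dict[str, List[float]],     # feature name -> values from recent inferences
    threshold: float = 3.0,             # flag shifts beyond 3 baseline std devs
) -> List[str]:
    flagged = []
    for name, base_values in baseline.items():
        base_mean = statistics.mean(base_values)
        base_std = statistics.pstdev(base_values) or 1e-9  # avoid divide-by-zero
        recent_mean = statistics.mean(recent.get(name, base_values))
        if abs(recent_mean - base_mean) / base_std > threshold:
            flagged.append(name)
    return flagged
```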
What's Next
● Learned Features
○ Model Storage is easy
○ Model Serving isn't trivial
● Monitoring
○ Concept drift is one of our primary ML challenges
● Auto ML
○ Input labels and output a model for production…
○ You already have the features!
