Unified MLOps: Feature Stores & Model Deployment

Monte Zweben, CEO @ Splice Machine
Jack Ploshnick, Data Scientist @ Splice Machine

Agenda
● Goals of production machine learning
● Why are these goals hard to achieve?
● What is a Feature Store?
● Feature Store Landscape
● Database Deployment & Feature Stores

Real-Time Machine Learning Components
● Scale-Out Operational Data Platform
● Feature Store: Re-usability, Governance, Serving
● Model Deployment
● Modeling Experimentation
● Scale-Out Analytical Data Platform

ML Landscape Today

Typical Machine Learning Infrastructure
[Diagram: sources (Data Warehouse, Database, Real-Time Data) connected to consumers (Model 1, Dashboard, Model 2) through bespoke pipelines.]

Pipeline Duplication is Not Enough
● Higher Compute Costs
● Recreating Features
● Lost Signal
● Data Lineage Nightmare

What is a Feature Store?
[Diagram: Real-Time Data and Batch Data feed the Feature Store, which provides Feature Search, Training Sets, Feature Serving, and Governance.]

Machine Learning with a Feature Store
[Diagram: sources (Data Warehouse, Database, Real-Time Data) connected to consumers (Model 1, Dashboard, Model 2) through a single Feature Store.]

Feature Store Requirements
● Scales > 1B records
● Scales > 20K features
● Feature vector retrieval by primary key for inference in < 5-10 ms
● Point-in-time consistency on training data
● Event-driven feature updates
● Batch feature updates
● Track feature lineage
● Discoverability and reuse with feature metadata
● Backfill of new features
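
To make the point-in-time consistency requirement above concrete, here is a minimal sketch in plain Python. It assumes a feature history keyed by primary key and joins each label event to the latest feature value recorded at or before that event, so no future information leaks into the training set. The data, the `feature_history` layout, and the `value_as_of` helper are hypothetical and are not taken from the talk or from any Splice Machine API.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history: for each primary key, a list of
# (as_of_timestamp, feature_value) pairs sorted by timestamp.
feature_history = {
    "c1": [
        (datetime(2021, 1, 1), 10.0),
        (datetime(2021, 2, 1), 12.5),
        (datetime(2021, 3, 1), 20.0),
    ],
}


def value_as_of(key, event_time):
    """Return the latest feature value recorded at or before event_time."""
    rows = feature_history.get(key, [])
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, event_time)
    return rows[idx - 1][1] if idx > 0 else None


# Hypothetical label events: (primary key, event time, label).
label_events = [
    ("c1", datetime(2021, 1, 15), 0),
    ("c1", datetime(2021, 2, 20), 1),
    ("c1", datetime(2020, 12, 1), 0),  # earlier than any recorded feature value
]

# Each training row only sees the feature value known at its event time.
training_set = [
    (key, ts, value_as_of(key, ts), label) for key, ts, label in label_events
]
```

The third event predates every recorded value, so its feature is left as `None` rather than filled with a later value.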

Existing Architectures
[Diagram: Raw Data flows through a Streaming path (KV store) and a Batch path (Analytics Engine) into the Feature Store, which serves the Consumer.]

Alternative Approach: HTAP Database
Feature Serving

Challenges of HTAP Databases
● In Memory
● Custom Hardware
● No support for secondary indexes or triggers
● Not ACID compliant

Splice Machine
● Scale-out
● Any Cloud/On-Prem
● Indexes and Triggers
● Full ACID Compliance

Feature Set Implementation
[Diagram labels: Feature Set Pipeline; INSERT / UPDATE; Initial Backfill]
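
One way to picture a feature set driven by INSERT / UPDATE statements is a table of current values plus a history table maintained by a trigger. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a SQL database with trigger support. The table names, columns, and the trigger-maintained history design are assumptions for illustration, not the schema shown in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Current feature values, one row per primary key (hypothetical schema).
CREATE TABLE customer_features (
    customer_id INTEGER PRIMARY KEY,
    avg_basket  REAL,
    last_update TEXT
);
-- Superseded values, kept so training sets can be rebuilt as of a past time.
CREATE TABLE customer_features_history (
    customer_id INTEGER,
    avg_basket  REAL,
    asof_ts     TEXT,
    until_ts    TEXT
);
-- On every UPDATE, copy the old row into the history table.
CREATE TRIGGER customer_features_on_update
AFTER UPDATE ON customer_features
BEGIN
    INSERT INTO customer_features_history
    VALUES (OLD.customer_id, OLD.avg_basket, OLD.last_update, NEW.last_update);
END;
""")

# Initial backfill: load the first known values.
conn.executemany(
    "INSERT INTO customer_features VALUES (?, ?, ?)",
    [(1, 10.0, "2021-01-01"), (2, 7.5, "2021-01-01")],
)

# A later pipeline run updates one customer's feature value.
conn.execute(
    "UPDATE customer_features SET avg_basket = ?, last_update = ? "
    "WHERE customer_id = ?",
    (12.5, "2021-02-01", 1),
)

current = conn.execute(
    "SELECT avg_basket FROM customer_features WHERE customer_id = 1"
).fetchone()[0]
history = conn.execute(
    "SELECT customer_id, avg_basket, asof_ts, until_ts "
    "FROM customer_features_history"
).fetchall()
```

Here the current table serves low-latency lookups by primary key, while the history table keeps the values needed for point-in-time training sets.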

Scalable & Persistent Storage of Predictions
● Easily track data drift
● Easily track concept drift
● Compare new models to history
● Fully audit-proof history

Database Deployment - Evaluation Store
Prediction made and populated at millisecond speed
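
As a rough illustration of database deployment, the sketch below registers a scoring function with the database and uses a trigger to fill in the prediction whenever a row is inserted into an evaluation table. It uses Python's built-in `sqlite3` as a stand-in. The `predict` function is a fixed linear formula rather than a trained model, and the `evaluation_store` schema is hypothetical. None of this is the Splice Machine implementation.

```python
import sqlite3


def predict(amount, n_items):
    """Hypothetical stand-in model: a fixed linear score, not a trained model."""
    return 0.5 * amount + 2.0 * n_items


conn = sqlite3.connect(":memory:")
# Allow application-defined functions inside triggers on builds where this
# is off by default; older SQLite versions ignore the unknown pragma.
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("predict", 2, predict)

conn.executescript("""
-- Features, model identifier, and prediction stored together (hypothetical).
CREATE TABLE evaluation_store (
    row_id     INTEGER PRIMARY KEY,
    amount     REAL,
    n_items    INTEGER,
    model_id   TEXT,
    prediction REAL
);
-- Score each new row as soon as it is inserted.
CREATE TRIGGER score_on_insert
AFTER INSERT ON evaluation_store
BEGIN
    UPDATE evaluation_store
    SET prediction = predict(NEW.amount, NEW.n_items)
    WHERE row_id = NEW.row_id;
END;
""")

# Inserting the features is all the application has to do.
conn.execute(
    "INSERT INTO evaluation_store (amount, n_items, model_id) VALUES (?, ?, ?)",
    (20.0, 3, "model_v1"),
)
row = conn.execute(
    "SELECT model_id, prediction FROM evaluation_store"
).fetchone()
```

Because the prediction is written next to its inputs and model identifier, the same table can later be queried to monitor drift or audit past decisions.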

HTAP Database: Feature Store + Deployment

● Predictions: Which model made that prediction?
● Models: Which algorithm, parameters, and features were used to train the model?
● Features: How were the features computed?
● Data: What was the raw data at the time of training?
Splice Machine: Database Deployment + Feature Store
Guaranteed Lineage and Governance
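
When predictions, models, and features live in one database, the lineage questions above can be answered with a single join. The sketch below shows the idea with Python's built-in `sqlite3`. Every table, column, and value is hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical metadata tables linking predictions to models and features.
CREATE TABLE models (model_id TEXT PRIMARY KEY, algorithm TEXT, params TEXT);
CREATE TABLE features (feature_name TEXT PRIMARY KEY, definition_sql TEXT);
CREATE TABLE model_features (model_id TEXT, feature_name TEXT);
CREATE TABLE predictions (
    prediction_id INTEGER PRIMARY KEY,
    model_id      TEXT,
    prediction    REAL
);

INSERT INTO models VALUES ('model_v1', 'logistic_regression', '{"C": 1.0}');
INSERT INTO features VALUES ('avg_basket', 'AVG(amount) per customer');
INSERT INTO features VALUES ('n_orders', 'COUNT(*) per customer');
INSERT INTO model_features VALUES
    ('model_v1', 'avg_basket'), ('model_v1', 'n_orders');
INSERT INTO predictions VALUES (1, 'model_v1', 0.83);
""")

# Walk from one prediction back to its model, parameters, and feature definitions.
lineage = conn.execute(
    """
    SELECT p.prediction_id, m.model_id, m.algorithm, m.params,
           f.feature_name, f.definition_sql
    FROM predictions p
    JOIN models m ON m.model_id = p.model_id
    JOIN model_features mf ON mf.model_id = m.model_id
    JOIN features f ON f.feature_name = mf.feature_name
    WHERE p.prediction_id = ?
    ORDER BY f.feature_name
    """,
    (1,),
).fetchall()
```

Each returned row names the model, its algorithm and parameters, and one feature with its definition, which covers the first three questions. The fourth, the raw data at training time, would additionally need a feature history.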
