Unified MLOps: Feature Stores and Model Deployment
Monte Zweben, CEO @ Splice Machine
Jack Ploshnick, Data Scientist @ Splice Machine
Agenda
● Goals of production machine learning
● Why are these goals hard to achieve?
● What is a Feature Store?
● Feature Store Landscape
● Database Deployment & Feature Stores
Real-Time Machine Learning Components
● Scale-Out Operational Data Platform
● Feature Store: Re-usability, Governance, Serving
● Model Deployment
● Modeling Experimentation
● Scale-Out Analytical Data Platform
ML Landscape Today
Typical Machine Learning Infrastructure
● Sources: Data Warehouse, Database, Real-Time Data
● Consumers: Model 1, Dashboard, Model 2
● Connected by bespoke pipelines
Pipeline Duplication Is Not Enough
● Higher compute costs from recreating features
● Lost signal
● Data lineage nightmare
Feature Store
What is a Feature Store?
● Inputs: Real-Time Data, Batch Data
● Capabilities: Feature Search, Training Sets, Feature Serving, Governance
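To make the capabilities on this slide concrete, here is a minimal in-memory sketch of three of them: governance (a feature must be registered with an owner before values are written), feature search over metadata, and serving a feature vector by primary key. The class, method, and feature names are hypothetical and are not Splice Machine's API. A real feature store persists data and keeps history, which this toy omits, so it cannot build point-in-time training sets.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureMeta:
    name: str
    description: str
    owner: str
    tags: set = field(default_factory=set)


class ToyFeatureStore:
    """Hypothetical in-memory feature store; not Splice Machine's API."""

    def __init__(self):
        self._meta = {}    # feature name -> FeatureMeta (governance metadata)
        self._values = {}  # primary key -> {feature name: latest value}

    def register(self, meta):
        # Governance: a feature must be registered, with an owner, before use.
        if meta.name in self._meta:
            raise ValueError(f"feature {meta.name!r} is already registered")
        self._meta[meta.name] = meta

    def search(self, term):
        # Feature search: substring match on name or description, exact match on tags.
        term = term.lower()
        return sorted(
            m.name
            for m in self._meta.values()
            if term in m.name.lower()
            or term in m.description.lower()
            or term in {t.lower() for t in m.tags}
        )

    def write(self, key, feature, value):
        if feature not in self._meta:
            raise KeyError(f"unregistered feature {feature!r}")
        self._values.setdefault(key, {})[feature] = value

    def get_feature_vector(self, key, features):
        # Feature serving: one primary-key lookup returns the whole vector;
        # missing values come back as None.
        row = self._values.get(key, {})
        return [row.get(f) for f in features]


fs = ToyFeatureStore()
fs.register(FeatureMeta("avg_spend_30d", "Average spend over 30 days", "risk-team", {"spend"}))
fs.register(FeatureMeta("txn_count_7d", "Transaction count over 7 days", "risk-team", {"activity"}))
fs.write("cust_42", "avg_spend_30d", 81.5)
fs.write("cust_42", "txn_count_7d", 12)
print(fs.search("spend"))                                                   # ['avg_spend_30d']
print(fs.get_feature_vector("cust_42", ["avg_spend_30d", "txn_count_7d"]))  # [81.5, 12]
```

Keeping metadata and values behind one interface is the point: every model and dashboard asks the same store for `avg_spend_30d` instead of recomputing it in its own pipeline.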
Machine Learning with a Feature Store
● Sources: Data Warehouse, Database, Real-Time Data
● A shared Feature Store sits between the sources and the consumers
● Consumers: Model 1, Dashboard, Model 2
Feature Store Requirements
● Scales > 1B records
● Scales > 20K features
● Feature vector retrieval by primary key for inference in under 5-10 ms
● Point-in-time consistency on training data
● Event-driven feature updates
● Batch feature updates
● Track feature lineage
● Discoverability and reuse with feature metadata
● Backfill of new features
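Point-in-time consistency is the requirement most often gotten wrong, so it is worth spelling out. The sketch below, assuming features and labels arrive as plain (entity, timestamp, value) tuples, performs an as-of join: each label gets the most recent feature value whose timestamp is at or before the label's timestamp, never a later one that would leak future information into training. The function and data are hypothetical illustrations; a database would do the same with a history table and an index on (entity, timestamp).

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(feature_rows, label_rows):
    """Attach to each label the latest feature value known at or before its timestamp.

    feature_rows: iterable of (entity_id, feature_ts, value)
    label_rows:   iterable of (entity_id, label_ts, label)
    Returns a list of (entity_id, label_ts, value_or_None, label).
    """
    # Group feature history per entity and sort it by timestamp.
    history = defaultdict(list)
    for entity, ts, value in feature_rows:
        history[entity].append((ts, value))
    for rows in history.values():
        rows.sort(key=lambda tv: tv[0])
    times = {e: [ts for ts, _ in rows] for e, rows in history.items()}

    out = []
    for entity, label_ts, label in label_rows:
        ts_list = times.get(entity, [])
        # Index of the last feature timestamp <= label_ts (-1 if none exists yet).
        i = bisect_right(ts_list, label_ts) - 1
        value = history[entity][i][1] if i >= 0 else None
        out.append((entity, label_ts, value, label))
    return out


features = [("c1", 10, 5.0), ("c1", 20, 7.0), ("c2", 15, 1.0)]
labels = [("c1", 15, 0), ("c1", 25, 1), ("c2", 5, 0)]
print(point_in_time_join(features, labels))
# [('c1', 15, 5.0, 0), ('c1', 25, 7.0, 1), ('c2', 5, None, 0)]
```

Using `bisect_right` includes a feature written at exactly the label timestamp. If a feature and a label can share a timestamp even though the feature was not yet available, `bisect_left` gives the stricter "before" comparison.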
Existing Architectures
Raw Data → Streaming (KV store) / Batch (Analytics Engine) → Feature Store → Consumer
Alternative Approach: HTAP Database
Feature Serving
Challenges of HTAP Databases
● In Memory
● Custom Hardware
● No support for secondary indexes or triggers
● Not ACID compliant
Splice Machine
● Scale-out
● Any Cloud/On-Prem
● Indexes and Triggers
● Full ACID Compliance
Feature Set Implementation
● Feature Set Pipeline: INSERT / UPDATE
● Initial Backfill
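The feature set pipeline can be sketched with SQLite standing in for a scale-out database, assuming a single hypothetical feature (average spend per customer). An initial backfill computes the feature set from raw history with one `INSERT ... SELECT`; afterwards the pipeline applies plain `INSERT` / `UPDATE` statements, and triggers copy every new version into an indexed history table. The live table serves the latest values, while the history table supports point-in-time training sets. Table, trigger, and column names are illustrative, not Splice Machine's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw source data that the feature is derived from.
CREATE TABLE raw_txns (customer_id INTEGER, amount REAL, ts INTEGER);

-- Live feature set: one row per customer with the latest values (serving path).
CREATE TABLE customer_features (
    customer_id INTEGER PRIMARY KEY,
    avg_spend   REAL,
    updated_ts  INTEGER
);

-- History: every version ever written, indexed for point-in-time lookups (training path).
CREATE TABLE customer_features_history (
    customer_id INTEGER,
    avg_spend   REAL,
    asof_ts     INTEGER
);
CREATE INDEX idx_history_key_ts ON customer_features_history (customer_id, asof_ts);

-- Triggers record each new version, regardless of which pipeline wrote the row.
CREATE TRIGGER history_on_insert AFTER INSERT ON customer_features
BEGIN
    INSERT INTO customer_features_history
    VALUES (NEW.customer_id, NEW.avg_spend, NEW.updated_ts);
END;

CREATE TRIGGER history_on_update AFTER UPDATE ON customer_features
BEGIN
    INSERT INTO customer_features_history
    VALUES (NEW.customer_id, NEW.avg_spend, NEW.updated_ts);
END;
""")

conn.executemany(
    "INSERT INTO raw_txns VALUES (?, ?, ?)",
    [(1, 10.0, 100), (1, 30.0, 200), (2, 50.0, 150)],
)

# Initial backfill: compute the feature from all historical raw data in one statement.
# The insert trigger fires for every row, so history is seeded as well.
conn.execute("""
    INSERT INTO customer_features (customer_id, avg_spend, updated_ts)
    SELECT customer_id, AVG(amount), MAX(ts) FROM raw_txns GROUP BY customer_id
""")

# Ongoing pipeline: a new value for customer 1 arrives and is applied with an UPDATE.
conn.execute(
    "UPDATE customer_features SET avg_spend = ?, updated_ts = ? WHERE customer_id = ?",
    (25.0, 300, 1),
)
conn.commit()
```

Putting history maintenance in triggers rather than in pipeline code means a new event-driven or batch pipeline gets lineage and point-in-time history without extra work.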
Intuitive API
Model Deployment
Scalable & Persistent Storage of Predictions
● Easily track data drift
● Easily track concept drift
● Compare new models to history
● Fully audit-proof history
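Once predictions and the feature values behind them are stored, tracking data drift becomes a computation over that history. One common way to quantify it, offered here as an illustration rather than the method from the talk, is the Population Stability Index (PSI), which compares how a feature's values fall into fixed bins at training time versus in recent traffic: PSI = Σ (a_i - e_i) · ln(a_i / e_i), where e_i and a_i are the baseline and recent shares in bin i. The bin edges and the small floor for empty bins are assumptions of this sketch.

```python
import math
from bisect import bisect_right


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between a baseline sample (e.g. training data) and a recent sample.

    edges: sorted bin boundaries; [10, 20] yields the bins
    (-inf, 10), [10, 20), [20, inf).
    """
    def proportions(values):
        if not values:
            raise ValueError("cannot compute PSI on an empty sample")
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor each share at eps so empty bins don't produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical example: a feature whose values shift upward after deployment.
training_values = [5] * 50 + [15] * 50
recent_values = [15] * 50 + [25] * 50
print(population_stability_index(training_values, recent_values, edges=[10, 20]))
```

A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds are best calibrated per feature against its own history.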
Database Deployment: Evaluation Store
Predictions are made and written to the table at millisecond speed
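Database deployment, as described here, means the model runs where the data lands: inserting a row produces a prediction in the same table. The sketch below imitates that pattern with SQLite by registering a Python function as a SQL function and calling it from an `AFTER INSERT` trigger. The model, its coefficients, the table, and the `fraud_lr_v1` version tag are all hypothetical; SQLite makes no claim to the latency of a scale-out system, and this is not Splice Machine's deployment mechanism.

```python
import math
import sqlite3


def predict(amount, n_prior):
    # Hypothetical hand-set logistic model standing in for a trained model.
    z = 0.01 * amount - 0.5 * n_prior
    # Numerically stable sigmoid: never calls exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


conn = sqlite3.connect(":memory:")
# Some SQLite builds default trusted_schema to OFF, which blocks app-defined
# functions inside triggers; versions without this pragma silently ignore it.
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("predict", 2, predict)  # expose the model to SQL

conn.executescript("""
CREATE TABLE transactions (
    txn_id        INTEGER PRIMARY KEY,
    amount        REAL,
    n_prior       INTEGER,
    prediction    REAL,
    model_version TEXT
);
-- Every inserted row is scored inside the database, and the model that
-- scored it is recorded next to the prediction.
CREATE TRIGGER score_on_insert AFTER INSERT ON transactions
BEGIN
    UPDATE transactions
    SET prediction = predict(NEW.amount, NEW.n_prior),
        model_version = 'fraud_lr_v1'
    WHERE txn_id = NEW.txn_id;
END;
""")

conn.execute(
    "INSERT INTO transactions (txn_id, amount, n_prior) VALUES (?, ?, ?)",
    (1, 100.0, 2),
)
print(conn.execute(
    "SELECT prediction, model_version FROM transactions WHERE txn_id = 1"
).fetchone())  # (0.5, 'fraud_lr_v1')
```

Because the prediction and model version are stored in the same row as the inputs, the table doubles as an evaluation store: drift and model-comparison queries run directly against it.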
HTAP Database: Feature Store + Deployment
Predictions → Models → Features → Data
● Which model made that prediction?
● Which algorithm, parameters, and features were used to train the model?
● How were the features computed?
● What was the raw data at the time of training?
Splice Machine: Database Deployment + Feature Store
Guaranteed Lineage and Governance
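When predictions, models, and features live in the same database, most of these lineage questions reduce to a join. The sketch below assumes a hypothetical schema that records, for each prediction, the model that produced it; for each model, its algorithm, parameters, and training as-of timestamp; and for each feature, a human-readable definition. The schema, names, and sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (name TEXT PRIMARY KEY, definition TEXT);
CREATE TABLE models (
    model_id TEXT PRIMARY KEY, algorithm TEXT, params TEXT, trained_asof_ts INTEGER
);
CREATE TABLE model_features (model_id TEXT REFERENCES models, feature TEXT REFERENCES features);
CREATE TABLE predictions (pred_id INTEGER PRIMARY KEY, model_id TEXT REFERENCES models, score REAL);

-- Hypothetical sample rows.
INSERT INTO features VALUES
    ('avg_spend_30d', 'AVG(amount) over trailing 30 days'),
    ('txn_count_7d',  'COUNT(*) over trailing 7 days');
INSERT INTO models VALUES ('fraud_lr_v1', 'logistic_regression', '{"C": 1.0}', 1700000000);
INSERT INTO model_features VALUES ('fraud_lr_v1', 'avg_spend_30d'), ('fraud_lr_v1', 'txn_count_7d');
INSERT INTO predictions VALUES (1, 'fraud_lr_v1', 0.5);
""")


def trace_prediction(conn, pred_id):
    """Walk Predictions -> Models -> Features for one prediction (None if not found)."""
    rows = conn.execute("""
        SELECT m.model_id, m.algorithm, m.params, m.trained_asof_ts, f.name, f.definition
        FROM predictions p
        JOIN models m          ON m.model_id = p.model_id
        JOIN model_features mf ON mf.model_id = m.model_id
        JOIN features f        ON f.name = mf.feature
        WHERE p.pred_id = ?
        ORDER BY f.name
    """, (pred_id,)).fetchall()
    if not rows:
        return None
    model_id, algorithm, params, asof_ts = rows[0][:4]
    return {
        "model": model_id,                         # Which model made that prediction?
        "algorithm": algorithm,                    # Which algorithm and parameters?
        "params": params,
        "features": {r[4]: r[5] for r in rows},   # Which features, and how were they computed?
        "training_asof_ts": asof_ts,               # Point in time for reconstructing training data
    }


print(trace_prediction(conn, 1))
```

The last question, what the raw data looked like at training time, is not answered by this join alone; the `training_asof_ts` value would be combined with feature history tables (or time-travel queries, where the database supports them) to reconstruct it.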
Questions?
