Unified MLOps: Feature Stores & Model Deployment

If you’ve brought two or more ML models into production, you know the struggle of managing multiple data sets, feature engineering pipelines, and models. This talk proposes a new approach to MLOps that lets you scale your models without increasing latency by merging a database, a feature store, and machine learning.

Splice Machine is a hybrid (HTAP) database built on HBase and Spark. The database powers a one-of-a-kind single-engine feature store and supports deploying ML models as tables inside the database. Because it needs only a JDBC connection, Splice Machine can be used with any model ops environment, such as Databricks.
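The idea of deploying a model "as a table" can be illustrated with a small sketch. This is not Splice Machine's API: it uses Python's built-in `sqlite3` module as a stand-in database, and the table name `model_table`, the trigger name `predict_on_insert`, and the `score` function (a fixed linear formula standing in for a trained model) are all hypothetical. It only shows the general pattern of an insert trigger populating a prediction column.

```python
import sqlite3


def score(age: float, balance: float) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return round(0.01 * age + 0.0001 * balance, 4)


# In-memory SQLite database used purely as an illustration.
conn = sqlite3.connect(":memory:")

# Expose the Python function to SQL so the trigger can call it.
conn.create_function("score", 2, score)

conn.executescript(
    """
    CREATE TABLE model_table (
        id         INTEGER PRIMARY KEY,
        age        REAL NOT NULL,
        balance    REAL NOT NULL,
        prediction REAL            -- filled in by the trigger
    );

    -- Every inserted feature row is scored and the result is stored
    -- alongside the features that produced it.
    CREATE TRIGGER predict_on_insert AFTER INSERT ON model_table
    BEGIN
        UPDATE model_table
        SET prediction = score(NEW.age, NEW.balance)
        WHERE id = NEW.id;
    END;
    """
)

# "Calling the model" is just an INSERT of a feature vector.
conn.execute("INSERT INTO model_table (id, age, balance) VALUES (1, 40, 2500)")
row = conn.execute("SELECT prediction FROM model_table WHERE id = 1").fetchone()
print(row[0])
```

Because features and predictions land in the same row, the table doubles as a persistent record of what the model saw and what it returned.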

The HBase side allows us to serve features to deployed ML models, and generate ML predictions, in milliseconds. Our unique Spark engine allows us to generate complex training sets, as well as ML predictions on petabytes of data.
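Generating a training set from a feature store usually means joining each label to the feature values that were known at the label's timestamp, which is the point-in-time consistency requirement listed on slide 10. The following is a minimal pure-Python sketch of that join, not the Spark-based implementation the talk refers to. The names `value_as_of` and `build_training_set`, the integer timestamps, and the sample data are all hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Per-entity feature history: (timestamp, value) pairs sorted by timestamp.
FeatureHistory = Dict[str, List[Tuple[int, float]]]


def value_as_of(history: List[Tuple[int, float]], ts: int) -> Optional[float]:
    """Return the latest feature value with timestamp <= ts, or None."""
    # (ts, +inf) sorts after every entry whose timestamp equals ts.
    idx = bisect_right(history, (ts, float("inf")))
    return history[idx - 1][1] if idx else None


def build_training_set(
    labels: List[Tuple[str, int, int]], history: FeatureHistory
) -> List[Tuple[str, int, Optional[float], int]]:
    """Join each (entity, timestamp, label) to the feature value as of that time."""
    rows = []
    for entity, ts, label in labels:
        value = value_as_of(history.get(entity, []), ts)
        rows.append((entity, ts, value, label))
    return rows


history: FeatureHistory = {"cust_1": [(10, 1.0), (20, 2.0), (30, 3.0)]}
labels = [("cust_1", 25, 1), ("cust_1", 5, 0)]

training_set = build_training_set(labels, history)
print(training_set)
```

The label at time 25 picks up the value written at time 20 rather than the later value at time 30, so no future information leaks into training. The label at time 5 gets `None` because no feature value existed yet.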

In this talk, Monte will discuss how his experience running the AI lab at NASA, and as CEO of Red Pepper, Blue Martini Software and Rocket Fuel, led him to create Splice Machine. Jack will give a quick demonstration of how it all works.

  1. Unified MLOps: Feature Stores and Model Deployment. Monte Zweben, CEO @ Splice Machine; Jack Ploshnick, Data Scientist @ Splice Machine
  2. Agenda
     ● Goals of production machine learning
     ● Why are these goals hard to achieve?
     ● What is a Feature Store
     ● Feature Store Landscape
     ● Database Deployment & Feature Stores
  3. Real-Time Machine Learning Components: Scale-Out Operational Data Platform Feature Store Re-usability, Governance, Serving Model Deployment Modeling Experimentation Scale-Out Analytical Data Platform
  4. Real-Time Machine Learning Components: Scale-Out Operational Data Platform Feature Store Re-usability, Governance, Serving Model Deployment Modeling Experimentation Scale-Out Analytical Data Platform ML Landscape Today
  5. Typical Machine Learning Infrastructure: Bespoke pipelines; Data Warehouse; Database; Real-Time Data; Model 1; Dashboard; Model 2
  6. Pipeline Duplication is Not Enough: Higher Compute Costs; Recreating Features; Lost Signal; Data Lineage Nightmare
  7. Feature Store
  8. What is a Feature Store? Real-Time Data; Batch Data; Feature Store; Feature Search; Training Sets; Feature Serving; Governance
  9. Machine Learning with a Feature Store: Feature Store; Model 1; Data Warehouse; Database; Real-Time Data; Dashboard; Model 2
  10. Feature Store Requirements
     ● Scales > 1B records
     ● Scales > 20K features
     ● Feature vector retrieval by primary key for inference <5ms-10ms
     ● Point-in-time consistency on training data
     ● Event-driven feature updates
     ● Batch feature updates
     ● Track feature lineage
     ● Discoverability and reuse with feature metadata
     ● Feature lineage
     ● Backfill of new features
  11. Existing Architectures: Raw Data; Streaming (KV store); Batch (Analytics Engine); Feature Store; Consumer
  12. Existing Architectures: Raw Data; Streaming (KV store); Batch (Analytics Engine); Feature Store; Consumer
  13. Alternative Approach: HTAP Database Feature Serving
  14. Challenges of HTAP Databases
     ● In Memory
     ● Custom Hardware
     ● No support for secondary indexes or triggers
     ● Not ACID compliant
  15. Splice Machine
     ● Scale-out
     ● Any Cloud/On-Prem
     ● Indexes and Triggers
     ● Full ACID Compliance
  16. Feature Set Implementation: Feature Set Pipeline INSERT / UPDATE Initial Backfill
  17. Intuitive API
  18. Model Deployment
  19. Scalable & Persistent Storage of Predictions
     ● Easily track data drift
     ● Easily track concept drift
     ● Compare new models to history
     ● Fully audit-proof history
  20. Database Deployment - Evaluation Store: Prediction made and populated at millisecond speed
  21. HTAP Database: Feature Store + Deployment
  22. Predictions; Models; Features; Data. Which model made that prediction? Which algorithm, parameters, and features were used to train the model? How were the features computed? What was the raw data at the time of training? Splice Machine: Database Deployment; Feature Store. Guaranteed Lineage and Governance
  23. Questions?