Build ML Systems with just Python
and free Serverless Services
Jim Dowling – CEO and Co-Founder @ Hopsworks
❌ Static Datasets
❌ Data is downloaded from a single URL
❌ Features for ML are correct and unbiased
❌ Use a model evaluation metric (accuracy) to communicate the value of your model
💚 Data never stops coming
💚 Data comes from different heterogeneous data sources
💚 Write code to extract and validate features from input data
💚 Integrate your model with a UI or app/service
💚 Follow MLOps best practices for operating ML Services (versioning, testing)
2
Production Machine Learning is hard
Enterprise Data and Enterprise AI Infrastructure are Complex and Big
Model Serving
MySQL /
Postgres
KV Store
Object/GraphDB
Elastic
Operational DBs
Snowflake / Databricks / BQ
Hopsworks /
Feast / etc
Feature Store
Images, Docs, Parquet, etc
S3 / ADLS / GCS
ETL
Bronze Silver Gold
Data Lake
Data Warehouse
Kafka Streaming Feature
Pipelines
ETL
Feature
Pipelines
ELT or
ETL
Model
Registry
Batch
Inference
Pipelines
Online
Inference
Pipelines
Predictions
Online
App
Training
Pipelines
Analytical Data Stores ML Infrastructure
There has to be an easier way to
build production ML Systems….
Show love with a star!
www.serverless-ml.org
https://github.com/featurestoreorg/serverless-ml-course ⭐
Serverless infrastructure, from serverless databases to
lambda functions, enables developers to build such
operational and analytical services without having to
install, configure, and operate the infrastructure that
powers them.
Serverless infrastructure is often synonymous with
cloud infrastructure
■ The cloud makes it easier to build services that are
highly available, elastic (scale up and down
resource usage in response to demand), can be
upgraded, backed up and restored, and are
pay-per-use.
5
Yes, with Serverless Machine Learning
Serverless Machine Learning
Requirements &
Sprint Tasks
Working ML
System
Validate
Model/Features
Validate UI / API
iterate
iterate
Serverless ML should improve both iteration speed and quality when developing ML products
by enabling seamless integration of best practices from MLOps.
Serverless infrastructure enables developers to build such operational and analytical
ML systems without having to operate the infrastructure that powers them.
Any ML System can be composed as a set of Feature/Training/Inference Pipelines
Feature
Pipelines
Inference
Pipeline
Training
Pipeline
Feature & Model Store
Prediction
Service/UI
features
& Labels
features
& Labels
model
Batch/Streaming Batch/Online
On-Demand
features
& model
New Data
Historical Data
Logs
(predictions)
monitor
Monitoring
Why add a Feature Store & Model Registry to the Serverless ML Stack?
To Manage State
● Stores features, models, and files so that they can be passed between pipelines
For Online Models
● History and Context for Online Models - needed by Stateless Applications
● Eliminate Training/Inference Skew
For Batch Models
● Central repository to share, discover and use versioned, secure features
● Decouple feature engineering and batch inference pipelines
Feature Pipelines
A feature pipeline is a program that orchestrates the execution of a dataflow
graph of model-independent data transformations to prepare features.
V A C
Input Data Feature(s)
VACE = Validate, Aggregate, Compress (dimensionality reduction), Extract (Binning, Crosses, etc)
Feature
Store
feature-pipeline.py
E
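The VACE steps above can be sketched in plain Python. This is a minimal, illustrative stand-in for a real `feature-pipeline.py` (the row data, range check, and bin edges are invented for the example); a real pipeline would read from a data source and write the resulting DataFrame to the feature store.

```python
# Minimal sketch of Validate + Extract steps in a feature pipeline.
# Stdlib only; a real pipeline would insert the output into a feature store.
def prepare_features(rows):
    features = []
    for row in rows:
        # Validate: drop rows with out-of-range measurements
        if not (0 < row["petal_length"] < 10):
            continue
        # Extract: bin petal length into a categorical feature (illustrative edges)
        if row["petal_length"] < 2.5:
            size = "small"
        elif row["petal_length"] < 5.0:
            size = "medium"
        else:
            size = "large"
        features.append({**row, "petal_size": size})
    return features

rows = [
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 4.7, "petal_width": 1.4},
    {"petal_length": -1.0, "petal_width": 0.1},  # fails validation, dropped
]
print(prepare_features(rows))
```

The point is the shape of the program: model-independent transformations only, so the same features can feed many models.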
Training Pipelines
C
Feature(s) Feature
Store
E
A training pipeline is a program that takes input features, applies
model-specific transformations, and then trains a model. During development,
there may be additional hyperparameter tuning and model architecture search
steps. The output is a model that is stored in a model registry.
H A T
Untransformed
Input Features
Model
T-HATE = Transform features, Hyperparameter tuning, model Architecture, Train model (fit to data), Evaluate your model.
E
T
training-pipeline.py
Feature
Store
Model
Registry
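A toy version of the transform/train/evaluate skeleton of a `training-pipeline.py`, with a nearest-centroid model standing in for the real one (the data and the scaling transform are invented for illustration; the real pipeline reads features from the feature store and pushes the model to the model registry):

```python
# Sketch of training-pipeline steps: Transform, Train (fit), Evaluate.
def transform(x):
    # model-specific transformation: toy fixed scaling (illustrative)
    return [v / 10.0 for v in x]

def train(X, y):
    # "fit": compute one centroid per class
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(pts) for col in zip(*pts)]
            for c, pts in groups.items()}

def evaluate(model, X, y):
    def predict(x):
        return min(model, key=lambda c: sum((a - b) ** 2
                   for a, b in zip(x, model[c])))
    return sum(predict(xi) == yi for xi, yi in zip(X, y)) / len(y)

X = [transform(x) for x in [[1.4, 0.2], [1.3, 0.3], [4.7, 1.4], [4.5, 1.5]]]
y = ["Setosa", "Setosa", "Versicolor", "Versicolor"]
model = train(X, y)
print(evaluate(model, X, y))  # 1.0 on this tiny toy dataset
```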
Batch Inference Pipelines
A batch inference pipeline is a program that takes input features and a model,
performs model-specific transformations, makes predictions using the model
and the transformed features, and writes the output predictions to some sink.
P O
Untransformed
Input Features
Predictions
TPO = Transform features, Predict, Output.
T
batch-inference-pipeline.py
Feature
Store
Prediction
Results
Model
Registry
model
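The TPO steps can be sketched as a single function: transform the untransformed batch features, predict with the model, and write predictions to a sink. The threshold "model" and field names are purely illustrative; in the real `batch-inference-pipeline.py` the features come from the feature store and the model from the model registry.

```python
# Sketch of a batch inference pipeline: Transform, Predict, Output.
def run_batch_inference(model, batch, sink):
    for row in batch:
        x = row["petal_length"] / 10.0                               # Transform
        pred = "Setosa" if x < model["threshold"] else "Versicolor"  # Predict
        sink.append({"id": row["id"], "prediction": pred})           # Output
    return sink

model = {"threshold": 0.3}
batch = [{"id": 1, "petal_length": 1.4}, {"id": 2, "petal_length": 4.7}]
print(run_batch_inference(model, batch, []))
```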
Online Inference Pipelines
Results
T P P
Request
predictions
Feature
Store
Application
precomputed
untransformed features
transformed
features
TPP = Transform the input request into features, Predict using input features and the
model, Post-process predictions before outputting the results.
Model
Registry
model
predictor.py
transformer.py
model-deployment
An online inference pipeline is a program that takes input from a client app, retrieves features
from a feature store, transforms them, and then makes predictions with the features and the
model, finally returning the predictions to the client.
Today’s workshop is to build
Iris
as a Serverless ML System
Iris Flower, blue and yellow, ultra-wide-angle
created with Midjourney
Course Material: Prof Jim Dowling
Lab 1:
Iris Flowers as a
Serverless ML System
● Course Repository on Github
https://github.com/featurestoreorg/serverless-ml-course/
● Today’s workshop source code is here:
https://github.com/ID2223KTH/id2223kth.github.io/tree/master/src/serverless-
ml-intro
● Use Conda or virtual environments to manage your Python dependencies on
your laptop
Source Code for Workshop
Iris as a serverless Operational ML system for Predictive Analytics
iris-feature-pipeline.py
iris/app.py
iris-training-pipeline.py
Iris UI
Iris Flower Data
iris.csv
features, labels features, labels model
model
Iris as a serverless Analytical ML system
iris-feature-pipeline-daily.py
iris-monitor/app.py
iris-training-pipeline.py
Iris Dashboard
& monitoring
Iris Flower Data
iris.csv
features, labels features, labels model
model
Synthetic Iris
Flower Data
Logs (predictions)
● Case Study: Iris Flower Dataset
● First Steps
a. Create a free account on hopsworks.ai
b. Create a free account on modal.com
c. Create a free account on huggingface.co
● Tasks
a. Build and run a feature pipeline on Modal
b. Build and run a training pipeline on Modal
c. Build and run an inference pipeline with a Gradio UI on Hugging Face
Spaces.
What will we cover in this lab
1. First, create an account on
https://app.hopsworks.ai
Register and Login to the Hopsworks Feature Store
19
2. Click on “User Settings”
3. Create and Save an “API Key”
Register to Modal and Set up HOPSWORKS_API_KEY environment variable
20
Add HOPSWORKS_API_KEY as an Environment
variable secret
Create an account on Modal
(might need some time to be approved)
Register and Create a Hugging Face Space
21
1. Create an account
on Hugging Face
2. Create a “Space”
3. Create a Gradio App with the name Iris inside
your account
Add a HOPSWORKS_API_KEY as a secret in your “iris” Space
22
1. Add your HOPSWORKS_API_KEY as a Repo Secret
Serverless ML with
Iris Flower Dataset
Iris Flower Dataset
Tabular Data
Features
● sepal length
● sepal width
● petal length
● petal width
Target (label)
● variety
This column is the
Pandas Index
Prediction Problem:
Predict the variety, given
the length and width of the
petal and sepal.
[Image from https://scikit-learn.org/stable/modules/neighbors.html#classification]
As we can see here, two
features (sepal_length and
sepal_width) are not enough
to separate the three
different varieties (setosa,
versicolor, virginica).
Classify Iris Flowers with K-Nearest Neighbors
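To make the classifier concrete, here is a toy k-nearest-neighbors implementation in plain Python. It is a stand-in for what the lab actually uses (scikit-learn's `KNeighborsClassifier` on all four features); the training points below are invented two-feature examples:

```python
# Toy k-NN: majority vote among the k closest training points.
def knn_predict(train_X, train_y, x, k=3):
    # rank training points by squared Euclidean distance to x
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    votes = [yi for _, yi in dists[:k]]
    return max(set(votes), key=votes.count)   # majority vote

train_X = [[1.4, 0.2], [1.3, 0.3], [4.7, 1.4], [4.5, 1.5], [6.0, 2.5]]
train_y = ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica"]
print(knn_predict(train_X, train_y, [1.5, 0.25], k=3))
```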
Communicate the value of your model with a UI (Gradio)
● Communicate the value of your
model to stakeholders with an
app/service that uses the ML
model to make value-added
decisions
● Here, we design a UI in Python
with Gradio
○ Enables “predictive analytics”,
where a user can ask the
model “what if I had an
Iris Flower with this sepal/petal
width/length?”
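The predict function the Gradio UI wraps can be sketched as below. The rule-based body is a stand-in for the trained model loaded from the model registry, and the `gr.Interface` wiring is shown in comments so the sketch runs without gradio installed:

```python
# "What-if" predict function for the Iris UI; the rules are an
# illustrative stand-in for iris_model.predict(...).
def predict_variety(sepal_length, sepal_width, petal_length, petal_width):
    if petal_length < 2.5:
        return "Setosa"
    return "Versicolor" if petal_width < 1.8 else "Virginica"

# In the Hugging Face Space (roughly):
# import gradio as gr
# demo = gr.Interface(fn=predict_variety,
#                     inputs=["number"] * 4, outputs="text")
# demo.launch()

print(predict_variety(5.1, 3.5, 1.4, 0.2))
```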
Run the Feature, Training, Online/Batch Inference Pipelines
iris-feature-pipeline.py
app.hopsworks.ai
iris/app.py
iris-training-pipeline.py
iris_model
features, labels iris_model
features, labels
iris-batch-inference-pipeline.py
features
predictions
iris-monitoring/app.py
logs
Serverless Surf Height Prediction: GitHub Actions/Pages
surf-report-features.ipynb
swell-features.ipynb
batch-predict-surf.ipynb
GitHub
Pages
Feature
Store
Lahinch, NOAA
Model
Registry
download
model
latest_lahinch.png
insert
DataFrames
https://github.com/jimdowling/cjsurf
train-model.ipynb
add model
SERVERLESS COMPUTE SERVERLESS STATE SERVERLESS UI
SERVERLESS ML
Show love with a star!
www.serverless-ml.org
https://github.com/featurestoreorg/serverless-ml-course ⭐

PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
