“Deployment for free”:
removing the need to write model
deployment code at Stitch Fix
mlconf April 2021
Stefan Krawczyk
#mlconf #MLOps #machinelearning
@stefkrawczyk
linkedin.com/in/skrawczyk
Try out Stitch Fix → goo.gl/Q3tCQ3
> Stitch Fix
“Deployment for free”
Model Envelope & envelope mechanics
Impact of being on-call
Summary & Future Work
Stitch Fix is a personal styling service
Key points:
1. Very algorithmically driven company
2. Single DS Department: Algorithms (145+)
3. “Full Stack Data Science”
a. No reimplementation handoff
b. End to end ownership
c. Built on top of data platform tools & abstractions.
For more information: https://algorithms-tour.stitchfix.com/ & https://cultivating-algos.stitchfix.com/
Where do I fit in?
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
Pre-covid look
Stitch Fix
> “Deployment for free”
Model Envelope & envelope mechanics
Impact of being on-call
Summary & Future Work
Typical Model Deployment Process
● Many ways to approach.
● Heavily impacts MLOps.
Model Deployment at Stitch Fix
Once a model is in an envelope...
This comes for free!
Who owns what?
[Diagram: DS Concerns | Platform Concerns | DS Concerns]
Deployments are “triggered”
[Diagram: DS Concerns | Platform Concerns | DS Concerns]
Guess who is on-call?
Reality: two steps to get a model to production
Self-service: takes <1 hour
No code is written!
Can be a terminal point.
[Diagram: Step 1 | Step 2]
Step 1. save a model via Model Envelope API
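# etl.py
# model_envelope is Stitch Fix's internal library (not open source).
import model_envelope as me
from sklearn import linear_model

# load_data_somehow() is a placeholder for your own data loading.
df_X, df_y = load_data_somehow()
model = linear_model.LogisticRegression(multi_class='auto')
model.fit(df_X, df_y)

# Save the trained model, example inputs/outputs, and tags in an envelope.
my_envelope = me.save_model(instance_name='my_model_instance_name',
                            instance_description='my_model_instance_description',
                            model=model,
                            query_function='predict',
                            api_input=df_X, api_output=df_y,
                            tags={'canonical_name':'foo-bar'})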
Note: no deployment trigger in ETL code.
Step 2a. deploy model as a microservice
Go to Model Envelope Registry UI:
1) Create deployment configuration.
2) Create Rule for auto deployment.
a) Otherwise, query for a model & hit deploy.
3) Done.
Result:
● Web service with API endpoints
○ Comes with a Swagger UI & schema →
● Model in production < 1 hour.
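The talk does not show what a deployment Rule looks like internally. As a minimal sketch, assuming a rule simply matches on the tags stored with an envelope, auto deployment could be pictured like this. The `Envelope` and `DeploymentRule` classes and `select_for_deployment` are hypothetical names for illustration, not the Model Envelope Registry's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Envelope:
    """Hypothetical stand-in for the metadata of a saved Model Envelope."""
    instance_name: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class DeploymentRule:
    """Hypothetical rule: pick up any envelope whose tags include these pairs."""
    required_tags: Dict[str, str]

    def matches(self, envelope: Envelope) -> bool:
        return all(envelope.tags.get(k) == v for k, v in self.required_tags.items())


def select_for_deployment(rule: DeploymentRule, envelopes: List[Envelope]) -> List[str]:
    """Return the instance names of envelopes that the rule matches."""
    return [e.instance_name for e in envelopes if rule.matches(e)]


rule = DeploymentRule(required_tags={"canonical_name": "foo-bar"})
candidates = [
    Envelope("my_model_instance_name", {"canonical_name": "foo-bar"}),
    Envelope("other_model", {"canonical_name": "baz"}),
]
selected = select_for_deployment(rule, candidates)
print(selected)
```

Keeping the rule outside the ETL code is what lets the training job stay free of any deployment trigger.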
Step 2b. deploy model as a batch task
Create workflow configuration:
1) Create batch inference task in workflow.
a) Specify Rule & inputs + outputs.
2) Deploy workflow.
3) Done.
Result:
● Spark or Python task that creates a table.
● We keep an inference log.
● Model in production < 1 hour.
Stitch Fix
“Deployment for free”
> Model Envelope & envelope mechanics
Impact of being on-call
Summary & Future Work
Q: What is the Model Envelope? A: It’s a container.
Enables treating the model as a “black box”.
🤔 Wait this feels familiar?
You: “MLflow much?”
Me: Yes & No.
This is all internal code -- nothing from open source.
In terms of functionality we’re closer to a mix of:
● MLflow
● ModelDB
● TFX
But this talk is too short to cover everything...
Typical Model Envelope use
1. call save_model() right after model creation in an ETL.
2. there are also APIs to save metrics & hyperparameters, and to retrieve envelopes.
3. once in an envelope (✉), information is immutable except:
a. tags -- for curation purposes.
b. metrics -- can add/adjust metrics.
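One way to picture the immutability rule above is a record whose core fields cannot be reassigned while its tags and metrics stay editable. This is only a sketch: `EnvelopeRecord` and its fields are hypothetical, and the metric value is made up for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class EnvelopeRecord:
    """Hypothetical envelope record: fields cannot be reassigned after creation."""
    instance_name: str
    model_bytes: bytes
    # The two exceptions: the dict objects are fixed, but their contents can change.
    tags: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


record = EnvelopeRecord("my_model_instance_name", b"serialized-model")

# Tags and metrics can still be added or adjusted.
record.tags["canonical_name"] = "foo-bar"
record.metrics["auc"] = 0.9  # illustrative value only

# Anything else is rejected.
try:
    record.model_bytes = b"something-else"
    rejected = False
except FrozenInstanceError:
    rejected = True
print(rejected)
```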
What does save_model() do?
Let’s dive deeper into these.
How do we infer a Model API Schema?
Goal: infer from code rather than explicit specification.
Require either fully annotated functions that use only Python/typing standard types:
    def good_predict_function(self, x: float, y: List[int]) -> List[float]:
        ...
Or example inputs that are inspected to derive a schema:
    def predict_needs_examples_function(self, x: pd.DataFrame, y):
        ...
    my_envelope = me.save_model(instance_name='my_model_instance_name',
                                instance_description='my_model_instance_description',
                                model=model,
                                query_function='predict',
                                api_input=df_X, api_output=df_y,  # required for DataFrame inputs
                                tags={'canonical_name':'foo-bar'})
Model API Schema - Under the hood
● One of the most complex parts of the code base (90%+ test coverage!)
● We make heavy use of the typing_inspect module & isinstance().
○ We create a schema similar to TFX.
● Key component to enable exercising models in different contexts.
○ Enables code creation and input/output validation.
● Current limitations: no default values, one function per envelope.
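To make the annotation-based path concrete, here is a minimal sketch of schema inference and input validation. It uses only the standard library (`typing.get_type_hints`, `inspect`) rather than the `typing_inspect` module mentioned above, and `infer_api_schema` and `conforms` are hypothetical helper names, not part of the Model Envelope code.

```python
import inspect
from typing import Any, Dict, List, get_args, get_origin, get_type_hints


def infer_api_schema(func) -> Dict[str, Any]:
    """Build a simple schema dict from a fully annotated function."""
    hints = get_type_hints(func)
    params = [p for p in inspect.signature(func).parameters if p != "self"]
    missing = [p for p in params if p not in hints]
    if missing or "return" not in hints:
        # This is the case where example inputs would be needed instead.
        raise TypeError(f"not fully annotated: {missing or ['return']}")
    return {"inputs": {p: hints[p] for p in params}, "output": hints["return"]}


def conforms(value: Any, expected: Any) -> bool:
    """Check a value against a plain type or a List[...] annotation."""
    if get_origin(expected) is list:
        (item_type,) = get_args(expected)
        return isinstance(value, list) and all(conforms(v, item_type) for v in value)
    return isinstance(value, expected)


class Model:
    def good_predict_function(self, x: float, y: List[int]) -> List[float]:
        return [x * v for v in y]

    def predict_needs_examples_function(self, x, y):
        return x


schema = infer_api_schema(Model.good_predict_function)
print(schema)
print(conforms([1, 2, 3], schema["inputs"]["y"]))
```

A schema captured this way is what would allow request validation and generated service code to stay in sync with the function the data scientist wrote.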
How do we capture python dependencies?
import model_envelope as me
from sklearn import linear_model

df_X, df_y = load_data_somehow()
model = linear_model.LogisticRegression(multi_class='auto')
model.fit(df_X, df_y)
my_envelope = me.save_model(instance_name='my_model_instance_name',
                            instance_description='my_model_instance_description',
                            model=model,
                            query_function='predict',
                            api_input=df_X, api_output=df_y,
                            tags={'canonical_name':'foo-bar'})
Point: no explicit passing of scikit-learn to save_model().
Assumption:
We all run on the same* base linux environment in training & production.
Store the following in the Model Envelope:
● Result of import sys; sys.version_info
● Results of > pip freeze
● Results of > conda list --export
Local python modules (not installable):
● Add modules as part of save_model() call.
● We store them with the model bytes.
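The capture step listed above can be approximated with the standard library. The sketch below records the interpreter version and the output of `pip freeze` for the running interpreter; `capture_environment` is a hypothetical helper name, and the conda listing and local-module handling are left out to keep it short.

```python
import subprocess
import sys
from typing import Any, Dict, List


def pip_freeze() -> List[str]:
    """Return `pip freeze` lines for this interpreter, or [] if pip cannot be run."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


def capture_environment() -> Dict[str, Any]:
    """Collect the environment details to store alongside the model."""
    return {
        "python_version": tuple(sys.version_info[:3]),
        "pip_freeze": pip_freeze(),
    }


env = capture_environment()
print(env["python_version"], len(env["pip_freeze"]))
```

Because this runs inside the training process, the data scientist never has to list scikit-learn or any other dependency by hand.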
How do we build the python deployment env.?
Filter:
● hard-coded list of dependencies to filter out, e.g. jupyterhub.
● upkeep is cheap; add/update every few months.
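A filter of this kind can be sketched as a name check against a hard-coded set. The set contents beyond jupyterhub, the helper names, and the version pins below are all illustrative assumptions, not the actual list or code used at Stitch Fix.

```python
import re
from typing import Iterable, List

# Hypothetical filter list; the slide names jupyterhub as one example entry.
FILTERED_PACKAGES = {"jupyterhub"}


def requirement_name(line: str) -> str:
    """Extract a normalized package name from a `pip freeze` style line."""
    name = re.split(r"[=<>!~@\s\[;]", line.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_requirements(lines: Iterable[str]) -> List[str]:
    """Drop requirements that should not be installed in the deployment environment."""
    return [line for line in lines if requirement_name(line) not in FILTERED_PACKAGES]


# Illustrative captured requirements; the versions are made up.
captured = ["jupyterhub==1.3.0", "scikit-learn==0.24.1", "pandas==1.2.3"]
deployable = filter_requirements(captured)
print(deployable)
```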
Stitch Fix
“Deployment for free”
Model Envelope & envelope mechanics
> Impact of being on-call
Summary & Future Work
Remember this split:
[Diagram: DS Concerns | Platform Concerns | DS Concerns]
My team is on-call for
Impact of being on-call
Two truths:
● No one wants to be paged.
● No one wants to be paged for a model they didn’t write!
But, this incentivizes Platform to build out MLOps capabilities:
● Capture bad models before they’re deployed!
● Enable observability, monitoring, and alerting to speed up debugging.
Luckily we have autonomy and freedom to do so!
a pager
What can we change?
API
Automatic capture == license to change:
● Model API schema
● Dependency capture
● Environment info: git, job, etc.
Incentives for DS to additionally provide:
● Datasets for analysis
● Metrics
● Tags
Deployment
MLOps approaches to:
● Model validation
● Model deployment & rollback
● Model deployment vehicle:
○ From logging, monitoring, alerting
○ To architecture: microservice, or Ray, or?
● Dashboarding/UIs
Overarching benefit
1. Data Scientists get to focus more on modeling.
a. more business wins.
2. Platform focuses on MLOps:
a. can be a rising tide that raises all boats!
Stitch Fix
“Deployment for free”
Model Envelope & envelope mechanics
Impact of being on-call
> Summary & Future Work
Summary - “Deployment for free”
We enable deployment for free by:
● Capturing a comprehensive model artifact we call the Model Envelope.
● Using the Model Envelope to facilitate code & environment generation for model deployment.
● Having Platform own the Model Envelope and be on-call for generated services & tasks.
Business wins:
● Data Scientists get to focus more on modeling.
● Platform is incentivized to improve and iterate on MLOps practices.
Future Work
● Better MLOps features:
○ Observability, scalable data capture, & alerting.
○ Model Validation & CD patterns.
● “Models on Rails”:
○ Target specific SLA requirements.
● Configuration driven model creation:
○ Abstract away glue code required to train & save models.
Thank you! We’re hiring! Questions?
Try out Stitch Fix → goo.gl/Q3tCQ3
@stefkrawczyk
linkedin.com/in/skrawczyk

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 

"Deployment for free": removing the need to write model deployment code at Stitch Fix

  • 1. “Deployment for free”: removing the need to write model deployment code at Stitch Fix mlconf April 2021 Stefan Krawczyk #mlconf #MLOps #machinelearning @stefkrawczyk linkedin.com/in/skrawczyk Try out Stitch Fix → goo.gl/Q3tCQ3
  • 2. > Stitch Fix | “Deployment for free” | Model Envelope & envelope mechanics | Impact of being on-call | Summary & Future Work
  • 3. Stitch Fix is a personal styling service. Key points:
        1. Very algorithmically driven company.
        2. Single DS Department: Algorithms (145+).
        3. “Full Stack Data Science”:
           a. No reimplementation handoff.
           b. End-to-end ownership.
           c. Built on top of data platform tools & abstractions.
        For more information: https://algorithms-tour.stitchfix.com/ & https://cultivating-algos.stitchfix.com/
  • 4. Where do I fit in? Stefan Krawczyk, Mgr. Data Platform - Model Lifecycle. (Pre-covid look.)
  • 5. Stitch Fix | > “Deployment for free” | Model Envelope & envelope mechanics | Impact of being on-call | Summary & Future Work
  • 6. Typical Model Deployment Process: ● Many ways to approach. ● Heavily impacts MLOps.
  • 7. Model Deployment at Stitch Fix: once a model is in an envelope... this comes for free!
  • 8. Who owns what? (Diagram labels: DS Concerns | Platform Concerns | DS Concerns)
  • 9. Deployments are “triggered”. (Diagram labels: DS Concerns | Platform Concerns | DS Concerns) Guess who is on-call?
  • 10. Reality: two steps to get a model to production. Self-service: takes <1 hour. No code is written! Can be a terminal point. (Diagram labels: Step 1, Step 2)
  • 11. Step 1. Save a model via the Model Envelope API (etl.py):
        import model_envelope as me
        from sklearn import linear_model
        df_X, df_y = load_data_somehow()
        model = linear_model.LogisticRegression(multi_class='auto')
        model.fit(df_X, df_y)
        my_envelope = me.save_model(instance_name='my_model_instance_name',
                                    instance_description='my_model_instance_description',
                                    model=model,
                                    query_function='predict',
                                    api_input=df_X,
                                    api_output=df_y,
                                    tags={'canonical_name':'foo-bar'})
        Note: no deployment trigger in ETL code.
  • 12. Step 2a. Deploy model as a microservice. Go to the Model Envelope Registry UI:
        1) Create deployment configuration.
        2) Create Rule for auto deployment.
           a) Else query for model & hit deploy.
        3) Done.
        Result:
        ● Web service with API endpoints.
          ○ Comes with a Swagger UI & schema.
        ● Model in production < 1 hour.
  • 13. Step 2b. Deploy model as a batch task. Create workflow configuration:
        1) Create batch inference task in workflow.
           a) Specify Rule & inputs + outputs.
        2) Deploy workflow.
        3) Done.
        Result:
        ● Spark or Python task that creates a table.
        ● We keep an inference log.
        ● Model in production < 1 hour.
  • 14. Stitch Fix | “Deployment for free” | > Model Envelope & envelope mechanics | Impact of being on-call | Summary & Future Work
  • 15. Q: What is the Model Envelope? A: It’s a container. It enables treating, and thinking about, the model as a “black box”.
  • 16. 🤔 Wait, this feels familiar? You: “MLflow much?” Me: Yes & No. This is all internal code -- nothing from open source. In terms of functionality we’re closer to a mix of: ● MLflow ● ModelDB ● TFX. But this talk is too short to cover everything...
  • 17. Typical Model Envelope use:
        1. Call save_model() right after model creation in an ETL.
        2. There are also APIs to save metrics & hyperparameters, and to retrieve envelopes.
        3. Once in an envelope (✉), information is immutable except:
           a. tags -- for curation purposes.
           b. metrics -- can add/adjust metrics.
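The immutability rule on slide 17 can be illustrated with a small sketch. This is not the internal Model Envelope code; the `Envelope` class, its fields, and the `set_tag` / `log_metric` methods are hypothetical names chosen for illustration. It only shows one way to freeze everything except tags and metrics.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Envelope:
    """Hypothetical stand-in for a saved envelope (not the real internal class)."""

    instance_name: str
    model_bytes: bytes
    api_schema: Mapping[str, Any]
    # Assumption: tags and metrics are the only parts that may change after saving.
    tags: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Store a read-only view over a copy, so the schema cannot be edited later.
        object.__setattr__(self, "api_schema", MappingProxyType(dict(self.api_schema)))

    def set_tag(self, key: str, value: str) -> None:
        """Add or overwrite a tag."""
        self.tags[key] = value

    def log_metric(self, name: str, value: float) -> None:
        """Add or adjust a metric."""
        self.metrics[name] = float(value)
```

With this shape, reassigning `instance_name` raises an `AttributeError` (a `FrozenInstanceError`), and writing into `api_schema` raises a `TypeError`, while tags and metrics stay editable.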
  • 18. What does save_model() do? (Diagram with four numbered steps: 1, 2, 3, 4.)
  • 19. What does save_model() do? (Diagram with four numbered steps: 1, 2, 3, 4.) Let’s dive deeper into these.
  • 20. How do we infer a Model API Schema? Goal: infer from code rather than explicit specification. Require either fully annotated functions with only python/typing standard types:
        def good_predict_function(self, x: float, y: List[int]) -> List[float]: ...
        Or, example inputs that are inspected to get a schema from (required for DF inputs):
        def predict_needs_examples_function(self, x: pd.DataFrame, y): ...
        my_envelope = me.save_model(instance_name='my_model_instance_name',
                                    instance_description='my_model_instance_description',
                                    model=model,
                                    query_function='predict',
                                    api_input=df_X,
                                    api_output=df_y,
                                    tags={'canonical_name':'foo-bar'})
  • 21. Model API Schema - Under the hood
        ● One of the most complex parts of the code base (90%+ test coverage!).
        ● We make heavy use of the typing_inspect module & isinstance().
          ○ We create a schema similar to TFX.
        ● Key component to enable exercising models in different contexts.
          ○ Enables code creation and input/output validation.
        ● Current limitations: no default values, one function per envelope.
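A minimal sketch of the two inference paths from slides 20 and 21, under stated assumptions: it uses the standard library's `typing.get_type_hints`, `get_origin` and `get_args` rather than the `typing_inspect` module the talk mentions, it handles only a few primitive types and `List[...]`, and the schema layout and every function name here are invented for illustration. The real schema format is not shown in the slides.

```python
import inspect
from typing import Any, Callable, Dict, Optional, get_args, get_origin, get_type_hints

# bool comes first because bool is a subclass of int.
_PRIMITIVES = {bool: "bool", int: "int", float: "float", str: "str"}


def type_to_schema(tp: Any) -> Dict[str, Any]:
    """Map a type annotation to a (hypothetical) schema entry."""
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    args = get_args(tp)
    if get_origin(tp) is list and len(args) == 1:
        return {"type": "list", "items": type_to_schema(args[0])}
    raise TypeError(f"unsupported annotation: {tp!r}")


def value_to_schema(value: Any) -> Dict[str, Any]:
    """Map an example value to a schema entry using isinstance() checks."""
    for py_type, name in _PRIMITIVES.items():
        if isinstance(value, py_type):
            return {"type": name}
    if isinstance(value, list) and value:
        return {"type": "list", "items": value_to_schema(value[0])}
    raise TypeError(f"cannot infer a schema from {value!r}")


def infer_schema(func: Callable,
                 example_inputs: Optional[Dict[str, Any]] = None,
                 example_output: Any = None) -> Dict[str, Any]:
    """Prefer full annotations; otherwise fall back to example inputs/outputs."""
    hints = get_type_hints(func)
    params = [p for p in inspect.signature(func).parameters if p != "self"]
    if "return" in hints and all(p in hints for p in params):
        try:
            return {"inputs": {p: type_to_schema(hints[p]) for p in params},
                    "output": type_to_schema(hints["return"])}
        except TypeError:
            pass  # annotation is not a supported standard type; try the examples
    if example_inputs is None or example_output is None:
        raise ValueError("function is not fully annotated; examples are required")
    return {"inputs": {p: value_to_schema(example_inputs[p]) for p in params},
            "output": value_to_schema(example_output)}
```

A fully annotated function needs no examples, while an unannotated one must be called as `infer_schema(func, {"x": 1.5, "y": 2}, [3.5])`. A DataFrame input would need a column-by-column inspection step that this sketch leaves out.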
  • 22. How do we capture python dependencies?
        import model_envelope as me
        from sklearn import linear_model
        df_X, df_y = load_data_somehow()
        model = linear_model.LogisticRegression(multi_class='auto')
        model.fit(df_X, df_y)
        my_envelope = me.save_model(instance_name='my_model_instance_name',
                                    instance_description='my_model_instance_description',
                                    model=model,
                                    query_function='predict',
                                    api_input=df_X,
                                    api_output=df_y,
                                    tags={'canonical_name':'foo-bar'})
        Point: no explicit passing of scikit-learn to save_model().
  • 23. How do we capture python dependencies? Assumption: we all run on the same* base Linux environment in training & production.
        Store the following in the Model Envelope:
        ● Result of: import sys; sys.version_info
        ● Results of: pip freeze
        ● Results of: conda list --export
        Local python modules (not installable):
        ● Add modules as part of save_model() call.
        ● We store them with the model bytes.
  • 24. How do we build the python deployment env.? Filter:
        ● Hard-coded list of dependencies to filter, e.g. jupyterhub.
        ● Upkeep is cheap; add/update every few months.
  • 25. Stitch Fix | “Deployment for free” | Model Envelope & envelope mechanics | > Impact of being on-call | Summary & Future Work
  • 26. Remember this split: DS Concerns | Platform Concerns | DS Concerns. (Diagram annotation: “My team is on-call for”)
  • 27. Impact of being on-call. Two truths:
        ● No one wants to be paged.
        ● No one wants to be paged for a model they didn’t write!
        But, this incentivizes Platform to build out MLOps capabilities:
        ● Capture bad models before they’re deployed!
        ● Enable observability, monitoring, and alerting to speed up debugging.
        Luckily we have autonomy and freedom to do so! (Image: a pager.)
  • 28. What can we change?
        API. Automatic capture == license to change:
        ● Model API schema
        ● Dependency capture
        ● Environment info: git, job, etc.
        Incentives for DS to additionally provide:
        ● Datasets for analysis
        ● Metrics
        ● Tags
        Deployment. MLOps approaches to:
        ● Model validation
        ● Model deployment & rollback
        ● Model deployment vehicle:
          ○ From logging, monitoring, alerting
          ○ To architecture: microservice, or Ray, or?
        ● Dashboarding/UIs
  • 29. Overarching benefit
        1. Data Scientists get to focus more on modeling.
           a. More business wins.
        2. Platform focuses on MLOps:
           a. Can be a rising tide that raises all boats!
  • 30. Stitch Fix | “Deployment for free” | Model Envelope & envelope mechanics | Impact of being on-call | > Summary & Future Work
  • 31. Summary - “Deployment for free”
        We enable deployment for free by:
        ● Capturing a comprehensive model artifact we call the Model Envelope.
        ● The Model Envelope facilitates code & environment generation for model deployment.
        ● Platform owns the Model Envelope and is on-call for generated services & tasks.
        Business wins:
        ● Data Scientists get to focus more on modeling.
        ● Platform is incentivized to improve and iterate on MLOps practices.
  • 32. Future Work
        ● Better MLOps features:
          ○ Observability, scalable data capture, & alerting.
          ○ Model Validation & CD patterns.
        ● “Models on Rails”:
          ○ Target specific SLA requirements.
        ● Configuration driven model creation:
          ○ Abstract away glue code required to train & save models.
  • 33. Thank you! We’re hiring! Questions? Try out Stitch Fix → goo.gl/Q3tCQ3 @stefkrawczyk linkedin.com/in/skrawczyk