A journey through the machine learning model deployment to production

A Journey through the ML Model
Deployment to Production

STANKO KUVELJIC

A Journey through the
ML Model Deployment
to Production
HELL

“Expectations were like fine pottery. The harder you held them, the more
likely they were to crack.”, The Way of Kings, Brandon Sanderson

1. ML Deployment
2. Deploy on the edge
3. Database deployment
4. REST API - Flask
5. REST API - Flask + uWSGI
6. TensorFlow Serving
7. Message Queues
8. Tidal Waves
9. Summary

People with no idea about AI
saying it will take over the world
My network

“I know the pieces fit 'cause I watched them tumble down”, Schism, Tool
[Figure: meme diagram of planes labelled Light, Heaven, Abyss, Shadow, Order, Life, Death, Chaos, Reality, Astral and Void, with "AI system & ML" and "astral rift" among the labels]

source: https://imgur.com/t/elon/5aTN9vc

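[Figure: the same diagram (Reality, Astral, Void) with "AI system" and "ML" shown alongside the surrounding components: Feature Engineering, Data Verification, Data Storage, Scheduler, Data Collection, Automation, Monitoring, Configuration, BI, Serving, Voodoo, Task Managers]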

Some Serving Approaches
1. On the edge (device deployment)
2. Database batch inference
3. REST API
4. Streaming

Mobile Device Deployment
Calls model on the device
Predictions: On the Fly
Latency: Low
Constraint: Model complexity
Animal: Cat

Batch Inference
Predictions: On demand/scheduled
Latency: “less important”
Constraint: not real time
[Diagram: Scheduler/CLI triggers the App, which runs Preproc and Inference, reading Raw Data and writing Scored Data]
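
The batch flow above can be sketched as a small command-line job that a scheduler could invoke. This is a minimal sketch, not the speaker's implementation: the CSV layout, the column names `x1` and `x2`, and the `preprocess` and `predict` functions are hypothetical stand-ins for a real pipeline and model.

```python
import argparse
import csv


def preprocess(row):
    """Hypothetical preprocessing: turn raw CSV strings into floats."""
    return [float(row["x1"]), float(row["x2"])]


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return 0.5 * features[0] + 0.25 * features[1]


def score_file(raw_path, scored_path):
    """Read raw rows, score each one, and write the scored rows out."""
    count = 0
    with open(raw_path, newline="") as src, open(scored_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["score"])
        writer.writeheader()
        for row in reader:
            row["score"] = predict(preprocess(row))
            writer.writerow(row)
            count += 1
    return count


def main(argv=None):
    """CLI entry point: the scheduler passes the raw and scored file paths."""
    parser = argparse.ArgumentParser(description="Batch-score a CSV file")
    parser.add_argument("raw")
    parser.add_argument("scored")
    args = parser.parse_args(argv)
    print(f"scored {score_file(args.raw, args.scored)} rows")
```

Calling `main(["raw.csv", "scored.csv"])`, or wiring `main()` to a script that a scheduler runs nightly, gives the "on demand/scheduled" behaviour: latency matters little because nothing waits on an individual prediction.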

Flask
• Web framework - NOT A SERVER
• Easy for development
• Single request at a time
• Can’t scale
• Not for production
[Diagram: Client -> Flask -> GPU, ML Models, Other Services, DB]
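
For reference, a development-only Flask endpoint of the kind this slide describes might look like the following. It is a minimal sketch under assumptions: the `/predict` route, the JSON shape, and the `predict` function are hypothetical, and the model is a trivial stand-in rather than a real network.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a loaded ML model: mean of the inputs."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    """Parse the JSON body, run the model, and return the prediction."""
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if not features:
        return jsonify(error="expected a non-empty 'features' list"), 400
    return jsonify(prediction=predict(features))
```

Starting this with `flask run` or `app.run()` uses Flask's built-in development server, which is the setup the slide warns against for production.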

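uWSGI
• Application server that implements WSGI (Web Server Gateway Interface)
• Mostly used for Python applications
[Diagram: one master process (M) and three worker processes (W)]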
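
To make the WSGI contract concrete: a server such as uWSGI calls a Python callable with the request environment and a `start_response` function, then sends back the bytes the callable returns. The sketch below uses only the standard library; the body of `application` and its JSON response are illustrative assumptions. A Flask app object is itself such a callable.

```python
import json


def application(environ, start_response):
    """Minimal WSGI callable: the entry point a WSGI server invokes per request."""
    body = json.dumps(
        {"path": environ.get("PATH_INFO", "/"), "status": "ok"}
    ).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Saved as `app.py`, this could be served with something like `uwsgi --http :8000 --wsgi-file app.py --master --processes 4`; the port and process count are arbitrary choices.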

uWSGI - Forking (copy on write)
[Diagram: Master loads the app; each of the three Workers is a copy from the parent]

uWSGI - Lazy Apps
[Diagram: Master doesn’t load the app; each of the three Workers loads the app itself]

uWSGI - Postfork fix
[Diagram: Master loads the app; each of the three Workers is a copy from the parent and then runs postfork()]

uWSGI - Lazy and Postforked Summary
• TF requires postfork (or lazy apps)
• Each process makes its own copy of the ML model
• Each process maintains its own session
• High memory footprint
• GPU doesn’t always help

TensorFlow Serving
[Diagram: App 1, App 2 and App 3 call TensorFlow Serving, which holds the Models and uses the GPU]
• Separated from the apps
• Multiple models and versions
• Manages the session with the GPU
• Allows batch processing
• Scalable
• REST/gRPC endpoints
• Versioning policy
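
As an illustration of the REST endpoint and versioning, the sketch below builds a predict request for TensorFlow Serving's REST API using only the standard library. The host, the model name `cnn`, and the port `8501` (the port commonly used for the REST API in the TensorFlow Serving documentation) are assumptions, not details from the talk.

```python
import json
import urllib.request


def predict_url(host, model, version=None, port=8501):
    """Build the REST predict URL for a model and an optional version."""
    base = f"http://{host}:{port}/v1/models/{model}"
    if version is not None:
        base += f"/versions/{version}"
    return base + ":predict"


def build_request(url, instances):
    """Create a POST request with the row-format payload {"instances": [...]}."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host, model, instances, version=None, timeout=5.0):
    """Send the request and return the "predictions" field of the response."""
    req = build_request(predict_url(host, model, version), instances)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]
```

Omitting `version` lets the server pick according to its versioning policy, while passing one pins the call to a specific model version.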

[Diagram: Client -> Flask + uWSGI (with a TF client) -> TensorFlow Serving (Models, GPU); Flask + uWSGI also talks to the DB and Other Services]

Message queues
[Diagram: client apps act as producers that put requests on a Request Queue; the ML models & inference service consumes the requests and produces results onto a Response Queue, which the client apps consume]
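
The request/response queue pattern can be sketched in a single process with the standard library. Here `queue.Queue` objects stand in for a real message broker, and `predict` is a hypothetical stand-in for model inference; a production setup would run producers and consumers as separate services.

```python
import queue
import threading

request_q = queue.Queue()
response_q = queue.Queue()
STOP = object()  # sentinel telling the consumer to shut down


def predict(features):
    """Hypothetical stand-in for model inference."""
    return sum(features)


def inference_worker():
    """Consume requests, run inference, and produce responses."""
    while True:
        item = request_q.get()
        if item is STOP:
            break
        request_id, features = item
        response_q.put((request_id, predict(features)))


def run_demo(batches):
    """Client side: produce requests, then consume the matching responses."""
    worker = threading.Thread(target=inference_worker)
    worker.start()
    for request_id, features in enumerate(batches):
        request_q.put((request_id, features))
    request_q.put(STOP)
    worker.join()
    results = {}
    while not response_q.empty():
        request_id, prediction = response_q.get()
        results[request_id] = prediction
    return results
```

Tagging each message with a request id is what lets a client match a response to its request, since the two travel on separate queues.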

LEARN TO SWIM
Lyrics: Ænima, Tool

'CAUSE I'M PRAYING FOR RAIN
AND I'M PRAYING FOR TIDAL WAVES

I WANNA SEE IT COME DOWN

BURN IT DOWN

FLUSH IT DOWN

WITH LOAD TESTS
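
The deck does not say which load-testing tool was used. As a rough idea of what such a test measures, the sketch below fires concurrent calls and reports throughput and response times. `fake_request` is a hypothetical placeholder for an HTTP call to the model endpoint, and this closed-loop harness is a simplification of driving a fixed request rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_):
    """Hypothetical stand-in for one call to the model endpoint."""
    start = time.perf_counter()
    sum(i * i for i in range(1000))  # placeholder for network + inference work
    return time.perf_counter() - start


def load_test(send, total_requests, concurrency):
    """Fire total_requests calls with the given concurrency and summarise them."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(send, range(total_requests)))
    elapsed = time.perf_counter() - started
    return {
        "requests": len(latencies),
        "throughput_rps": len(latencies) / elapsed,
        "mean_response_s": sum(latencies) / len(latencies),
        "max_response_s": max(latencies),
    }
```

Raising the load until response times climb sharply is one way to locate the kind of breakpoint reported in the results.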

Tested Models
MODEL        PARAMETERS  NOTES
Inception    4M          InceptionV4
CNN (image)  0.5M        6 conv layers
LSTM (text)  1M          2 x LSTM (128), 256 unroll

Machine
OS   Ubuntu 16
RAM  32GB
GPU  Nvidia 1050Ti
CPU  i7

CNN - Results
Workers    GPU/CPU  Throughput (e/s)  Load RPS  Response Time (s)  Breakpoint RPS
1 (Flask)  CPU      104               200       18.1               200
2          CPU      200               200       1.08               500
4          CPU      200               200       0.07               600
1 (Flask)  GPU      650               750       11.1               750
2          GPU      750               750       1.0                900
4          GPU      750               750       0.04               1000

Inception - Results
Workers    GPU/CPU  Throughput (e/s)  Load RPS  Response Time (s)  Breakpoint RPS
1 (Flask)  CPU      4                 10        32                 10
2          CPU      5                 10        18.3               10
4          CPU      3                 10        26.5               10
1 (Flask)  GPU      20                50        30                 50
2          GPU      20                50        21                 50

RNN - Results
Workers    GPU/CPU  Throughput (e/s)  Load RPS  Response Time (s)  Breakpoint RPS
1 (Flask)  CPU      35                100       24.6               100
2          CPU      50                100       19.7               100
4          CPU      60                100       18.0               100
1 (Flask)  GPU      10                50        36.59              50
2          GPU      10                50        27.7               50
4          GPU      15                50        26.5               50

“The purpose of a storyteller is not to tell you how to think, but to give you
questions to think upon.”, The Way of Kings, Brandon Sanderson

• Business?
• Does accuracy matter?
• Real time?
• Retraining?
• Data size?
• Algorithm to use?
• Demo vs Production?
