Full Stack Deep Learning
Josh Tobin, Sergey Karayev, Pieter Abbeel
Testing & Deployment
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram spanning Data, Development, Training/Evaluation, and Deployment: labeling, storage, processing, versioning, database, software engineering, frameworks, experiment management, distributed training, hyperparameter tuning, resource management, CI/testing, web, hardware/mobile, monitoring, interchange, and "all-in-one" platforms.]
Testing & Deployment - Overview
Full Stack Deep Learning
Module Overview
• Concepts
  • Project structure
  • ML Test Score: a rubric for production readiness
• Infrastructure/Tooling
  • CI/Testing
  • Docker in more depth
  • Deploying to the web
  • Monitoring
  • Deploying to hardware/mobile
3
Full Stack Deep Learning
4
Training System
- Process raw data
- Run experiments
- Manage results

Prediction System
- Process input data
- Construct networks with trained weights
- Make predictions

Serving System
- Serve predictions
- Scale to demand

Training System Tests
- Test the full training pipeline, starting from raw data
- Runs in < 1 day (nightly)
- Catches upstream regressions

Validation Tests
- Test the prediction system on the validation set, starting from processed data
- Runs in < 1 hour (every code push)
- Catches model regressions

Functionality Tests
- Test the prediction system on a few important examples
- Runs in < 5 minutes (during development)
- Catches code regressions

Monitoring
- Alert to downtime, errors, and distribution shifts
- Catches service and data regressions

Training/Validation Data | Production Data
Testing & Deployment - Project Structure
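To make the fastest tier concrete, here is a minimal sketch of a functionality test that exercises the prediction system on a few important examples. The predictor class, file layout, and expected outputs are illustrative assumptions, not part of the slides.

```python
# functionality_test.py - sketch of a "few important examples" test (< 5 minutes).
# ParagraphTextRecognizer, the support directory, and the expected strings are
# hypothetical stand-ins for a real project's predictor and fixtures.
from pathlib import Path

from text_recognizer.paragraph_text_recognizer import ParagraphTextRecognizer  # hypothetical

SUPPORT_DIR = Path("tests/support")  # a handful of small, version-controlled examples
EXPECTED = {
    "sample1.png": "the quick brown fox",
    "sample2.png": "jumps over the lazy dog",
}


def test_predict_on_important_examples():
    """Catches code regressions: the pipeline must still run and agree on known inputs."""
    predictor = ParagraphTextRecognizer()
    for filename, expected_text in EXPECTED.items():
        predicted_text = predictor.predict(SUPPORT_DIR / filename)
        assert predicted_text == expected_text
```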
Full Stack Deep Learning
5
The ML Test Score:
A Rubric for ML Production Readiness and Technical Debt Reduction
Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley
Google, Inc.
ebreck, cais, nielsene, msalib, dsculley@google.com
Abstract—Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from [...]

[...] may find difficult. Note that this rubric focuses on issues specific to ML systems, and so does not include general software engineering best practices such as ensuring good unit test coverage and a well-defined binary release process. Such strategies remain necessary as well. We do call out a few specific areas for unit or integration tests that have unique ML-related behavior. How to read the tests: Each test is written as [...]
IEEE Big Data 2017
An exhaustive framework/checklist from practitioners at Google
Testing & Deployment - ML Test Score
Full Stack Deep Learning
6
Follow-up to previous work from Google
Testing & Deployment - ML Test Score
Full Stack Deep Learning
7
[Figure from the paper: "ML Systems Require Extensive Testing and Monitoring." Unlike a manually coded system, an ML system's behavior is not easily specified in advance; it depends on dynamic qualities of the data and on various model configuration choices. The figure contrasts Traditional Software with Machine Learning Software, where Data feeds a Training System, a Prediction System, and a Serving System.]
Testing & Deployment - ML Test Score
Full Stack Deep Learning
8
Data Tests (Table I: brief listing of the seven data tests)
1. Feature expectations are captured in a schema.
2. All features are beneficial.
3. No feature's cost is too much.
4. Features adhere to meta-level requirements.
5. The data pipeline has appropriate privacy controls.
6. New features can be added quickly.
7. All input feature code is tested.

Model Tests (Table II: brief listing of the seven model tests)
1. Model specs are reviewed and submitted.
2. Offline and online metrics correlate.
3. All hyperparameters have been tuned.
4. The impact of model staleness is known.
5. A simpler model is not better.
6. Model quality is sufficient on important data slices.
7. The model is tested for considerations of inclusion.

ML Infrastructure Tests (Table III: brief listing of the seven ML infrastructure tests)
1. Training is reproducible.
2. Model specs are unit tested.
3. The ML pipeline is integration tested.
4. Model quality is validated before serving.
5. The model is debuggable.
6. Models are canaried before serving.
7. Serving models can be rolled back.

Monitoring Tests (Table IV: brief listing of the seven monitoring tests)
1. Dependency changes result in notification.
2. Data invariants hold for inputs.
3. Training and serving are not skewed.
4. Models are not too stale.
5. Models are numerically stable.
6. Computing performance has not regressed.
7. Prediction quality has not regressed.
Testing & Deployment - ML Test Score
Full Stack Deep Learning
Scoring the Test

Points: Description
0: More of a research project than a productionized system.
(0,1]: Not totally untested, but it is worth considering the possibility of serious holes in reliability.
(1,2]: There has been a first pass at basic productionization, but additional investment may be needed.
(2,3]: Reasonably tested, but it's possible that more of those tests and procedures may be automated.
(3,5]: Strong levels of automated testing and monitoring, appropriate for mission-critical systems.
> 5: Exceptional levels of automated testing and monitoring.

Table V: The final ML Test Score is computed by taking the minimum score from each of the four test areas. Systems at different points in their development may reasonably aim to be at different points along this scale.
9
Testing & Deployment - ML Test Score
Full Stack Deep Learning
Results at Google
• 36 teams with ML systems in production at Google
[Figure 4: Average scores for interviewed teams. These graphs display the average score for each test across the 36 systems we examined.]
10
Testing & Deployment - ML Test Score
Full Stack Deep Learning
Questions?
11
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram repeated; this section covers CI/Testing.]
Testing & Deployment - CI/Testing
Full Stack Deep Learning
Testing / CI
• Unit / Integration Tests
  • Tests for individual module functionality and for the whole system
• Continuous Integration
  • Tests are run every time new code is pushed to the repository, before the updated model is deployed
• SaaS for CI
  • CircleCI, Travis, Jenkins, Buildkite
• Containerization (via Docker)
  • A self-contained environment for running the tests
13
Testing & Deployment - CI/Testing
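As a concrete sketch of the "every code push" tier, a CI job might run a validation test that loads processed validation data and fails the build if a metric regresses. The module name, file layout, and threshold below are illustrative assumptions, not from the lecture.

```python
# validation_test.py - sketch of a CI validation test (< 1 hour, every code push).
# `text_recognizer.load_model`, the data layout, and the threshold are hypothetical.
from pathlib import Path

import numpy as np
import pytest

from text_recognizer import load_model  # hypothetical project import

PROCESSED_VAL_DIR = Path("data/processed/val")  # assumed layout
ACCURACY_THRESHOLD = 0.90  # illustrative; set from previous known-good runs


@pytest.fixture(scope="module")
def model():
    # Load trained weights once and reuse across tests in this module.
    return load_model()


def test_validation_accuracy(model):
    """Catches model regressions: accuracy on the validation set must not drop."""
    x = np.load(PROCESSED_VAL_DIR / "x.npy")
    y = np.load(PROCESSED_VAL_DIR / "y.npy")
    predictions = model.predict(x)
    accuracy = float((predictions == y).mean())
    assert accuracy >= ACCURACY_THRESHOLD, f"accuracy {accuracy:.3f} below threshold"
```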
Full Stack Deep Learning
14
[Recap: the project-structure diagram (training, prediction, and serving systems with their tests and monitoring) is shown again here.]
Testing & Deployment - CI/Testing
Full Stack Deep Learning
15
[Recap: the ML Test Score rubric (data, model, ML infrastructure, and monitoring tests) is shown again here.]
Testing & Deployment - CI/Testing
Full Stack Deep Learning
CircleCI / Travis / etc.
• SaaS for continuous integration
• Integrate with your repository: every push kicks off a cloud job
• Job can be defined as commands in a Docker container, and can store results for later review
• No GPUs
• CircleCI has a free plan
16
Testing & Deployment - CI/Testing
Full Stack Deep Learning
Jenkins / Buildkite
• Nice option for running CI on your own hardware, in the cloud, or mixed

• Good option for a scheduled training test (can use your GPUs)

• Very flexible
17
Full Stack Deep Learning
Docker
• Before proceeding, we have to understand Docker better.

• Docker vs VM

• Dockerfile and layer-based images

• DockerHub and the ecosystem

• Kubernetes and other orchestrators
18
Testing & Deployment - Docker
Full Stack Deep Learning
19
https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b
No OS -> Light weight
Testing & Deployment - Docker
Full Stack Deep Learning
• Spin up a container for every discrete task
• For example, a web app might have four containers:
  • Web server
  • Database
  • Job queue
  • Worker
20
https://www.docker.com/what-container
Light weight -> Heavy use
Testing & Deployment - Docker
Full Stack Deep Learning
Dockerfile
21
Testing & Deployment - Docker
Full Stack Deep Learning
Strong Ecosystem
• Images are easy to find, modify, and contribute back to DockerHub
• Private images easy to store in same place
22
https://docs.docker.com/engine/docker-overview
Testing & Deployment - Docker
Full Stack Deep Learning
23
https://www.docker.com/what-container#/package_software
Testing & Deployment - Docker
Full Stack Deep Learning
Container Orchestration
• Distributing containers onto underlying VMs or bare metal
• Kubernetes is the open-source winner, but cloud providers have good offerings, too.
24
Testing & Deployment - Docker
Full Stack Deep Learning
Questions?
25
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram repeated; this section covers web deployment.]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Terminology
• Inference: making predictions (not training)
• Model Serving: ML-specific deployment, usually focused on optimizing GPU usage
27
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Web Deployment
• REST API
  • Serving predictions in response to canonically-formatted HTTP requests
  • The web server is running and calling the prediction system
• Options
  • Deploy code to VMs, scale by adding instances
  • Deploy code as containers, scale via orchestration
  • Deploy code as a “serverless function”
  • Deploy via a model serving solution
28
Testing & Deployment - Web Deployment
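As a sketch of the REST API option, a small Flask app can load the trained model once and answer prediction requests over HTTP. The model-loading call and the JSON request format below are illustrative assumptions.

```python
# app.py - minimal sketch of a prediction REST API using Flask.
# `text_recognizer.load_model` and the {"instances": [...]} format are hypothetical.
from flask import Flask, jsonify, request

from text_recognizer import load_model  # hypothetical project import

app = Flask(__name__)
model = load_model()  # construct the network with trained weights once, at startup


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a canonically formatted JSON body, e.g. {"instances": [[...], [...]]}.
    payload = request.get_json(force=True)
    predictions = model.predict(payload["instances"])
    return jsonify({"predictions": [float(p) for p in predictions]})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

The same app could then be deployed to a VM, packaged into a container for an orchestrator, or wrapped by a model serving solution.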
Full Stack Deep Learning
29
[Recap: the project-structure diagram (training, prediction, and serving systems with their tests and monitoring) is shown again here.]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
30
[Recap: the ML Test Score rubric (data, model, ML infrastructure, and monitoring tests) is shown again here.]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
31
DEPLOY CODE TO YOUR OWN BARE METAL
Testing & Deployment - Web Deployment
Full Stack Deep Learning
32
DEPLOY CODE TO YOUR OWN BARE METAL
DEPLOY CODE TO CLOUD INSTANCES
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code to cloud instances
• Each instance has to be provisioned with dependencies and app code

• Number of instances is managed manually or via auto-scaling

• Load balancer sends user traffic to the instances
33
Full Stack Deep Learning
Deploying code to cloud instances
• Cons:
• Provisioning can be brittle

• Paying for instances even when not using them (auto-scaling does help)
34
Testing & Deployment - Web Deployment
Full Stack Deep Learning
35
DEPLOY DOCKER CONTAINERS TO CLOUD INSTANCES
DEPLOY CODE TO YOUR OWN BARE METAL
DEPLOY CODE TO CLOUD INSTANCES
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying containers
• App code and dependencies are packaged into Docker containers
• Kubernetes or an alternative (AWS Fargate) orchestrates containers (DB / workers / web servers / etc.)
36
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying containers
• Cons:
• Still managing your own servers and paying for uptime, not compute-time
37
Testing & Deployment - Web Deployment
Full Stack Deep Learning
38
DEPLOY CODE TO YOUR OWN BARE METAL
DEPLOY CODE TO CLOUD INSTANCES
DEPLOY DOCKER CONTAINERS TO CLOUD INSTANCES
DEPLOY SERVERLESS FUNCTIONS
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code as serverless functions
• App code and dependencies are packaged into .zip files, with a single entry point function
• AWS Lambda (or Google Cloud Functions, or Azure Functions) manages everything else: instant scaling to 10,000+ requests per second, load balancing, etc.
• Only pay for compute-time.
39
Testing & Deployment - Web Deployment
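A minimal sketch of what the single entry point might look like for the serverless option (AWS Lambda style); the model module and the event format are illustrative assumptions.

```python
# handler.py - sketch of a serverless prediction entry point (AWS Lambda style).
# `text_recognizer.load_model` and the request body format are hypothetical.
import json

from text_recognizer import load_model  # hypothetical; the package must stay small

model = load_model()  # loaded once per warm container and reused across invocations


def lambda_handler(event, context):
    """Single entry point: parse the request body, predict, return a JSON response."""
    body = json.loads(event["body"])
    predictions = model.predict(body["instances"])
    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": [float(p) for p in predictions]}),
    }
```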
Full Stack Deep Learning
40
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code as serverless functions
• Cons:
  • Entire deployment package has to fit within 500MB, < 5 min execution, < 3GB memory (on AWS Lambda)
  • Only CPU execution
41
Testing & Deployment - Web Deployment
Full Stack Deep Learning
42
[Recap: the ML Test Score rubric is shown again here (note the infrastructure tests: models are canaried before serving, and serving models can be rolled back).]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code as serverless functions
• Canarying
  • Easy to have two versions of Lambda functions in production, and start sending low volume traffic to one.
• Rollback
  • Easy to switch back to the previous version of the function.
43
Testing & Deployment - Web Deployment
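As a sketch of how that could be done with Lambda's weighted aliases (via boto3); the function name, alias, versions, and traffic split are illustrative assumptions.

```python
# canary.py - sketch: send 5% of traffic to a new Lambda version, then roll back.
# Function name, alias name, version numbers, and weights are illustrative.
import boto3

lam = boto3.client("lambda")

# Canary: the "live" alias keeps pointing at version 1 but routes 5% of
# invocations to version 2.
lam.update_alias(
    FunctionName="predict-fn",
    Name="live",
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {"2": 0.05}},
)

# Rollback: remove the extra weight so all traffic goes to version 1 again.
lam.update_alias(
    FunctionName="predict-fn",
    Name="live",
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {}},
)
```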
Full Stack Deep Learning
Model Serving
• Web deployment options specialized for machine learning models

• Tensorflow Serving (Google)

• Model Server for MXNet (Amazon)

• Clipper (Berkeley RISE Lab)

• SaaS solutions like Algorithmia

• The most important feature is batching requests for GPU inference
44
Testing & Deployment - Web Deployment
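To illustrate why request batching matters, here is a toy sketch of the idea: incoming requests are queued briefly and run through the model as a single batch, which keeps the GPU busy. `model.predict_batch`, the batch size, and the wait time are illustrative assumptions, not any particular server's API, and error handling is omitted.

```python
# batching.py - toy sketch of inter-request batching for GPU inference.
# `model.predict_batch` is an illustrative stand-in for a real framework call.
import queue
import threading

MAX_BATCH_SIZE = 32
MAX_WAIT_SECONDS = 0.01  # trade a little latency for much better GPU utilization

_requests: "queue.Queue" = queue.Queue()


def submit(x):
    """Called per request: enqueue the input and block until its result is ready."""
    done = threading.Event()
    holder = {}
    _requests.put((x, done, holder))
    done.wait()
    return holder["y"]


def batching_loop(model):
    """Collect requests for up to MAX_WAIT_SECONDS, then run one batched forward pass."""
    while True:
        batch = [_requests.get()]  # block until at least one request arrives
        try:
            while len(batch) < MAX_BATCH_SIZE:
                batch.append(_requests.get(timeout=MAX_WAIT_SECONDS))
        except queue.Empty:
            pass
        xs = [item[0] for item in batch]
        ys = model.predict_batch(xs)  # one GPU call for the whole batch
        for (_, done, holder), y in zip(batch, ys):
            holder["y"] = y
            done.set()
```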
Full Stack Deep Learning
Tensorflow Serving
• Able to serve Tensorflow predictions at high throughput.
• Used by Google and their “all-in-one” machine learning offerings
• Overkill unless you know you need to use GPU for inference
45
[Figure 2: TFS2 (hosted model serving) architecture; the excerpt highlights inter-request batching.]
Full Stack Deep Learning
MXNet Model Server
• Amazon’s answer to Tensorflow Serving

• Part of their SageMaker “all-in-one” machine learning offering.
46
Full Stack Deep Learning
Clipper
• Open-source model serving via REST, using Docker.
• Nice-to-haves on top of the basics
  • e.g. “state-of-the-art bandit and ensemble methods to intelligently select and combine predictions”
• Probably too early to be actually useful (unless you know why you need it)
47
Full Stack Deep Learning
Algorithmia
• Some startups promise effortless model serving
48
Full Stack Deep Learning
Seldon
49
https://www.seldon.io/tech/products/core/
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Takeaway
• If you are doing CPU inference, you can get away with scaling by launching more servers, or going serverless.
• The dream is deploying Docker as easily as Lambda
• The next best thing is either Lambda or Docker, depending on your model’s needs.
• If using GPU inference, things like TF Serving and Clipper become useful through adaptive batching, etc.
50
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Questions?
51
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram repeated; this section covers monitoring.]
Testing & Deployment - Monitoring
Full Stack Deep Learning
53
[Recap: the project-structure diagram (training, prediction, and serving systems with their tests and monitoring) is shown again here.]
Testing & Deployment - Monitoring
Full Stack Deep Learning
54
[Recap: the ML Test Score rubric is shown again here, highlighting the seven monitoring tests.]
Testing & Deployment - Monitoring
Full Stack Deep Learning
Monitoring
• Alarms for when things go wrong, and records for tuning things.
• Cloud providers have decent monitoring solutions.
• Anything that can be logged can be monitored (e.g. data skew)
• Will explore in lab
55
Testing & Deployment - Monitoring
Full Stack Deep Learning
Data Distribution Monitoring
• Underserved need!
56
Domino Data Lab
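As one hedged sketch of what data-distribution monitoring could look like: log a summary of recent production inputs and compare them against a reference sample from training, e.g. with a two-sample KS test. The feature name, window source, and alert threshold below are illustrative assumptions.

```python
# drift_check.py - sketch of a simple data-distribution monitor for one numeric feature.
# The reference sample, production window, and alert threshold are illustrative.
import logging

import numpy as np
from scipy.stats import ks_2samp

logger = logging.getLogger("monitoring")
P_VALUE_ALERT = 0.01  # alert when the two distributions look significantly different


def check_feature_drift(reference: np.ndarray, recent: np.ndarray, name: str) -> bool:
    """Log the KS statistic for one feature; return True if drift should be alerted."""
    statistic, p_value = ks_2samp(reference, recent)
    logger.info("feature=%s ks_statistic=%.4f p_value=%.4g", name, statistic, p_value)
    return p_value < P_VALUE_ALERT


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_sample = rng.normal(0.0, 1.0, size=5_000)       # stand-in for training data
    production_window = rng.normal(0.5, 1.0, size=1_000)  # shifted: should trigger an alert
    if check_feature_drift(train_sample, production_window, "pixel_mean"):
        logger.warning("distribution shift detected; investigate or retrain")
```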
Full Stack Deep Learning
Closing the Flywheel
• Important to monitor the business uses of the model, not just its own statistics
• Important to be able to contribute failures back to the dataset
57
Testing & Deployment - Monitoring
Full Stack Deep Learning
Closing the Flywheel
58
Testing & Deployment - Monitoring
[Slides 59-61 continue "Closing the Flywheel" with diagrams only.]
Full Stack Deep Learning
Questions?
62
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram repeated; this section covers hardware/mobile deployment.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
• Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow
  • Have to be careful with architecture
  • Interchange format
• Embedded and mobile devices have little memory and slow/expensive compute
  • Have to reduce network size / quantize weights / distill knowledge
64
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[The problems slide above is repeated here.]
65
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Tensorflow Lite
• Not all models will work, but much better than "Tensorflow Mobile" used to be
66
Testing & Deployment - Hardware/Mobile
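A minimal sketch of the usual Tensorflow Lite conversion path (from a SavedModel, with optional post-training quantization); the model directory is a placeholder.

```python
# convert_tflite.py - sketch of converting a SavedModel to Tensorflow Lite.
# "saved_model_dir" is a placeholder path; the optimization flag is optional.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```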
Full Stack Deep Learning
PyTorch Mobile
• Optional quantization

• TorchScript: static execution graph

• Libraries for Android and iOS
67
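A minimal sketch of the PyTorch side: optional dynamic quantization plus a TorchScript export that the Android/iOS libraries can load. The toy model stands in for a real network, and mobile-specific packaging steps are omitted.

```python
# export_mobile.py - sketch: quantize a PyTorch model and export TorchScript.
# The toy model is a stand-in; real mobile packaging details are omitted.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Optional quantization: convert Linear layers to int8 for a smaller, faster model.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# TorchScript: trace to a static execution graph.
example_input = torch.randn(1, 128)
scripted = torch.jit.trace(quantized, example_input)
scripted.save("model_mobile.pt")
```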
Full Stack Deep Learning
68
https://heartbeat.fritz.ai/core-ml-vs-ml-kit-which-mobile-machine-learning-framework-is-right-for-you-e25c5d34c765
CoreML
- Released at Apple WWDC 2017
- Inference only
- https://coreml.store/

ML Kit
- Announced at Google I/O 2018
- Either via API or on-device
- Offers pre-trained models, or can upload Tensorflow Lite model

[The slide also shows a third tool that is supposed to work with both CoreML and ML Kit.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[The problems slide is repeated here.]
69
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Interchange Format
• Open Neural Network Exchange: open-source format for deep learning models
• The dream is to mix different frameworks, such that frameworks that are good for development (PyTorch) don’t also have to be good at inference (Caffe2)
70
Testing & Deployment - Hardware/Mobile
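A minimal sketch of that workflow: export a PyTorch model to ONNX so another runtime can serve it. The toy model, input shape, and file name are placeholders.

```python
# export_onnx.py - sketch of exporting a PyTorch model to the ONNX format.
# The toy model, input shape, and output file name are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 30 * 30, 10),
).eval()
dummy_input = torch.randn(1, 3, 32, 32)  # example input defines the traced shapes

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=11,
)
```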
Full Stack Deep Learning
71
Testing & Deployment - Hardware/Mobile
Interchange Format
https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx
Full Stack Deep Learning
Problems
[The problems slide is repeated here.]
72
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
NVIDIA for Embedded
• Todo
73
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[The problems slide is repeated here.]
74
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
75
MobileNets
[Figure comparing convolution types: standard, 1x1, Inception-style, and the MobileNet block.]
https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d
Testing & Deployment - Hardware/Mobile
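What makes the MobileNet block cheap is the depthwise separable convolution: a per-channel 3x3 (depthwise) convolution followed by a 1x1 (pointwise) convolution, which needs far fewer multiply-adds than a standard convolution. A minimal PyTorch sketch, with illustrative channel counts:

```python
# depthwise_separable.py - sketch of the MobileNet building block.
# Channel counts and the example input size are illustrative.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1, groups=in_channels, bias=False)
        # Pointwise: a 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))


if __name__ == "__main__":
    block = DepthwiseSeparableConv(32, 64)
    print(block(torch.randn(1, 32, 56, 56)).shape)  # -> torch.Size([1, 64, 56, 56])
```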
Full Stack Deep Learning
76
https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[The problems slide is repeated here.]
77
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Knowledge distillation
• First, train a large "teacher" network on the data directly
• Next, train a smaller "student" network to match the "teacher" network's prediction distribution
78
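A minimal sketch of the idea: the student is trained on a blend of the usual hard-label loss and a KL term that matches the teacher's softened prediction distribution. The temperature, weighting, and models are illustrative.

```python
# distillation.py - sketch of a knowledge distillation training step.
# The teacher/student models, temperature, and loss weighting are illustrative.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.9):
    """Blend soft-target KL loss (vs. the teacher) with hard-label cross-entropy."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss


def train_step(student, teacher, optimizer, x, labels):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)  # frozen teacher provides the soft targets
    student_logits = student(x)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```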
Full Stack Deep Learning
Recommended case study: DistilBERT
79
https://medium.com/huggingface/distilbert-8cf3380435b5
Full Stack Deep Learning
Questions?
80
Full Stack Deep Learning
Thank you!
81

FIDO Alliance
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Testing and Deployment - Full Stack Deep Learning

  • 1. Full Stack Deep Learning Josh Tobin, Sergey Karayev, Pieter Abbeel Testing & Deployment
  • 2. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Overview
  • 3. Full Stack Deep Learning Module Overview • Concepts • Project structure • ML Test Score: a rubric for production readiness • Infrastructure/Tooling • CI/Testing • Docker in more depth • Deploying to the web • Monitoring • Deploying to hardware/mobile 3
  • 4. Full Stack Deep Learning 4 Training System: process raw data, run experiments, manage results. Prediction System: process input data, construct networks with trained weights, make predictions. Serving System: serve predictions, scale to demand. Training System Tests: test the full training pipeline starting from raw data; run in < 1 day (nightly); catch upstream regressions. Validation Tests: test the prediction system on the validation set, starting from processed data; run in < 1 hour (every code push); catch model regressions. Functionality Tests: test the prediction system on a few important examples; run in < 5 minutes (during development); catch code regressions. Monitoring: alert to downtime, errors, and distribution shifts; catches service and data regressions. Training/Validation Data feeds the training side; Production Data feeds serving. Testing & Deployment - Project Structure
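A minimal sketch of the "functionality test" level of this pyramid, assuming a pytest-style suite and a hypothetical Predictor class and tests/support directory (neither is from the course repo): run the trained prediction system on a few curated examples so the test finishes in seconds and catches code regressions on every change.

```python
# Hypothetical functionality test: a handful of important examples, exact-match assertions.
from pathlib import Path

from my_project.predictor import Predictor  # assumed: wraps the trained prediction system

SUPPORT_DIR = Path("tests/support")  # assumed: curated inputs with expected outputs alongside

def test_predictor_on_important_examples():
    model = Predictor()  # constructs the network with trained weights
    for image_path in sorted(SUPPORT_DIR.glob("*.png")):
        expected = image_path.with_suffix(".txt").read_text().strip()
        predicted = model.predict(image_path)
        # Exact match keeps the test strict; relax to an edit-distance threshold
        # if small errors are acceptable for your model.
        assert predicted == expected, f"{image_path.name}: {predicted!r} != {expected!r}"
```

The validation and training-pipeline tests have the same shape but swap in the full validation set (with a metric threshold) or the raw-data pipeline, and run on a slower cadence.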
  • 5. Full Stack Deep Learning 5 The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction. Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley (Google, Inc.). IEEE Big Data 2017. An exhaustive framework/checklist from practitioners at Google. [Slide shows the first page of the paper.] Testing & Deployment - ML Test Score
  • 6. Full Stack Deep Learning 6 Follow-up to previous work from Google. Testing & Deployment - ML Test Score
  • 7. Full Stack Deep Learning 7 [Figure from the paper: "ML Systems Require Extensive Testing and Monitoring." Unlike a manually coded system, an ML system's behavior is not easily specified in advance; it depends on dynamic qualities of the data and on model configuration choices. The diagram contrasts Traditional Software with Machine Learning Software (Data, Training System, Prediction System, Serving System).] Testing & Deployment - ML Test Score
  • 8. Full Stack Deep Learning 8 [Slide shows the four test rubrics from the paper:]
Data Tests (Table I): 1. Feature expectations are captured in a schema. 2. All features are beneficial. 3. No feature’s cost is too much. 4. Features adhere to meta-level requirements. 5. The data pipeline has appropriate privacy controls. 6. New features can be added quickly. 7. All input feature code is tested.
Model Tests (Table II): 1. Model specs are reviewed and submitted. 2. Offline and online metrics correlate. 3. All hyperparameters have been tuned. 4. The impact of model staleness is known. 5. A simpler model is not better. 6. Model quality is sufficient on important data slices. 7. The model is tested for considerations of inclusion.
ML Infrastructure Tests (Table III): 1. Training is reproducible. 2. Model specs are unit tested. 3. The ML pipeline is integration tested. 4. Model quality is validated before serving. 5. The model is debuggable. 6. Models are canaried before serving. 7. Serving models can be rolled back.
Monitoring Tests (Table IV): 1. Dependency changes result in notification. 2. Data invariants hold for inputs. 3. Training and serving are not skewed. 4. Models are not too stale. 5. Models are numerically stable. 6. Computing performance has not regressed. 7. Prediction quality has not regressed.
Testing & Deployment - ML Test Score
  • 9. Full Stack Deep Learning Scoring the Test (Table V: the ML Test Score is computed by taking the minimum score from each of the four test areas; teams at different points in their development may reasonably aim for different scores.)
Points: Description
0: More of a research project than a productionized system.
(0,1]: Not totally untested, but it is worth considering the possibility of serious holes in reliability.
(1,2]: There has been a first pass at basic productionization, but additional investment may be needed.
(2,3]: Reasonably tested, but it's possible that more of those tests and procedures may be automated.
(3,5]: Strong levels of automated testing and monitoring, appropriate for mission-critical systems.
> 5: Exceptional levels of automated testing and monitoring.
9 Testing & Deployment - ML Test Score
  • 10. Full Stack Deep Learning Results at Google • 36 teams with ML systems in production at Google [Figure 4 from the paper: average scores for the interviewed teams, showing the average score for each test across the 36 systems examined.] 10 Testing & Deployment - ML Test Score
  • 11. Full Stack Deep Learning Questions? 11
  • 12. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - CI/Testing
  • 13. Full Stack Deep Learning Testing / CI • Unit / Integration Tests • Tests for individual module functionality and for the whole system • Continuous Integration • Tests are run every time new code is pushed to the repository, before the updated model is deployed. • SaaS for CI • CircleCI, Travis, Jenkins, Buildkite • Containerization (via Docker) • A self-contained environment for running the tests 13 Testing & Deployment - CI/Testing
  • 14. Full Stack Deep Learning 14 [Slide repeats the project-structure diagram from slide 4: Training System, Prediction System, and Serving System, with the corresponding Training System Tests, Validation Tests, Functionality Tests, and Monitoring over Training/Validation Data and Production Data.] Testing & Deployment - CI/Testing
  • 15. Full Stack Deep Learning 15 [Slide repeats the four ML Test Score rubrics from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring Tests.] Testing & Deployment - CI/Testing
  • 16. Full Stack Deep Learning CircleCI / Travis / etc • SaaS for continuous integration • Integrate with your repository: every push kicks off a cloud job • Job can be defined as commands in a Docker container, and can store results for later review • No GPUs • CircleCI has a free plan 16Testing & Deployment - CI/Testing
  • 17. Full Stack Deep Learning Jenkins / Buildkite • Nice option for running CI on your own hardware, in the cloud, or mixed • Good option for a scheduled training test (can use your GPUs) • Very flexible 17
  • 18. Full Stack Deep Learning Docker • Before proceeding, we have to understand Docker better. • Docker vs VM • Dockerfile and layer-based images • DockerHub and the ecosystem • Kubernetes and other orchestrators 18Testing & Deployment - Docker
  • 19. Full Stack Deep Learning 19 https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b No OS -> Light weight Testing & Deployment - Docker
  • 20. Full Stack Deep Learning • Spin up a container for every discrete task • For example, a web app might have four containers: • Web server • Database • Job queue • Worker 20 https://www.docker.com/what-container Light weight -> Heavy use Testing & Deployment - Docker
  • 21. Full Stack Deep Learning Dockerfile 21Testing & Deployment - Docker
  • 22. Full Stack Deep Learning Strong Ecosystem • Images are easy to find, modify, and contribute back to DockerHub • Private images easy to store in same place 22 https://docs.docker.com/engine/docker-overview Testing & Deployment - Docker
  • 23. Full Stack Deep Learning 23 https://www.docker.com/what-container#/package_software Testing & Deployment - Docker
  • 24. Full Stack Deep Learning Container Orchestration • Distributing containers onto underlying VMs or bare metal • Kubernetes is the open-source winner, but cloud providers have good offerings, too. 24Testing & Deployment - Docker
  • 25. Full Stack Deep Learning Questions? 25
  • 26. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Web Deployment
  • 27. Full Stack Deep Learning Terminology • Inference: making predictions (not training) • Model Serving: ML-specific deployment, usually focused on optimizing GPU usage 27Testing & Deployment - Web Deployment
  • 28. Full Stack Deep Learning Web Deployment • REST API • Serving predictions in response to canonically-formatted HTTP requests • The web server is running and calling the prediction system • Options • Deploy code to VMs, scale by adding instances • Deploy code as containers, scale via orchestration • Deploy code as a “serverless function” • Deploy via a model serving solution 28 Testing & Deployment - Web Deployment
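As a concrete (hedged) illustration of the REST API pattern, here is a minimal Flask endpoint; the load_model/predict helpers are placeholders for whatever prediction system you have, not an API from the course materials.

```python
# Minimal sketch: a web server that calls the prediction system over HTTP.
from flask import Flask, jsonify, request

from my_project.predictor import load_model  # assumed helper

app = Flask(__name__)
model = load_model()  # construct the network with trained weights once, at startup

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)        # canonically formatted JSON request
    prediction = model.predict(payload["input"])  # process input data, make a prediction
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

The same handler body is what gets wrapped differently by each option: baked into a VM image, packaged into a Docker container, or registered as a serverless function.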
  • 29. Full Stack Deep Learning 29 [Slide repeats the project-structure diagram from slide 4: Training System, Prediction System, and Serving System, with the corresponding Training System Tests, Validation Tests, Functionality Tests, and Monitoring over Training/Validation Data and Production Data.] Testing & Deployment - Web Deployment
  • 30. Full Stack Deep Learning 30 [Slide repeats the four ML Test Score rubrics from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring Tests.] Testing & Deployment - Web Deployment
  • 31. Full Stack Deep Learning 31 DEPLOY CODE TO YOUR OWN BARE METAL Testing & Deployment - Web Deployment
  • 32. Full Stack Deep Learning 32 DEPLOY CODE TO YOUR OWN BARE METAL DEPLOY CODE TO CLOUD INSTANCES Testing & Deployment - Web Deployment
  • 33. Full Stack Deep Learning Deploying code to cloud instances • Each instance has to be provisioned with dependencies and app code • Number of instances is managed manually or via auto-scaling • Load balancer sends user traffic to the instances 33
  • 34. Full Stack Deep Learning Deploying code to cloud instances • Cons: • Provisioning can be brittle • Paying for instances even when not using them (auto-scaling does help) 34Testing & Deployment - Web Deployment
  • 35. Full Stack Deep Learning 35 DEPLOY CODE TO YOUR OWN BARE METAL / DEPLOY CODE TO CLOUD INSTANCES / DEPLOY DOCKER CONTAINERS TO CLOUD INSTANCES Testing & Deployment - Web Deployment
  • 36. Full Stack Deep Learning Deploying containers • App code and dependencies are packaged into Docker containers • Kubernetes or an alternative (e.g., AWS Fargate) orchestrates the containers (DB / workers / web servers / etc.) 36 Testing & Deployment - Web Deployment
  • 37. Full Stack Deep Learning Deploying containers • Cons: • Still managing your own servers and paying for uptime, not compute-time 37 Testing & Deployment - Web Deployment
  • 38. Full Stack Deep Learning 38 DEPLOY CODE TO YOUR OWN BARE METAL / DEPLOY CODE TO CLOUD INSTANCES / DEPLOY DOCKER CONTAINERS TO CLOUD INSTANCES / DEPLOY SERVERLESS FUNCTIONS Testing & Deployment - Web Deployment
  • 39. Full Stack Deep Learning Deploying code as serverless functions • App code and dependencies are packaged into .zip files, with a single entry point function • AWS Lambda (or Google Cloud Functions, or Azure Functions) manages everything else: instant scaling to 10,000+ requests per second, load balancing, etc. • Only pay for compute-time. 39Testing & Deployment - Web Deployment
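For a sense of what the packaging looks like, here is a hedged sketch of an AWS Lambda handler; the event shape assumes an API Gateway proxy integration and the predictor module is a placeholder.

```python
# Sketch of a serverless entry point: one function, zipped with code + dependencies + weights.
import json

from my_project.predictor import load_model  # assumed helper

model = load_model()  # loaded once per warm container, reused across invocations

def handler(event, context):
    body = json.loads(event["body"])            # assumes an API Gateway proxy-style event
    prediction = model.predict(body["input"])   # CPU-only inference (no GPUs on Lambda)
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```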
  • 40. Full Stack Deep Learning 40 Testing & Deployment - Web Deployment
  • 41. Full Stack Deep Learning Deploying code as serverless functions • Cons: • Entire deployment package has to fit within 500MB, <5 min execution, <3GB memory (on AWS Lambda) • Only CPU execution 41Testing & Deployment - Web Deployment
  • 42. Full Stack Deep Learning 42 [Slide repeats the four ML Test Score rubrics from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring Tests.] Testing & Deployment - Web Deployment
  • 43. Full Stack Deep Learning Deploying code as serverless functions • Canarying • Easy to have two versions of Lambda functions in production, and start sending low volume traffic to one. • Rollback • Easy to switch back to the previous version of the function. 43Testing & Deployment - Web Deployment
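One way to do this (a sketch, not the only way) is with weighted Lambda aliases via boto3; the function name and version numbers below are made up for illustration.

```python
# Canary: keep the "prod" alias on version 3 but send ~5% of traffic to version 4.
import boto3

lam = boto3.client("lambda")

lam.update_alias(
    FunctionName="my-predictor",   # hypothetical function name
    Name="prod",
    FunctionVersion="3",
    RoutingConfig={"AdditionalVersionWeights": {"4": 0.05}},
)

# Rollback: clear the extra weights so all traffic goes back to version 3.
lam.update_alias(
    FunctionName="my-predictor",
    Name="prod",
    FunctionVersion="3",
    RoutingConfig={"AdditionalVersionWeights": {}},
)
```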
  • 44. Full Stack Deep Learning Model Serving • Web deployment options specialized for machine learning models • Tensorflow Serving (Google) • Model Server for MXNet (Amazon) • Clipper (Berkeley RISE Lab) • SaaS solutions like Algorithmia • The most important feature is batching requests for GPU inference 44Testing & Deployment - Web Deployment
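To see why request batching matters, here is a toy sketch of the micro-batching loop a model server runs internally: requests queue up, get pulled off in small groups, and go through the model as one batched forward pass so the GPU stays busy. Real servers add max batch sizes, timeouts, and backpressure; `model` here is an assumed callable that accepts a list of inputs.

```python
# Toy micro-batching loop (run in a background thread next to the web server).
import queue

request_queue = queue.Queue()
MAX_BATCH_SIZE = 32
BATCH_TIMEOUT_S = 0.01  # how long to wait for more requests before running a partial batch

def batching_loop(model):
    while True:
        batch = [request_queue.get()]                    # block until at least one request arrives
        while len(batch) < MAX_BATCH_SIZE:
            try:
                batch.append(request_queue.get(timeout=BATCH_TIMEOUT_S))
            except queue.Empty:
                break                                    # stop waiting, run what we have
        inputs = [item for item, _ in batch]
        outputs = model(inputs)                          # one batched (GPU) forward pass
        for (_, reply), output in zip(batch, outputs):
            reply.put(output)                            # hand each result back to its caller

def submit(x):
    reply = queue.Queue(maxsize=1)
    request_queue.put((x, reply))
    return reply.get()                                   # the caller blocks until its result is ready
```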
  • 45. Full Stack Deep Learning Tensorflow Serving • Able to serve Tensorflow predictions at high throughput. • Used by Google and their “all-in-one” machine learning offerings • Overkill unless you know you need to use GPU for inference 45 [Slide shows Figure 2: TFS2 (hosted model serving) architecture, illustrating inter-request batching.]
  • 46. Full Stack Deep Learning MXNet Model Server • Amazon’s answer to Tensorflow Serving • Part of their SageMaker “all-in-one” machine learning offering. 46
  • 47. Full Stack Deep Learning Clipper • Open-source model serving via REST, using Docker. • Nice-to-haves on top of the basics • e.g., “state-of-the-art bandit and ensemble methods to intelligently select and combine predictions” • Probably too early to be actually useful (unless you know why you need it) 47
  • 48. Full Stack Deep Learning Algorithmia • Some startups promise effortless model serving 48
  • 49. Full Stack Deep Learning Seldon 49 https://www.seldon.io/tech/products/core/ Testing & Deployment - Web Deployment
  • 50. Full Stack Deep Learning Takeaway • If you are doing CPU inference, you can get away with scaling by launching more servers, or going serverless. • The dream is deploying Docker as easily as Lambda • The next best thing is either Lambda or Docker, depending on your model’s needs. • If using GPU inference, things like TF Serving and Clipper become useful via adaptive batching, etc. 50 Testing & Deployment - Web Deployment
  • 51. Full Stack Deep Learning Questions? 51
  • 52. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Monitoring
  • 53. Full Stack Deep Learning 53 [Slide repeats the project-structure diagram from slide 4: Training System, Prediction System, and Serving System, with the corresponding Training System Tests, Validation Tests, Functionality Tests, and Monitoring over Training/Validation Data and Production Data.] Testing & Deployment - Monitoring
  • 54. Full Stack Deep Learning 54 [Slide repeats the four ML Test Score rubrics from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, and Monitoring Tests.] Testing & Deployment - Monitoring
  • 55. Full Stack Deep Learning Monitoring • Alarms for when things go wrong, and records for tuning things. • Cloud providers have decent monitoring solutions. • Anything that can be logged can be monitored (e.g., data skew) • Will explore in lab 55 Testing & Deployment - Monitoring
  • 56. Full Stack Deep Learning Data Distribution Monitoring • Underserved need! 56 Domino Data Lab
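Since anything that can be logged can be monitored, a simple starting point is to compare logged production inputs against the training distribution; the sketch below uses a two-sample KS test per feature, with the threshold and alerting hook as assumptions.

```python
# Sketch: flag a feature whose production distribution has drifted from training.
import numpy as np
from scipy.stats import ks_2samp

def check_feature_drift(training_values, production_values, alpha=0.01):
    statistic, p_value = ks_2samp(training_values, production_values)
    drifted = p_value < alpha
    if drifted:
        # In a real system this would page someone or emit a metric, not print.
        print(f"ALERT: distribution shift detected (KS={statistic:.3f}, p={p_value:.4f})")
    return drifted

# Example: last hour of logged inputs vs. the training set (synthetic data here).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=10_000)
prod = rng.normal(0.5, 1.0, size=1_000)   # shifted mean, as might happen after an upstream change
check_feature_drift(train, prod)
```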
  • 57. Full Stack Deep Learning Closing the Flywheel • Important to monitor the business uses of the model, not just its own statistics • Important to be able to contribute failures back to the dataset 57 Testing & Deployment - Monitoring
  • 58. Full Stack Deep Learning Closing the Flywheel 58Testing & Deployment - Monitoring
  • 59. Full Stack Deep Learning 59Testing & Deployment - Monitoring Closing the Flywheel
  • 60. Full Stack Deep Learning 60Testing & Deployment - Monitoring Closing the Flywheel
  • 61. Full Stack Deep Learning 61Testing & Deployment - Monitoring Closing the Flywheel
  • 62. Full Stack Deep Learning Questions? 62
  • 63. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Hardware/Mobile
  • 64. Full Stack Deep Learning Problems • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge 64Testing & Deployment - Hardware/Mobile
  • 65. Full Stack Deep Learning Problems • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge 65Testing & Deployment - Hardware/Mobile
  • 66. Full Stack Deep Learning Tensorflow Lite • Not all models will work, but much better than "Tensorflow Mobile" used to be 66Testing & Deployment - Hardware/Mobile
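A hedged sketch of the conversion step (the SavedModel path is an assumption); post-training quantization is optional but speaks to the memory/compute constraints above.

```python
# Convert a trained SavedModel to a .tflite file for on-device inference.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported/my_model")  # assumed export path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable post-training quantization
tflite_model = converter.convert()

with open("my_model.tflite", "wb") as f:
    f.write(tflite_model)
```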
  • 67. Full Stack Deep Learning PyTorch Mobile • Optional quantization • TorchScript: static execution graph • Libraries for Android and iOS 67
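A sketch of that path with a toy model (the architecture and file names are placeholders): dynamically quantize, then trace to TorchScript so the Android/iOS libraries can load a static graph.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Optional quantization: convert Linear layers to int8 dynamic quantization.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# TorchScript: record a static execution graph by tracing with an example input.
example_input = torch.randn(1, 128)
scripted = torch.jit.trace(quantized, example_input)
scripted.save("model_mobile.pt")  # this file is what the mobile libraries load
```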
  • 68. Full Stack Deep Learning 68 https://heartbeat.fritz.ai/core-ml-vs-ml-kit-which-mobile-machine-learning-framework-is-right-for-you-e25c5d34c765 CoreML: released at Apple WWDC 2017; inference only; model zoo at https://coreml.store/. ML Kit: announced at Google I/O 2018; runs either via API or on-device; offers pre-trained models, or you can upload a Tensorflow Lite model. A third option shown on the slide is supposed to work with both CoreML and MLKit. Testing & Deployment - Hardware/Mobile
  • 69. Full Stack Deep Learning Problems 69Testing & Deployment - Hardware/Mobile • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge
  • 70. Full Stack Deep Learning • Open Neural Network Exchange: open-source format for deep learning models • The dream is to mix different frameworks, such that frameworks that are good for development (PyTorch) don’t also have to be good at inference (Caffe2) 70Testing & Deployment - Hardware/Mobile Interchange Format
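A sketch of the export side of that dream, with a toy PyTorch model standing in for a real one:

```python
# Export a PyTorch model to ONNX so a different framework/runtime can serve it.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
dummy_input = torch.randn(1, 128)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow a variable batch size at inference time
)
```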
  • 71. Full Stack Deep Learning 71Testing & Deployment - Hardware/Mobile Interchange Format https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx
  • 72. Full Stack Deep Learning Problems • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge 72Testing & Deployment - Hardware/Mobile
  • 73. Full Stack Deep Learning NVIDIA for Embedded • Todo 73Testing & Deployment - Hardware/Mobile
  • 74. Full Stack Deep Learning Problems 74Testing & Deployment - Hardware/Mobile • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge
  • 75. Full Stack Deep Learning 75 MobileNets [Slide compares convolution block designs: Standard, 1x1, Inception, and MobileNet.] https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d Testing & Deployment - Hardware/Mobile
  • 76. Full Stack Deep Learning 76 https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d Testing & Deployment - Hardware/Mobile
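The reason MobileNet-style blocks are cheap: a standard KxK convolution mixes space and channels at once, while a depthwise separable block does a per-channel KxK convolution followed by a 1x1 pointwise convolution, cutting parameters and FLOPs by roughly a factor of K^2. A sketch in PyTorch with arbitrary channel sizes:

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                                       # pointwise 1x1
    )

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = depthwise_separable(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), "vs", count(separable))  # roughly 74k vs 9k parameters
```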
  • 77. Full Stack Deep Learning Problems 77Testing & Deployment - Hardware/Mobile • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge
  • 78. Full Stack Deep Learning Knowledge distillation • First, train a large "teacher" network on the data directly • Next, train a smaller "student" network to match the "teacher" network's prediction distribution 78
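A minimal sketch of the student's training loss under the usual recipe (temperature-softened KL to the teacher plus ordinary cross-entropy on the labels); the temperature and mixing weight are placeholders:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: match the teacher's prediction distribution at temperature T.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard scaling so soft-target gradients stay comparable across temperatures
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```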
  • 79. Full Stack Deep Learning Recommended case study: DistilBERT 79 https://medium.com/huggingface/distilbert-8cf3380435b5
  • 80. Full Stack Deep Learning Questions? 80
  • 81. Full Stack Deep Learning Thank you! 81