10 Things I Wish I Had Known Before Scaling Deep Learning Solutions
1. 10 Things I Wish I Had Known
Before Scaling Deep Learning
Solutions
Invector Labs
2. About Invector Labs
• Platform for top-class computer science talent
• Uses artificial intelligence to connect enterprises with top freelance
talent around the world
• Focused on deep tech
• Artificial intelligence
• Blockchain technologies
• Internet of things
• Cybersecurity
• Advanced cloud computing
• ….
• http://invectorlabs.com
3. Agenda
• Realities of scaling deep learning solutions
• 10 Lessons
• Challenge
• What we learned
• Solution
4. Lessons from the Real World
Large Hospitality Group
• Using deep learning to analyze reviews from over 40 travel websites
• 12 different deep learning models
• Scenarios: topic extraction, sentiment analysis, price predictions, hotel scoring…
• Techniques: natural language processing, NLP micro-understanding, clustering, time series analysis
Legal Software Platform Vendor
• Using deep learning to extract intelligence from trial discovery documents and legal research
• 8 different deep learning models
• Scenarios: natural language search, knowledge extraction, document relationships, research recommendations, strategy simulation
• Techniques: convolutional neural networks, generative models, recurrent neural networks, natural language processing…
International Railway Company
• Using deep learning to analyze cargo information and sensor data
• 18 different deep learning models
• Scenarios: car load predictions, part maintenance prediction, track video analysis
• Techniques: convolutional neural networks, recurrent neural networks, transfer learning, predictive modeling, linear regressions…
Quant Hedge Fund
• Using deep learning to simulate trading strategies
• 11 different deep learning models
• Scenarios: portfolio rebalancing, option pricing, daily stock selection, strategy selection
• Techniques: reinforcement learning, transfer learning, predictive modeling
5. Key Takeaways
• Implementing deep learning solutions at scale imposes new infrastructure
challenges
• Deep learning requires a new type of architecture
7. Deep Learning
• Deep learning is a subset of machine learning.
• Uses a hierarchy of multiple layers of nonlinear processing units for
feature extraction and transformation. Each successive layer uses the
output from the previous layer as input.
• Learns in supervised (e.g., classification) and/or unsupervised (e.g.,
pattern analysis) manners.
• Learns multiple levels of representations that correspond to different
levels of abstraction; the levels form a hierarchy of concepts.
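The hierarchy-of-layers idea above can be sketched in a few lines of plain Python: each layer applies nonlinear processing units to the previous layer's output. The weights below are arbitrary toy values, not a trained network:

```python
import math

def unit(inputs, weights, bias):
    """One nonlinear processing unit: weighted sum + tanh activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

def forward(x, layers):
    """Each successive layer uses the previous layer's output as input."""
    out = x
    for weight_vectors, biases in layers:
        out = [unit(out, w, b) for w, b in zip(weight_vectors, biases)]
    return out

# Two stacked layers over a 3-feature input (toy weights).
net = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # layer 1: 3 -> 2
    ([[1.0, -1.0]], [0.2]),                               # layer 2: 2 -> 1
]
result = forward([1.0, 2.0, 3.0], net)
```

The early layers would learn low-level representations and the later layers higher-level abstractions; here the structure alone is what matters.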
9. What Makes Deep Learning so Challenging?
Curse of Dimensionality
• Models with millions of nodes
Over/Under Fitting
• Models too tailored to the datasets
Interpretability
• Understanding complex network structures
Bias/Variance
• Preconceptions included in the datasets
14. Challenge
• Data scientists are great at experimentation
• Not so much at writing high quality code
• Experimental deep learning frameworks don't necessarily make great
production frameworks (e.g., PyTorch vs. TensorFlow)
15. A Possible Solution: Divide Data Science and Engineering Teams
Data Science Team
• Write notebooks and experimental models
Engineering Team
• Refactor or rewrite models for production environments
• Automate training and optimization jobs
DevOps Team
• Deploy models
• Monitor, retrain, and optimize models
17. Challenge
• Notebooks are ideal for model experimentation and testing
• Notebooks typically have performance challenges when executed at
scale
• Scaling Notebook environments can be challenging
• Parametrizing Notebook executions is far from trivial
18. A Possible Solution: Use Containers for Running Production Deep Learning Workloads
• Model experimentation: Jupyter, Zeppelin
• Scheduling notebooks: Papermill, Netflix's Meson
• Running complex workflows: Docker containers, Kubernetes
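As a sketch of the Papermill pattern above: build one parametrized execution per run, then hand each to `papermill.execute_notebook`, which injects the parameters into the notebook's tagged `parameters` cell. The template name, parameter names, and output path convention here are hypothetical:

```python
def notebook_runs(run_dates, template="train.ipynb"):
    """Build one (input, output, parameters) triple per scheduled execution.

    The notebook name and parameter names are illustrative placeholders.
    """
    return [
        (template, f"runs/train-{d}.ipynb", {"run_date": d, "epochs": 20})
        for d in run_dates
    ]

def execute_all(runs):
    # Papermill injects each parameters dict into the notebook's tagged
    # "parameters" cell, then executes it and writes the output notebook.
    import papermill as pm
    for input_nb, output_nb, params in runs:
        pm.execute_notebook(input_nb, output_nb, parameters=params)

runs = notebook_runs(["2024-01-01", "2024-01-02"])
```

In a container setup, `execute_all` would run inside a Docker image scheduled by Kubernetes rather than on a data scientist's laptop.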
20. Challenge
• Enterprises like to standardize on a single deep learning framework
• Different teams have different technology preferences
• Providing a consistent deep learning platform across different deep
learning frameworks is no easy task
21. A Possible Solution: Provide a Consistent Infrastructure Across Different Deep Learning Runtimes
• Infrastructure: data cleansing, feature extraction, model training, …
• Runtime: hyperparameter optimization, retraining, model monitoring, …
• Model development: TensorFlow, PyTorch, Caffe2, …
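One way to sketch the consistent-infrastructure idea is a thin runtime contract that the shared pipeline codes against, with one adapter per framework. The class and method names below are illustrative, and the baseline adapter stands in for real TensorFlow or PyTorch wrappers:

```python
from abc import ABC, abstractmethod

class ModelRuntime(ABC):
    """Framework-agnostic contract the shared infrastructure codes against."""

    @abstractmethod
    def train(self, features, labels): ...

    @abstractmethod
    def predict(self, features): ...

class MeanBaselineRuntime(ModelRuntime):
    """Stand-in adapter; real adapters would wrap TensorFlow or PyTorch."""

    def train(self, features, labels):
        self.mean = sum(labels) / len(labels)

    def predict(self, features):
        return [self.mean] * len(features)

def training_pipeline(runtime: ModelRuntime, features, labels):
    # Cleansing, feature extraction, monitoring, etc. live at this layer
    # and never touch framework-specific APIs.
    runtime.train(features, labels)
    return runtime.predict(features)
```

Teams keep their preferred framework; the infrastructure only ever sees `ModelRuntime`.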
23. Challenge
• The No Free Lunch Theorem
• Trained models can perform poorly against new datasets
• New engineers and DevOps need to understand how to re-train existing
models
24. A Possible Solution: Automate Training Jobs
A data lake (data plus an outcomes/feature store) feeds a set of automated jobs:
• Training Job 1
• Training Job 2
• Training Job N
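A minimal sketch of the automated-training pattern, assuming a data-lake event calls `on_new_data` whenever a fresh dataset lands; the job callables here are hypothetical placeholders:

```python
class TrainingScheduler:
    """Registers training jobs and fires all of them when new data lands."""

    def __init__(self):
        self.jobs = []

    def register(self, job):
        self.jobs.append(job)

    def on_new_data(self, dataset):
        # In production this would be triggered by a data-lake event
        # (e.g., a new partition arriving); here it is called directly.
        return [job(dataset) for job in self.jobs]

scheduler = TrainingScheduler()
scheduler.register(lambda ds: f"sentiment model retrained on {len(ds)} rows")
scheduler.register(lambda ds: f"pricing model retrained on {len(ds)} rows")
results = scheduler.on_new_data([{"text": "great stay"}, {"text": "noisy room"}])
```

The point is that retraining becomes an event-driven job, not something a new engineer has to know to run by hand.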
26. Challenge
• Model training can be really resource intensive
• Training jobs take a long time to execute
• Data scientists love to embed the training logic as part of the model
Notebook
27. A Possible Solution: Follow a Distributed Training
Architecture and Automate Training Jobs
Training jobs are submitted to a training server, which fans each job out into tasks that produce trained models:
• Task 1 → Model 1
• Task 2 → Model 2
• Task N → Model N
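The fan-out in the diagram above can be sketched with a worker pool standing in for the training server; each task trains one toy model on one data shard:

```python
from concurrent.futures import ThreadPoolExecutor

def train_task(task_id, shard):
    """Train one model on one data shard (toy: fit a mean predictor)."""
    mean = sum(shard) / len(shard)
    return {"task": task_id, "model": {"predict_mean": mean}}

def distributed_training(shards, max_workers=4):
    # A real training server would dispatch each task to its own worker
    # node or GPU; a thread pool stands in for that fan-out here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(train_task, i, s) for i, s in enumerate(shards)]
        return [f.result() for f in futures]

models = distributed_training([[1, 2, 3], [10, 20], [5, 5, 5, 5]])
```

Because the training logic lives in `train_task` rather than inside a notebook cell, the same code can be scheduled, parallelized, and retried.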
29. Challenge
• Different models require the same features from a dataset
• Feature extraction jobs are computationally expensive
• Different teams create proprietary ways to capture and store feature
information
30. A Potential Solution: Build a Centralized
Feature Store
• Dataset Preparation Job 1 → Representation Learning Task 1
• Dataset Preparation Job 2 → Representation Learning Task 2
• Dataset Preparation Job N → Representation Learning Task N
All tasks write their output to a shared feature store, which Model 1 through Model N consume.
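A centralized feature store can be reduced to its core contract: compute each expensive feature once, then serve the cached value to every model that asks. This sketch uses an in-memory dict where a real store would use a shared database:

```python
class FeatureStore:
    """Computes each feature once and serves it to every model."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # visible cost counter for the sketch

    def get(self, name, compute):
        if name not in self._cache:
            self.computations += 1
            self._cache[name] = compute()  # the expensive extraction job
        return self._cache[name]

store = FeatureStore()
avg_rating = lambda: sum([4, 5, 3]) / 3  # an expensive feature job in real life

# Two models request the same feature; it is computed only once.
model_a_input = store.get("avg_rating", avg_rating)
model_b_input = store.get("avg_rating", avg_rating)
```

This also removes the "proprietary ways to capture features" problem: every team reads features through the same named interface.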
32. Challenge
• Different teams might want variations of an existing model
• The same model might be trained on different sections of the original
training set
• You might end up with thousands of versions of the original model
• Even the simplest models take a long time to implement
33. A Possible Solution: Use AutoML and
Hierarchical Partitions on the Training Dataset
The training dataset is split into sections, and AutoML produces one model version per section:
• Dataset Section 1 → AutoML → Model Version 1
• Dataset Section 2 → AutoML → Model Version 2
• Dataset Section 3 → AutoML → Model Version 3
The versions are then combined into a single model.
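A toy version of the per-section search above: each partition gets its own tiny model search, and the winner becomes that section's model version. The candidate models here are deliberately trivial stand-ins for a real AutoML run:

```python
def automl_search(partition):
    """Tiny stand-in for AutoML: pick the candidate with the lowest
    squared error on this partition of (x, y) pairs."""
    ys = [y for _, y in partition]
    mean_y = sum(ys) / len(ys)
    candidates = {
        "mean": lambda x: mean_y,
        "identity": lambda x: x,
        "double": lambda x: 2 * x,
    }
    def err(f):
        return sum((f(x) - y) ** 2 for x, y in partition)
    return min(candidates, key=lambda name: err(candidates[name]))

def model_versions(partitions):
    # One model version per dataset section, as in the diagram above.
    return {f"version_{i + 1}": automl_search(p) for i, p in enumerate(partitions)}

versions = model_versions([
    [(1, 2), (2, 4)],   # section where y = 2x
    [(1, 1), (2, 2)],   # section where y = x
])
```

The search itself stays automated, so thousands of versions remain manageable even though each section gets a different winner.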
35. Challenge
• Cloud deep learning platforms are far more sophisticated than their
on-premises equivalents
• Running deep learning workloads on-premise requires a complex
infrastructure
36. A Possible Solution: Consider Spark or Flink as
the On-Premises Runtime
Models move from experimentation/development environments through deployment into a Spark- or Flink-based production runtime.
38. Challenge
• Deep learning models tend to vary their performance when using
different datasets
• The cost functions of different deep learning models change when using
different datasets
39. A Possible Solution: Make Regularization and
Optimization Key Elements of the Lifecycle of a
Model
Model development iterates continuously between regularization and optimization.
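As a concrete example of regularization inside the training loop, L2 regularization (weight decay) adds a shrinkage term to every gradient step, pulling weights toward zero regardless of the dataset:

```python
def sgd_step(w, grad, lr=0.1, l2=0.01):
    """One SGD step with L2 regularization:
    w <- w - lr * (grad + l2 * w)."""
    return [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad)]

w = [1.0, -2.0]
# With a zero loss gradient, the decay term alone shrinks the weights.
w_next = sgd_step(w, grad=[0.0, 0.0])
```

Treating `lr` and `l2` as lifecycle parameters (re-tuned whenever the dataset changes) is one way to act on the cost-function drift described above.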
41. Challenge
• Not all models can be executed via APIs
• Some models take a long time to run
• In some scenarios, different models need to be executed at the same
time based on a specific condition
42. A Possible Solution: Enable On-Demand, Scheduled,
and Pub-Sub Execution of Deep Learning Models
• Scheduled activation: models are executed on a recurring schedule
• Pub-sub activation: models are triggered by events arriving through an event gateway
• On-demand activation: models are invoked directly through a model API gateway
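The three activation modes can be sketched as a single gateway that supports direct invocation and topic subscriptions; a scheduler would simply call `invoke` on a timer. All names below are illustrative:

```python
class ModelGateway:
    """Routes model executions: on-demand calls and pub-sub topics."""

    def __init__(self):
        self.models = {}
        self.subscriptions = {}  # topic -> list of subscribed model names

    def register(self, name, model, topics=()):
        self.models[name] = model
        for t in topics:
            self.subscriptions.setdefault(t, []).append(name)

    def invoke(self, name, payload):
        # On-demand activation through the model API gateway
        # (a scheduler calling this on a timer gives scheduled activation).
        return self.models[name](payload)

    def publish(self, topic, payload):
        # Pub-sub activation through the event gateway: every model
        # subscribed to the topic runs against the event payload.
        return {n: self.models[n](payload) for n in self.subscriptions.get(topic, [])}

gw = ModelGateway()
gw.register("sentiment", lambda p: f"sentiment({p})", topics=["new_review"])
gw.register("topics", lambda p: f"topics({p})", topics=["new_review"])
gw.register("pricing", lambda p: f"pricing({p})")
```

Publishing one `new_review` event runs both subscribed models at the same time, covering the "execute several models on a condition" case above.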
43. Summary
• Deep learning theory doesn't translate cleanly to real-world scenarios
• Deep learning requires a new type of architecture
• Consider combining some of the patterns described in this
presentation into a single cohesive architecture for the
implementation of deep learning solutions