10 Things I Wish I Had Known Before Scaling Deep Learning Solutions
1. 10 Things I Wish I Had Known
Before Scaling Deep Learning
Solutions
Invector Labs
2. About Invector Labs
• Platform for top-class computer science talent
• Uses artificial intelligence to connect enterprises with top freelance
talent around the world
• Focused on deep tech
• Artificial intelligence
• Blockchain technologies
• Internet of things
• Cybersecurity
• Advanced cloud computing
• ….
• http://invectorlabs.com
3. Agenda
• Realities of scaling deep learning solutions
• 10 Lessons
• Challenge
• What we learned
• Solution
4. Lessons from the Real World
Large Hospitality Group
• Using deep learning to analyze reviews from over 40 travel websites
• 12 different deep learning models
• Scenarios: topic extraction, sentiment analysis, price predictions, hotel scoring…
• Techniques: natural language processing, NLP micro-understanding, clustering, time series analysis
Legal Software Platform Vendor
• Using deep learning to extract intelligence from trial discovery documents and legal research
• 8 different deep learning models
• Scenarios: natural language search, knowledge extraction, document relationships, research recommendations, strategy simulation
• Techniques: convolutional neural networks, generative models, recurrent neural networks, natural language processing…
International Railway Company
• Using deep learning to analyze cargo information and sensor data
• 18 different deep learning models
• Scenarios: car load predictions, part maintenance prediction, track video analysis
• Techniques: convolutional neural networks, recurrent neural networks, transfer learning, predictive modeling, linear regressions…
Quant Hedge Fund
• Using deep learning to simulate trading strategies
• 11 different deep learning models
• Scenarios: portfolio rebalancing, option pricing, daily stock selection, strategy selection
• Techniques: reinforcement learning, transfer learning, predictive modeling
5. Key Takeaways
• Implementing deep learning solutions at scale imposes new infrastructure
challenges
• Deep learning requires a new type of architecture
7. Deep Learning
• Deep learning is a subset of machine learning.
• Uses a hierarchy of multiple layers of nonlinear processing units for
feature extraction and transformation. Each successive layer uses the
output from the previous layer as input.
• Learns in supervised (e.g., classification) and/or unsupervised (e.g.,
pattern analysis) manners.
• Learns multiple levels of representations that correspond to different
levels of abstraction; the levels form a hierarchy of concepts.
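The hierarchy-of-layers idea above can be sketched in a few lines of plain Python: each layer applies nonlinear processing units to the previous layer's output. The weights below are arbitrary toy values, not a trained network:

```python
import math

def unit(inputs, weights, bias):
    """One nonlinear processing unit: weighted sum + tanh activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

def forward(x, layers):
    """Each successive layer uses the previous layer's output as input."""
    out = x
    for weight_vectors, biases in layers:
        out = [unit(out, w, b) for w, b in zip(weight_vectors, biases)]
    return out

# Two stacked layers over a 3-feature input (toy weights).
net = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # layer 1: 3 -> 2
    ([[1.0, -1.0]], [0.2]),                               # layer 2: 2 -> 1
]
result = forward([1.0, 2.0, 3.0], net)
```

The early layers would learn low-level representations and the later layers higher-level abstractions; here the structure alone is what matters.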
9. What Makes Deep Learning so Challenging?
Curse of Dimensionality
• Models with millions of nodes
Over/Under Fitting
• Models too tailored to the datasets
Interpretability
• Understanding complex network structures
Bias/Variance
• Preconceptions included in the datasets
14. Challenge
• Data scientists are great at experimentation
• Not so much at writing high quality code
• Experimental deep learning frameworks don't necessarily make great
production frameworks (e.g., PyTorch vs. TensorFlow)
15. A Possible Solution: Divide Data Science and Engineering Teams
Data Science Team
• Write notebooks and experimental models
Engineering Team
• Refactor or rewrite models for production environments
• Automate training and optimization jobs
DevOps Team
• Deploy models
• Monitor, retrain, and optimize models
17. Challenge
• Notebooks are ideal for model experimentation and testing
• Notebooks typically have performance challenges when executed at
scale
• Scaling Notebook environments can be challenging
• Parametrizing Notebook executions is far from trivial
18. A Possible Solution: Use Containers for Running Production Deep Learning Workloads
• Model experimentation: Jupyter, Zeppelin
• Scheduling notebooks: Papermill, Netflix's Meson
• Running complex workflows: Docker containers, Kubernetes
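As a sketch of the Papermill pattern above: build one parametrized execution per run, then hand each to `papermill.execute_notebook`, which injects the parameters into the notebook's tagged `parameters` cell. The template name, parameter names, and output path convention here are hypothetical:

```python
def notebook_runs(run_dates, template="train.ipynb"):
    """Build one (input, output, parameters) triple per scheduled execution.

    The notebook name and parameter names are illustrative placeholders.
    """
    return [
        (template, f"runs/train-{d}.ipynb", {"run_date": d, "epochs": 20})
        for d in run_dates
    ]

def execute_all(runs):
    # Papermill injects each parameters dict into the notebook's tagged
    # "parameters" cell, then executes it and writes the output notebook.
    import papermill as pm
    for input_nb, output_nb, params in runs:
        pm.execute_notebook(input_nb, output_nb, parameters=params)

runs = notebook_runs(["2024-01-01", "2024-01-02"])
```

In a container setup, `execute_all` would run inside a Docker image scheduled by Kubernetes rather than on a data scientist's laptop.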
20. Challenge
• Enterprises like to standardize on a single deep learning framework
• Different teams have different technology preferences
• Providing a consistent deep learning platform across different deep
learning frameworks is no easy task
21. A Possible Solution: Provide a Consistent Infrastructure Across Different Deep Learning Runtimes
• Infrastructure: data cleansing, feature extraction, model training, …
• Runtime: hyperparameter optimization, retraining, model monitoring, …
• Model development: TensorFlow, PyTorch, Caffe2, …
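One way to sketch the consistent-infrastructure idea is a thin runtime contract that the shared pipeline codes against, with one adapter per framework. The class and method names below are illustrative, and the baseline adapter stands in for real TensorFlow or PyTorch wrappers:

```python
from abc import ABC, abstractmethod

class ModelRuntime(ABC):
    """Framework-agnostic contract the shared infrastructure codes against."""

    @abstractmethod
    def train(self, features, labels): ...

    @abstractmethod
    def predict(self, features): ...

class MeanBaselineRuntime(ModelRuntime):
    """Stand-in adapter; real adapters would wrap TensorFlow or PyTorch."""

    def train(self, features, labels):
        self.mean = sum(labels) / len(labels)

    def predict(self, features):
        return [self.mean] * len(features)

def training_pipeline(runtime: ModelRuntime, features, labels):
    # Cleansing, feature extraction, monitoring, etc. live at this layer
    # and never touch framework-specific APIs.
    runtime.train(features, labels)
    return runtime.predict(features)
```

Teams keep their preferred framework; the infrastructure only ever sees `ModelRuntime`.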
23. Challenge
• The No Free Lunch Theorem
• Trained models can perform poorly against new datasets
• New engineers and DevOps need to understand how to re-train existing
models
24. A Possible Solution: Automate Training Jobs
A data lake (data plus an outcomes/feature store) feeds a set of automated jobs:
• Training Job 1
• Training Job 2
• Training Job N
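A minimal sketch of the automated-training pattern, assuming a data-lake event calls `on_new_data` whenever a fresh dataset lands; the job callables here are hypothetical placeholders:

```python
class TrainingScheduler:
    """Registers training jobs and fires all of them when new data lands."""

    def __init__(self):
        self.jobs = []

    def register(self, job):
        self.jobs.append(job)

    def on_new_data(self, dataset):
        # In production this would be triggered by a data-lake event
        # (e.g., a new partition arriving); here it is called directly.
        return [job(dataset) for job in self.jobs]

scheduler = TrainingScheduler()
scheduler.register(lambda ds: f"sentiment model retrained on {len(ds)} rows")
scheduler.register(lambda ds: f"pricing model retrained on {len(ds)} rows")
results = scheduler.on_new_data([{"text": "great stay"}, {"text": "noisy room"}])
```

The point is that retraining becomes an event-driven job, not something a new engineer has to know to run by hand.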
26. Challenge
• Model training can be really resource intensive
• Training jobs take a long time to execute
• Data scientists love to embed the training logic as part of the model
Notebook
27. A Possible Solution: Follow a Distributed Training
Architecture and Automate Training Jobs
Training jobs are submitted to a training server, which fans each job out into tasks that produce trained models:
• Task 1 → Model 1
• Task 2 → Model 2
• Task N → Model N
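The fan-out in the diagram above can be sketched with a worker pool standing in for the training server; each task trains one toy model on one data shard:

```python
from concurrent.futures import ThreadPoolExecutor

def train_task(task_id, shard):
    """Train one model on one data shard (toy: fit a mean predictor)."""
    mean = sum(shard) / len(shard)
    return {"task": task_id, "model": {"predict_mean": mean}}

def distributed_training(shards, max_workers=4):
    # A real training server would dispatch each task to its own worker
    # node or GPU; a thread pool stands in for that fan-out here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(train_task, i, s) for i, s in enumerate(shards)]
        return [f.result() for f in futures]

models = distributed_training([[1, 2, 3], [10, 20], [5, 5, 5, 5]])
```

Because the training logic lives in `train_task` rather than inside a notebook cell, the same code can be scheduled, parallelized, and retried.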
29. Challenge
• Different models require the same features from a dataset
• Feature extraction jobs are computationally expensive
• Different teams create proprietary ways to capture and store feature
information
30. A Potential Solution: Build a Centralized
Feature Store
• Dataset Preparation Job 1 → Representation Learning Task 1
• Dataset Preparation Job 2 → Representation Learning Task 2
• Dataset Preparation Job N → Representation Learning Task N
All tasks write their output to a shared feature store, which Model 1 through Model N consume.
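A centralized feature store can be reduced to its core contract: compute each expensive feature once, then serve the cached value to every model that asks. This sketch uses an in-memory dict where a real store would use a shared database:

```python
class FeatureStore:
    """Computes each feature once and serves it to every model."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # visible cost counter for the sketch

    def get(self, name, compute):
        if name not in self._cache:
            self.computations += 1
            self._cache[name] = compute()  # the expensive extraction job
        return self._cache[name]

store = FeatureStore()
avg_rating = lambda: sum([4, 5, 3]) / 3  # an expensive feature job in real life

# Two models request the same feature; it is computed only once.
model_a_input = store.get("avg_rating", avg_rating)
model_b_input = store.get("avg_rating", avg_rating)
```

This also removes the "proprietary ways to capture features" problem: every team reads features through the same named interface.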
32. Challenge
• Different teams might want variations of an existing model
• The same model might be trained on different sections of the original
training set
• You might end up with thousands of versions of the original model
• Even the simplest models take a long time to implement
33. A Possible Solution: Use AutoML and
Hierarchical Partitions on the Training Dataset
The training dataset is split into sections, and AutoML produces one model version per section:
• Dataset Section 1 → AutoML → Model Version 1
• Dataset Section 2 → AutoML → Model Version 2
• Dataset Section 3 → AutoML → Model Version 3
The versions are then combined into a single model.
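A toy version of the per-section search above: each partition gets its own tiny model search, and the winner becomes that section's model version. The candidate models here are deliberately trivial stand-ins for a real AutoML run:

```python
def automl_search(partition):
    """Tiny stand-in for AutoML: pick the candidate with the lowest
    squared error on this partition of (x, y) pairs."""
    ys = [y for _, y in partition]
    mean_y = sum(ys) / len(ys)
    candidates = {
        "mean": lambda x: mean_y,
        "identity": lambda x: x,
        "double": lambda x: 2 * x,
    }
    def err(f):
        return sum((f(x) - y) ** 2 for x, y in partition)
    return min(candidates, key=lambda name: err(candidates[name]))

def model_versions(partitions):
    # One model version per dataset section, as in the diagram above.
    return {f"version_{i + 1}": automl_search(p) for i, p in enumerate(partitions)}

versions = model_versions([
    [(1, 2), (2, 4)],   # section where y = 2x
    [(1, 1), (2, 2)],   # section where y = x
])
```

The search itself stays automated, so thousands of versions remain manageable even though each section gets a different winner.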
35. Challenge
• Cloud deep learning platforms are far more sophisticated than their
on-premises equivalents
• Running deep learning workloads on-premise requires a complex
infrastructure
36. A Possible Solution: Consider Spark or Flink as
the On-Premises Runtime
Models move from experimentation/development environments through deployment into a Spark- or Flink-based production runtime.
38. Challenge
• Deep learning models tend to vary their performance when using
different datasets
• The cost functions of different deep learning models change when using
different datasets
39. A Possible Solution: Make Regularization and
Optimization Key Elements of the Lifecycle of a
Model
Model development iterates continuously between regularization and optimization.
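As a concrete example of regularization inside the training loop, L2 regularization (weight decay) adds a shrinkage term to every gradient step, pulling weights toward zero regardless of the dataset:

```python
def sgd_step(w, grad, lr=0.1, l2=0.01):
    """One SGD step with L2 regularization:
    w <- w - lr * (grad + l2 * w)."""
    return [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad)]

w = [1.0, -2.0]
# With a zero loss gradient, the decay term alone shrinks the weights.
w_next = sgd_step(w, grad=[0.0, 0.0])
```

Treating `lr` and `l2` as lifecycle parameters (re-tuned whenever the dataset changes) is one way to act on the cost-function drift described above.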
41. Challenge
• Not all models can be executed via APIs
• Some models take a long time to run
• In some scenarios, different models need to be executed at the same
time based on a specific condition
42. A Possible Solution: Enable On-Demand, Scheduled,
and Pub-Sub Execution of Deep Learning Models
• Scheduled activation: models are executed on a recurring schedule
• Pub-sub activation: models are triggered by events arriving through an event gateway
• On-demand activation: models are invoked directly through a model API gateway
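The three activation modes can be sketched as a single gateway that supports direct invocation and topic subscriptions; a scheduler would simply call `invoke` on a timer. All names below are illustrative:

```python
class ModelGateway:
    """Routes model executions: on-demand calls and pub-sub topics."""

    def __init__(self):
        self.models = {}
        self.subscriptions = {}  # topic -> list of subscribed model names

    def register(self, name, model, topics=()):
        self.models[name] = model
        for t in topics:
            self.subscriptions.setdefault(t, []).append(name)

    def invoke(self, name, payload):
        # On-demand activation through the model API gateway
        # (a scheduler calling this on a timer gives scheduled activation).
        return self.models[name](payload)

    def publish(self, topic, payload):
        # Pub-sub activation through the event gateway: every model
        # subscribed to the topic runs against the event payload.
        return {n: self.models[n](payload) for n in self.subscriptions.get(topic, [])}

gw = ModelGateway()
gw.register("sentiment", lambda p: f"sentiment({p})", topics=["new_review"])
gw.register("topics", lambda p: f"topics({p})", topics=["new_review"])
gw.register("pricing", lambda p: f"pricing({p})")
```

Publishing one `new_review` event runs both subscribed models at the same time, covering the "execute several models on a condition" case above.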
43. Summary
• Deep learning theory doesn't translate cleanly to real-world scenarios
• Deep learning requires a new type of architecture
• Consider combining some of the patterns described in this
presentation into a single cohesive architecture for the
implementation of deep learning solutions