Training an ML model can prove quite troublesome. After all the tears, suffering, and trading souls with the devil, we should ask ourselves: "Where to next?" The answer is straight through hell towards glory. The real power of ML lies in using a trained model in production to solve real-world problems. But that can prove to be a painful journey in itself, because ML models are just a tiny piece of the whole AI system. In this talk, we are going to show you some possible ways through the hell of deployment.
5. “Expectations were like fine pottery. The harder you held them, the more
likely they were to crack.”, The Way of Kings, Brandon Sanderson
1. ML Deployment
2. Deploy on the edge
3. Database deployment
4. REST API - Flask
5. REST API - Flask + uWSGI
6. TensorFlow Serving
7. Message Queues
8. Tidal Waves
9. Summary
9. “I know the pieces fit 'cause I watched them tumble down”, Schism, Tool
[Diagram labels: Light, Shadow, Order, Chaos, Life, Death, heaven, abyss, Reality, astral, astral rift, VOID; AI system & ML]
22. Batch Inference
Predictions: On demand/scheduled
Latency: “less important”
Constraint: not real time
[Diagram: Scheduler/CLI → App (Preproc → Inference); Raw Data → App → Scored Data]
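The batch flow on this slide (raw data in, preprocessing, inference, scored data out) can be sketched roughly as below. This is a minimal illustration, not the setup used in the talk: the CSV layout, the column names `x1` and `x2`, and the `preprocess`, `predict`, and `run_batch` helpers with their fixed weights are all assumptions made for the example.

```python
import csv
import io


def preprocess(row):
    """Turn one raw CSV row (a dict of strings) into a numeric feature list."""
    return [float(row["x1"]), float(row["x2"])]


def predict(features):
    """Stand-in for a trained model: a fixed linear score (hypothetical weights)."""
    weights = [0.5, -0.25]
    return sum(w * f for w, f in zip(weights, features))


def run_batch(raw_file, scored_file):
    """Read raw rows, score each one, and write them back with a 'score' column."""
    reader = csv.DictReader(raw_file)
    writer = csv.DictWriter(scored_file, fieldnames=reader.fieldnames + ["score"])
    writer.writeheader()
    count = 0
    for row in reader:
        row["score"] = predict(preprocess(row))
        writer.writerow(row)
        count += 1
    return count


# In-memory files keep the sketch self-contained; a scheduler or CLI
# would pass real file handles instead.
raw = io.StringIO("id,x1,x2\n1,2.0,4.0\n2,1.0,0.0\n")
scored = io.StringIO()
n_rows = run_batch(raw, scored)
```

Because nothing waits on an individual prediction, a job like this could be started by a scheduler or from the command line whenever fresh raw data arrives.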
25. Flask
• Web framework - NOT A SERVER
• Easy for development
• Single request at a time
• Can’t scale
• Not for production
[Diagram: Client → Flask → ML Models (GPU), Other Services, DB]
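A prediction endpoint in Flask might look something like the sketch below. The `/predict` route, the `features` JSON field, and the averaging `predict` function are assumptions made for illustration; a real service would call a loaded model there instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a real model call: the mean of the inputs."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # silent=True returns None instead of raising on a missing or invalid body.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if not isinstance(features, list) or not features:
        return jsonify(error="expected a non-empty 'features' list"), 400
    return jsonify(prediction=predict(features))


# For local development only, the built-in server could be started with:
#   app.run(port=5000)
```

The `app` object is only the web framework part. Serving it to real traffic needs a separate application server in front of it, which is the gap the slide is pointing at.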
30. uWSGI - Postfork fix
[Diagram: the Master loads the App; each of the three Workers is a copy from the parent and runs postfork() after the fork]
31. uWSGI - Lazy and Postforked Summary
• TF requires postfork (or lazy apps)
• Each process makes its own copy of the ML model
• Each process maintains its own session
• High memory footprint
• GPU doesn’t always help
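The postfork approach summarized above can be sketched as follows. The `postfork` decorator comes from uWSGI's `uwsgidecorators` module, which is only importable inside a uWSGI process, so the sketch adds a fallback for running elsewhere. The `load_model` and `init_model` functions and the dict they produce are hypothetical placeholders for building a real TensorFlow model and session.

```python
import os

try:
    # Only importable when the module is loaded by uWSGI.
    from uwsgidecorators import postfork
except ImportError:
    def postfork(func):
        """Fallback outside uWSGI: run the hook immediately, once."""
        func()
        return func

model = None  # each worker process ends up with its own instance


def load_model():
    """Hypothetical loader standing in for creating a TF graph and session."""
    return {"loaded_in_pid": os.getpid()}


@postfork
def init_model():
    # Under uWSGI this runs in every worker after the fork, so the model
    # is created inside the worker rather than inherited from the master.
    global model
    model = load_model()


def application(environ, start_response):
    """Minimal WSGI app reporting which process loaded the model."""
    body = ("model loaded in pid %d" % model["loaded_in_pid"]).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

Since every worker calls `load_model` itself, memory use grows with the number of workers, which matches the high footprint noted on the slide. uWSGI's `lazy-apps` option is the alternative the slide mentions: it loads the whole application in each worker instead of in the master.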
52. “The purpose of a storyteller is not to tell you how to think, but to give you
questions to think upon.”, The Way of Kings, Brandon Sanderson
• Business?
• Does accuracy matter?
• Real time?
• Retraining?
• Data size?
• Algorithm to use?
• Demo vs Production?