"Problems and solutions with generative and non-generative AI models deployments in k8s", Artem Chernenko

Fwdays
Mar. 20, 2023

  1. PROBLEMS AND SOLUTIONS WITH GENERATIVE AND NON-GENERATIVE AI MODELS DEPLOYMENTS IN K8S ARTEM CHERNENKO
  2. ABOUT ME • Artem Chernenko • DevOps Engineer at Let’s Enhance • Hip-hop dancer
  3. PROBLEMS • It’s not obvious how models are injected into the application • For local testing we need to download models and put them in a specific place, which is inconvenient • Adding and tuning existing models was not standardised • The development process is extremely slow
  4. AI MODELS DEVELOPMENT • Training • Deployment
  5. MODEL • Another dependency • Size is bigger than usual library package • May be 50 MB, 8 GB, etc.
  6. TRAINING • Input • ML engineers • Compute resources for their experiments • Output • Model (file)
  7. APPLICATION • Requirements • GPU attached • Drivers to work with the GPU • Local paths to the models • How it works • Loads local models into GPU memory • Starts working
  8. WHEN TO PUT MODELS WITH AN APPLICATION • Build phase • Startup phase
  9. STARTUP PHASE PROS • Fastest image build time • No models in image • Models changed – image the same (if code wasn’t changed) • No additional traffic costs if models are stored in cloud internally
  10. STARTUP PHASE CONS • Slow startup: downloading models from external storage to disk takes time • Need to implement model versioning • To avoid conflicts between environments that point to the same model but to different versions of it • Automation • Maintenance • Need to set up access • There is no clear mapping between an application version and the models that are compatible with that version
  11. BUILD PHASE • Git LFS • Non Git LFS
  12. BUILD PHASE PROS (GIT LFS) • Versioning out of the box • All dependencies are stored in a single artifact • We don’t need to handle access policies and permissions (GitHub manages them under the hood)
  13. BUILD PHASE CONS (GIT LFS) • File size limit – 2GB. Enterprise – 5GB • Slow build time • Git LFS cloning • Docker context loading • Slow startup: pulling models takes time • GitHub LFS bandwidth (w/ GitHub only)
  14. GITHUB LFS BANDWIDTH BILLING: PROBLEM • Every Git LFS clone adds bandwidth costs
  15. GITHUB LFS BANDWIDTH BILLING: SOLUTION 1 • Using the GitHub cache
  16. GITHUB LFS BANDWIDTH BILLING: SOLUTION 2 • Build only on main branches
  17. GITHUB LFS BANDWIDTH BILLING: SOLUTION 3 • Labels to skip the build part
  18. BUILD PHASE PROS (NON GIT LFS) • All dependencies are stored in a single artifact
  19. BUILD PHASE CONS (NON GIT LFS) • Need to implement model versioning • Need to set up access • Slow builds • Slow startup: pulling models takes time
  20. BUILD PHASE CONS (NON GIT LFS) • Need to implement model versioning • Need to set up access • Slow builds. Can be solved w/ caching • Slow startup: pulling models takes time. Can be solved by image streaming
  21. IMAGE STREAMING • Pod starts running almost instantly • Image size does not matter • In the worst case, read latency from disk is slightly higher • All images are stored in Artifact Registry, which is backed by GCS • Has an additional caching mechanism
  22. DEMO
  23. RESULTS
  24. PROBLEMS SOLVED • Problems: • It’s not obvious how models are injected into the application • For local testing we need to download models and put them in a specific place, which is inconvenient • Solution: The model is stored in the Docker image • Problem: Adding and tuning existing models was not standardised • Solution: There is a procedure for adding models to a GCS bucket • Problem: The development process is extremely slow • Solution: Caching + image streaming
  25. THANKS
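
Slide 10 notes that, when models are fetched at startup, there is no clear mapping between an application version and the models compatible with it. One possible way to make that mapping explicit is a small manifest that the application checks before loading anything onto the GPU. The sketch below is only an illustration under assumptions: the manifest layout, the function names (`sha256_of`, `resolve_models`), the version string and the file names are all hypothetical and do not come from the talk.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte models are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_models(manifest: dict, app_version: str, models_dir: Path) -> Dict[str, Path]:
    """Return {model name: local path} for one application version.

    Assumed (hypothetical) manifest layout:
        {"<app version>": {"<model name>": {"file": "...", "sha256": "..."}}}
    """
    if app_version not in manifest:
        raise LookupError(f"no models pinned for application version {app_version!r}")

    resolved: Dict[str, Path] = {}
    for name, spec in manifest[app_version].items():
        path = models_dir / spec["file"]
        if not path.is_file():
            raise FileNotFoundError(f"model {name!r} is missing at {path}")
        if sha256_of(path) != spec["sha256"]:
            # The file exists but is not the version this application expects.
            raise ValueError(f"model {name!r} at {path} does not match the pinned checksum")
        resolved[name] = path
    return resolved
```

A caller would load the manifest (for example from a JSON file shipped with the code), call `resolve_models(manifest, app_version, Path(models_dir))` at startup, and load models only from the returned paths. Failing fast on a missing or mismatched file keeps two environments from silently sharing a model of the wrong version.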