
Ray Serve: A new scalable machine learning model serving library on Ray

When a machine learning model needs to be served for interactive use cases, it is typically either wrapped inside a Flask server or deployed through an external service such as SageMaker. Both approaches have flaws. In this talk, you will learn how Ray Serve uses Ray to address the limitations of current approaches and enable scalable model serving.

  1. © 2019-2020, Anyscale.io. Ray Serve: A new scalable machine learning model serving library on Ray. Simon Mo, xmo@anyscale.io, @simon_mo_
  2. Ray Ecosystem: a system for building scalable Python (and Java) applications. On top of Ray: reinforcement learning, hyperparameter tuning and distributed training, serving, data analytics, and other distributed applications.
  3. (Same ecosystem diagram, zoomed in on Serving.) This talk: serving.
  4. Big Picture: Machine Learning Lifecycle. Offline training: data collection, cleaning & visualization, feature engineering & model design, and training & validation feed training pipelines that produce trained models. Online: an end-user application sends queries to a prediction service, which runs inference on live data, returns predictions, and feeds feedback back into training.
  5. Goal: serve predictions for large-scale, interactive applications. (Diagram: the end-user application sends a query to the prediction service; inference plus logic produce the prediction.)
  6. Two common approaches: ● Embed model evaluation in the web server ● Offload prediction to an external service
  7. Embed model evaluation in the server. (Diagram: a web server already exposing routes such as /api/healthz, /api/db_query, /api/image/id/..)
  8. Embed model evaluation in the server. (The model becomes one more route, /api/image/predict, next to the existing ones.)
  9. The web server approach: + Simplicity + End-to-end control over how the model is served x One query at a time x Model loaded once, as a global variable x No isolation x No fine-grained replication. (A minimal Flask sketch of this approach follows the transcript.)
  10. The web server approach (continued): x Process-pool based deployment. (Diagram: an initial process forks worker processes; incoming requests are load-balanced across them.)
  11. The web server approach (continued): x Process-pool based deployment. (Diagram zoomed in on two of the worker processes.)
  12. The web server approach (continued): x Process-pool based deployment -> memory issue. (Same diagram: each forked worker loads its own copy of the model. A sketch of this duplication follows the transcript.)
  13. The web server approach (continued): x No complex pipelines. (Diagram: a single model.)
  14. The web server approach (continued): x No complex pipelines. (Diagram: a multi-stage pipeline of models.)
  15. The web server approach (continued): x No complex pipelines. (Diagram: an A/B test splitting traffic 80% / 20% between two models.)
  16. The web server approach (continued): x No complex pipelines. (Diagram: an ensemble of models.)
  17. The web server approach (continued): x No complex pipelines. (Diagram: a cascade; high-confidence results return immediately, low-confidence ones go to a second model. A cascade sketch follows the transcript.)
  18. Two common approaches (recap): ● Embed model evaluation in the web server ● Offload prediction to an external service
  19. Offload inference to an external service. (Diagram: the web server keeps /api/healthz, /api/db_query, /api/image/id/..; /api/image/predict now calls out to an external service.)
  20. Offload inference to an external service. Request flow: HTTP in -> API validation -> business logic -> input transformation -> inference -> output transformation -> business logic -> HTTP out.
  21. Offload inference to an external service. (Ideally the external service would take input transformation, inference, and output transformation, leaving API validation and business logic in the web server.)
  22. Offload inference to an external service. (In practice the external service handles only inference; input and output transformation remain in the web server.)
  23. External services are mostly "tensor-in, tensor-out": API validation, business logic, and input/output transformation, i.e. most of the complexity, stay in the web server.
  24. The external service approach: + Separation of concerns x Web server and model service must be scaled separately x Model evaluation logic split from transformation logic x Hard to learn x Hard to debug. (A sketch of this split follows the transcript.)
  25. Ray Serve: + Simplicity + End-to-end control + Enables complex pipelines + Programmability and observability
  26. Serve API. (A hedged sketch of the API follows the transcript.)
  27. Programmable Serving System: ● YAML -> Python ● serve.create_backend ● serve.create_endpoint ● serve.split ● serve.scale. (See the split/scale sketch after the transcript.)
  28. Kubernetes? Service mesh? ● Serve provides a layer on top of Kubernetes: ○ Easy to serve a simple model ○ Easy to serve a complex pipeline ○ API definition and the model live in the same place ○ Built-in service mesh for flexible routing
  29. Ray Serve runs on top of Kubernetes. (Diagram: Ray Serve spanning multiple pods, each pod hosting several models.)
  30. Comparison: what Serve adds relative to each system.
      vs. TFServing: + Scales to any number of nodes + Supports arbitrary frameworks
      vs. Seldon: + Imperative pipelines + Flexible queuing policy
      vs. Sagemaker: + Better batching + Deploy anywhere
      vs. "Flask": + Fine-grained replication + Isolated deployment
  31. Try it out today: pip install ray[serve]; from ray.experimental import serve. - Ready for early adopters - #serve channel in Slack - Coming soon: performance benchmark, deployment tutorial
  32. Questions? © 2019-2020, Anyscale.io. xmo@anyscale.io, @simon_mo_
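
To make slides 7-9 concrete, here is a minimal sketch of the embedded-model approach. Flask, the pickled model file, and the feature format are illustrative assumptions, not from the deck.

    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # The model is loaded once, as a global variable (slide 9's criticism).
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/api/image/predict", methods=["POST"])
    def predict():
        # This process handles one query at a time, with no isolation from
        # the other /api/* routes and no per-model replication.
        features = request.get_json()["features"]
        return jsonify({"prediction": model.predict([features]).tolist()})

    if __name__ == "__main__":
        app.run()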
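The process-pool workaround on slides 10-12 multiplies memory: pre-fork servers fork workers from an initial process, and each worker loads the model again. A runnable toy illustration, with a byte buffer standing in for a real model:

    import multiprocessing
    import os

    def load_model(path):
        # Stand-in for an expensive load; a real model may occupy gigabytes.
        return bytearray(100 * 1024 * 1024)  # 100 MB of per-worker state

    def worker(worker_id):
        # Runs after fork, so every worker pays the full memory cost again.
        model = load_model("model.pkl")
        print(f"worker {worker_id} (pid {os.getpid()}) holds {len(model)} bytes")

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=worker, args=(i,))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()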
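Slides 13-17 argue that pipelines, A/B tests, ensembles, and cascades are awkward inside a single web handler. A sketch of the cascade pattern from slide 17; both models and the 0.9 threshold are made up for illustration:

    def fast_model(x):
        # Stand-in cheap model: returns (label, confidence).
        return ("cat", 0.95 if x % 2 == 0 else 0.4)

    def accurate_model(x):
        # Stand-in expensive model.
        return ("cat", 0.99)

    def cascade_predict(x, threshold=0.9):
        label, confidence = fast_model(x)
        if confidence >= threshold:   # high confidence: return immediately
            return label
        return accurate_model(x)[0]   # low confidence: escalate

    print(cascade_predict(2), cascade_predict(3))

This is trivial as in-process Python, but once each model is a separately deployed service, the routing and escalation logic has to be hand-rolled: that is the "no complex pipeline" limitation the slides point at.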
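The split on slides 20-23 looks roughly like this from the web server's side; the endpoint URL and helper names are hypothetical. Because the external service is tensor-in, tensor-out, validation, business logic, and both transformations stay in the web server:

    import requests

    def handle_request(raw_payload):
        features = validate_and_transform(raw_payload)   # stays in web server
        resp = requests.post(
            "http://model-service.internal/v1/predict",  # hypothetical endpoint
            json={"inputs": [features]},
            timeout=1.0,
        )
        tensor_out = resp.json()["outputs"]              # tensor out
        return apply_business_logic(tensor_out)          # stays in web server

    def validate_and_transform(payload):
        # Hypothetical: e.g. decode an image and normalize to a float vector.
        return payload["pixels"]

    def apply_business_logic(outputs):
        # Hypothetical: e.g. map a class index to a product decision.
        return {"prediction": outputs[0]}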
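Slide 26's Serve API, sketched against the experimental interface of the time (Ray 0.8.x era; exact signatures changed between releases, so treat this as illustrative rather than authoritative):

    from ray.experimental import serve

    serve.init()  # start or connect to a Serve instance on a Ray cluster

    # A backend is plain Python: a function (or class) that handles a request.
    def classifier_v1(flask_request):
        return {"prediction": "cat"}

    # Endpoints own the HTTP route; backends own the model code.
    serve.create_endpoint("classifier", "/api/image/predict")
    serve.create_backend(classifier_v1, "classifier:v1")
    serve.link("classifier", "classifier:v1")  # send all traffic to v1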
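Building on that, slide 27's serve.split and serve.scale turn the A/B test of slide 15 and fine-grained replication into one-liners (again, signatures follow the experimental API of the time and may differ by release):

    def classifier_v2(flask_request):
        return {"prediction": "dog"}

    serve.create_backend(classifier_v2, "classifier:v2")

    # A/B test (slide 15): 80% of traffic to v1, 20% to v2.
    serve.split("classifier", {"classifier:v1": 0.8, "classifier:v2": 0.2})

    # Fine-grained replication: scale just this backend to 4 replicas.
    serve.scale("classifier:v1", 4)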
