Multi runtime serving pipelines for machine learning

The talk I gave at Scale By The Bay.
Deploying, serving, and monitoring machine learning models built with different ML frameworks in production. Envoy proxy-powered serving mesh. TensorFlow, Spark ML, Scikit-learn, and custom functions on CPU and GPU.

  1. Multi-Runtime Serving Pipelines. Stepan Pushkarev, CTO of
  2. About
     Mission: Accelerate Machine Learning to Production
     Open-source products:
     - Mist: Serverless proxy for Spark
     - ML Lambda: ML Function as a Service
     - Sonar: Data and ML Monitoring
     Business model: Subscription services and hands-on consulting
  3. Deployment | Serving | Scoring | Inference @Nvidia
  4. From Single Model to Meta Pipelines
  5. Product Matching: does this pair describe the same thing?
     Item 1
       Title: Authentic HERMES Bijouterie Fantaisie Selle Clip-On Earrings Silvertone #S1742 E
       Specs: Brand: HERMES; Size(cm): W1.8 x H1.8 cm (Approx); Color: Silver; Size(inch): W0.7 x H0.7" (Approx); Style: Earrings; Rank: B
       Description: ...
     Item 2
       Title: Auth HERMES Earrings Sellier Clip-on Silver Tone Round $0 Ship 25130490900 S06B
       Specs: Brand: Hermes; Fastening: Clip-On; Style: Clip on; Country/Region of Manufacture: Unknown; Metal: Silver Plated; Main Color: Silver; Color: Silver
       Description: ...
  6. Model Artifact: Ops perspective
  7. API & Logistics
     - HTTP/1.1, HTTP/2, gRPC
     - Kafka, Flink, Kinesis
     - Protobuf, Avro
     - Service Discovery
     - Pipelining
     - Tracing
     - Monitoring
     - Autoscaling
     - Versioning
     - A/B, Canary
     - Testing
     - CPU, GPU
  8. Monitoring Shifting experimentation to production
  9. Sidecar Architecture
  10. Functions registry: responsible for the model life cycle and all the business logic required to configure models for serving. Mesh of serving runtimes: the actual serving cluster. Infrastructure integration: ECS on AWS, Kubernetes on GCE and on premise.
  11. UX: Models and Applications. Applications provide public virtual endpoints for models and for compositions of models.
  12. Why not just one big neural network?
      ● Not always possible
      ● Stages could be independent
      ● Ad-hoc rule-based models
      ● Physics models (e.g. LIDAR)
      ● Big E2E DL requires black magic skills
  13. Why not just one Python script?
      ● Modularity. Stages could be developed by different teams
      ● Traceability and monitoring
      ● Versioning
      ● Independent deployment, A/B testing and canary
      ● Request shadowing and other cool stuff
      ● Could require different ML runtimes (TF, Scikit, Spark ML, etc.)
      ● We need more microservices :)
  14. Why not just TF Serving?
      ● Other ML runtimes (DL4J, Scikit, Spark ML). Servables are overkill.
      ● Need better versioning and immutability (Docker per version)
      ● Don’t want to deal with state (model loaded, offloaded, etc.)
      ● Want to re-use the microservices stack (tracing, logging, metrics)
      ● Need better scalability
  15. Demo
  16. Thank you - @hydrospheredata
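
Slides 11 and 13 describe applications as virtual endpoints over compositions of models, with independent versioning and canary deployment per stage. The sketch below is one possible way to picture that idea in plain Python. It is not the Hydrosphere implementation: the names `ModelVersion`, `Application`, `tokenize`, `make_matcher`, and `build_demo_app` are hypothetical, the stages are local functions standing in for separate serving runtimes, and the token-overlap (Jaccard) score is only a placeholder for the product-matching model from slide 5.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A stage maps a request dict to a response dict. In a real mesh each stage
# would run in its own runtime (TF, Scikit-learn, Spark ML); here they are
# local functions so the sketch stays runnable.
Stage = Callable[[dict], dict]


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical immutable model version (think: one Docker image)."""
    name: str
    version: int
    predict: Stage


class Application:
    """Hypothetical virtual endpoint: an ordered list of stages, where each
    stage splits traffic across model versions by weight (A/B or canary)."""

    def __init__(self, name: str,
                 stages: List[List[Tuple[ModelVersion, float]]],
                 rng: Optional[random.Random] = None):
        self.name = name
        self.stages = stages
        self.rng = rng or random.Random()

    def _pick(self, candidates: List[Tuple[ModelVersion, float]]) -> ModelVersion:
        versions = [v for v, _ in candidates]
        weights = [w for _, w in candidates]
        return self.rng.choices(versions, weights=weights, k=1)[0]

    def serve(self, request: dict) -> Tuple[dict, List[str]]:
        """Run the request through every stage; return result and a trace."""
        payload = dict(request)
        trace = []
        for candidates in self.stages:
            chosen = self._pick(candidates)
            payload = chosen.predict(payload)
            trace.append(f"{chosen.name}:v{chosen.version}")
        return payload, trace


def tokenize(req: dict) -> dict:
    """Stage 1: lower-case both titles and split them into token sets."""
    return {"a": set(req["title_a"].lower().split()),
            "b": set(req["title_b"].lower().split())}


def make_matcher(threshold: float) -> Stage:
    """Stage 2 factory: Jaccard overlap of the token sets, then a threshold."""
    def match(req: dict) -> dict:
        a, b = req["a"], req["b"]
        union = a | b
        score = len(a & b) / len(union) if union else 0.0
        return {"score": score, "same": score >= threshold}
    return match


def build_demo_app(canary_weight: float) -> Application:
    """Route `canary_weight` of matcher traffic to v2, the rest to v1."""
    tokenizer = ModelVersion("tokenizer", 1, tokenize)
    matcher_v1 = ModelVersion("matcher", 1, make_matcher(0.5))
    matcher_v2 = ModelVersion("matcher", 2, make_matcher(0.8))  # stricter canary
    return Application("product-matching", [
        [(tokenizer, 1.0)],
        [(matcher_v1, 1.0 - canary_weight), (matcher_v2, canary_weight)],
    ])


if __name__ == "__main__":
    app = build_demo_app(canary_weight=0.1)
    result, trace = app.serve({
        "title_a": "Hermes Clip-On Earrings",
        "title_b": "hermes clip-on earrings silver",
    })
    print(result, trace)
```

The trace returned by `serve` records which version handled each stage, which is the kind of per-request traceability slide 13 asks for. Keeping the version choice inside the application rather than inside a stage means a stage can be redeployed or re-weighted without touching its neighbours.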