Multi-Runtime Serving Pipelines
Stepan Pushkarev
CTO of Hydrosphere.io
About
Mission: Accelerate Machine Learning to Production
Open-source products:
- Mist: Serverless proxy for Spark
- ML Lambda: ML Function as a Service
- Sonar: Data and ML Monitoring
Business model: Subscription services and hands-on consulting
Deployment | Serving | Scoring | Inference
Image: Nvidia, https://www.nvidia.com/en-us/deep-learning-ai/solutions/
From Single Model to Meta Pipelines
Product Matching

Item 1
  Title: Authentic HERMES Bijouterie Fantaisie Selle Clip-On Earrings Silvertone #S1742 E
  Specs: Brand: HERMES; Size (cm): W1.8 x H1.8 cm (Approx); Color: Silver; Size (inch): W0.7 x H0.7" (Approx); Style: Earrings; Rank: B
  Description: ...

Item 2
  Title: Auth HERMES Earrings Sellier Clip-on Silver Tone Round $0 Ship 25130490900 S06B
  Specs: Brand: Hermes; Fastening: Clip-On; Style: Clip on; Country/Region of Manufacture: Unknown; Metal: Silver Plated; Main Color: Silver; Color: Silver
  Description: ...

Does this pair describe the same thing?
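To make the "meta pipeline" idea concrete, here is a minimal, hypothetical sketch of product matching as three independent stages (rule-based normalization, feature extraction, a learned matcher). The stage implementations are placeholders, not the actual models behind this example.

```python
# Hypothetical three-stage product-matching pipeline; each stage could live in its
# own serving runtime and be versioned and deployed independently.
from difflib import SequenceMatcher

def normalize(item: dict) -> dict:
    """Stage 1: rule-based cleanup (plain Python, no ML runtime needed)."""
    return {k: str(v).lower().strip() for k, v in item.items()}

def similarity_features(a: dict, b: dict) -> list:
    """Stage 2: per-field string-similarity features for fields both items share."""
    keys = sorted(set(a) & set(b))
    return [SequenceMatcher(None, a[k], b[k]).ratio() for k in keys]

def match_score(features: list) -> float:
    """Stage 3: the learned matcher; a plain average stands in for a real model."""
    return sum(features) / len(features) if features else 0.0

item1 = {"brand": "HERMES", "color": "Silver", "style": "Earrings"}
item2 = {"brand": "Hermes", "color": "Silver", "style": "Clip on"}

score = match_score(similarity_features(normalize(item1), normalize(item2)))
print(f"same product: {score > 0.8} (score={score:.2f})")
```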
Model Artifact: Ops Perspective
API & Logistics
- HTTP/1.1, HTTP/2, gRPC (see the gRPC sketch after this list)
- Kafka, Flink, Kinesis
- Protobuf, Avro
- Service Discovery
- Pipelining
- Tracing
- Monitoring
- Autoscaling
- Versioning
- A/B, Canary
- Testing
- CPU, GPU
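As one concrete illustration of the gRPC + Protobuf surface, a minimal Python client against TensorFlow Serving's public prediction API could look like the sketch below; the endpoint, model name, and input tensor are assumptions.

```python
# Minimal gRPC Predict call (sketch). Requires the tensorflow and
# tensorflow-serving-api packages; host, model name and inputs are assumed.
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")          # assumed serving endpoint
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "product_matcher"                # hypothetical model name
request.model_spec.signature_name = "serving_default"
request.inputs["features"].CopyFrom(
    tf.make_tensor_proto([[0.91, 0.87, 0.13]], dtype=tf.float32)
)

response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```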
Monitoring
Shifting experimentation to production
Sidecar Architecture
- Functions registry: responsible for the model life cycle and all the business logic required to configure models for serving.
- Mesh of serving runtimes: the actual serving cluster.
- Infrastructure integration: ECS for AWS, Kubernetes for GCE and on-premise.
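A minimal sketch of the sidecar idea, assuming the model runtime listens on localhost:9090 and the sidecar owns the public port plus cross-cutting concerns (trace IDs, latency measurement). This is illustrative only, not Hydrosphere's actual sidecar.

```python
# Toy sidecar: accepts public traffic, forwards it to the local model runtime,
# and records latency plus a trace id for every request.
import time
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

RUNTIME_URL = "http://127.0.0.1:9090"  # assumed local model runtime

class SidecarHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        trace_id = self.headers.get("X-Trace-Id", str(uuid.uuid4()))
        started = time.time()
        forwarded = urllib.request.Request(
            RUNTIME_URL + self.path,
            data=body,
            headers={"Content-Type": "application/json", "X-Trace-Id": trace_id},
        )
        with urllib.request.urlopen(forwarded, timeout=5) as resp:
            payload = resp.read()
        # A real mesh would ship this to a metrics/tracing backend (e.g. Sonar).
        latency_ms = (time.time() - started) * 1000
        print(f"trace={trace_id} path={self.path} latency_ms={latency_ms:.1f}")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), SidecarHandler).serve_forever()
```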
UX: Models and Applications
Applications provide public virtual endpoints for the models and compositions of the models.
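A hypothetical sketch of what an application could look like: one named virtual endpoint backed by a pipeline of independently deployed model versions. Field names and runtimes are illustrative, not ML Lambda's actual manifest format.

```python
# Illustrative application definition and request path; `clients` maps a
# (model, version) pair to a client for that stage's serving runtime.
application = {
    "name": "product-matching",  # exposed as a single public virtual endpoint
    "pipeline": [
        {"model": "text-preprocessor", "version": 3, "runtime": "python:3.6"},
        {"model": "siamese-matcher", "version": 7, "runtime": "tensorflow:1.8"},
    ],
}

def handle(request, clients):
    """Pass the payload through each pipeline stage in order."""
    payload = request
    for stage in application["pipeline"]:
        payload = clients[(stage["model"], stage["version"])].predict(payload)
    return payload
```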
Why Not Just One Big Neural Network?
● Not always possible
● Stages could be independent
● Ad-hoc rule-based models
● Physics models (e.g. LIDAR)
● Big end-to-end DL requires black-magic skills
Why Not Just One Python Script?
● Modularity: stages could be developed by different teams
● Traceability and monitoring
● Versioning
● Independent deployment, A/B testing and canary releases
● Request shadowing and other cool stuff
● Could require different ML runtimes (TF, Scikit-learn, Spark ML, etc.)
● We need more microservices :)
Why Not Just TF Serving?
● Other ML runtimes (DL4J, Scikit-learn, Spark ML); Servables are overkill
● Need better versioning and immutability: one Docker image per version (see the sketch below)
● Don’t want to deal with state (model loaded, offloaded, etc.)
● Want to re-use the microservices stack (tracing, logging, metrics)
● Need better scalability
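For instance, "one immutable Docker image per model version" could look roughly like the following with the Docker SDK for Python; the registry, build path and version tag are assumptions for the sketch.

```python
# Build and push an immutable image per model version (sketch).
# Assumes ./model_v7 contains a Dockerfile that bakes the artifact into a runtime.
import docker

client = docker.from_env()

MODEL, VERSION = "siamese-matcher", "7"        # hypothetical model and version
REPO = f"registry.example.com/models/{MODEL}"  # hypothetical registry

image, _build_logs = client.images.build(path="./model_v7", tag=f"{REPO}:{VERSION}")
client.images.push(REPO, tag=VERSION)  # the serving cluster pulls this exact version
```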
Demo
Thank you
- @hydrospheredata
- https://github.com/Hydrospheredata
- https://hydrosphere.io/
- spushkarev@hydrosphere.io
