At OVHcloud, we use Machine Learning models internally to support decision-making, in areas ranging from fraud prevention to improving the maintenance of our infrastructure.
By leveraging standard open-source formats - such as TensorFlow SavedModels - ML Serving lets users deploy their models easily while benefiting from essential features such as instrumentation, scalability and model versioning.
4. @OVHcloud_fr #OVHcloudTechTalks @ChrisRannou
Public Cloud ML Serving
Instead of taking care of deployment in production, simply select your ML models (your own or pre-trained ones), size them and deploy. We provide API endpoints and more!
Our extra value:
✔ We simplify your architecture: we deploy your ML models for you in a few clicks
✔ We simplify your code: everything can be automated (via API / CLI)
✔ We reduce your costs: you cut the time-to-production from weeks to seconds, and pay as you go
✔ We fix your main challenges: we provide Scaling, Monitoring and Versioning
Serving Hub
Model Controller
• Image building:
– Build the image with the model files from storage
– Push the image to the registry
• Model lifecycle:
– ApiStatus: describes the state of the runtime API
– VersionStatus: describes the state of the model image
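The controller's job can be sketched as a reconciliation step that compares the two status fields and decides the next action. This is a minimal illustration only; the status names below are invented for the sketch, not the actual CRD values used by Serving Hub.

```python
from enum import Enum

class VersionStatus(Enum):
    # state of the model image (illustrative values, not the real CRD states)
    PENDING = "pending"
    BUILDING = "building"
    BUILT = "built"
    FAILED = "failed"

class ApiStatus(Enum):
    # state of the runtime API serving the model
    STOPPED = "stopped"
    DEPLOYING = "deploying"
    RUNNING = "running"

def reconcile(version: VersionStatus, api: ApiStatus) -> str:
    """Return the next action for a model, comparing image and API state."""
    if version in (VersionStatus.PENDING, VersionStatus.BUILDING):
        # build image with model files from storage, push it to the registry
        return "build-and-push-image"
    if version is VersionStatus.FAILED:
        return "report-build-error"
    if api is not ApiStatus.RUNNING:
        return "deploy-runtime-api"
    return "noop"
```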
Serving Hub
Metrics
• Ingress controller:
– Count of HTTP requests by method and status code
– Sum of HTTP latencies by method and status code
• k8s:
– Number of pods
– Number of Model CRDs
• Custom model metrics:
– Liveness
– Version
– Version status
– API status
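The custom model metrics above can be exported in the Prometheus text exposition format. A minimal sketch, assuming hypothetical metric and label names (the real names used by Serving Hub are not given in the talk):

```python
def render_metrics(model: str, version: str, live: bool,
                   version_status: str, api_status: str) -> str:
    """Render per-model metrics (liveness, version/API status) in the
    Prometheus text exposition format."""
    labels = f'model="{model}",version="{version}"'
    return "\n".join([
        # liveness as a 0/1 gauge
        f'model_liveness{{{labels}}} {1 if live else 0}',
        # statuses as info-style gauges: the state is carried by a label
        f'model_version_status{{{labels},status="{version_status}"}} 1',
        f'model_api_status{{{labels},status="{api_status}"}} 1',
    ])
```

A scrape of the runtime would then expose one such group of lines per deployed model version.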
Serving Hub
Auto Scaling
• HPA: Horizontal Pod Autoscaler
• Scales on RAM usage above 60%
• Scales on CPU usage above 60%
• Params:
– Max/min threshold
– Scale decision: percentage and which resource
• To come: GPU usage
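The scale decision follows the standard Kubernetes HPA rule: the desired replica count is proportional to the ratio of current to target utilization, clamped to the configured min/max. Reading the slide's 60% as the target utilization, a sketch:

```python
import math

def desired_replicas(current_replicas: int, current_usage: float,
                     target_usage: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Kubernetes HPA rule: desired = ceil(current * usage / target),
    clamped to the min/max thresholds."""
    desired = math.ceil(current_replicas * current_usage / target_usage)
    return max(min_replicas, min(max_replicas, desired))
```

For example, two pods averaging 90% CPU against a 60% target yield ceil(2 * 0.90 / 0.60) = 3 replicas.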
Serving Runtime
Prerequisites:
• Support the ONNX, TensorFlow (TF) and PMML serialization formats
• Able to chain several models of different kinds
• Available through a web service API

Example:
HTTP Query → Preprocessing → Model → Postprocessing → HTTP Response
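That pipeline can be sketched as three composed functions. The feature names and the stand-in model below are illustrative, not the runtime's actual implementation:

```python
def preprocess(query: dict) -> list[float]:
    # turn the raw HTTP payload into model features (illustrative fields)
    return [float(query["x"]), float(query["y"])]

def model(features: list[float]) -> float:
    # stand-in for an ONNX/TF/PMML evaluator: a fixed linear model
    return 2.0 * features[0] + features[1]

def postprocess(score: float) -> dict:
    # wrap the raw score into an HTTP-friendly response body
    return {"prediction": score, "label": "positive" if score > 0 else "negative"}

def serve(query: dict) -> dict:
    # HTTP query -> preprocessing -> model -> postprocessing -> HTTP response
    return postprocess(model(preprocess(query)))
```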
Serving Runtime
Think generic: let's create one Evaluator per supported serialization format.
What are the common inputs & outputs?

Inputs (?) → Evaluator (ONNX / TensorFlow / PMML) → Outputs (?)
Serving Runtime
PMML
Inputs & outputs:
• Tabular data (i.e. a dataset) can be interpreted as a list of named tensors.

Example:

prop_int  prop_bool  prop_string
1         true       "John"
6         true       "Kim"
8         false      "Hugo"

becomes three named tensors of shape (3, 1):

prop_int    (3, 1): [1, 6, 8]
prop_bool   (3, 1): [true, true, false]
prop_string (3, 1): ["John", "Kim", "Hugo"]
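The column-to-tensor mapping above is mechanical: each named column becomes a named tensor of shape (n_rows, 1). A minimal sketch in plain Python (the runtime itself would use real tensor types):

```python
def dataset_to_named_tensors(columns: dict) -> dict:
    """Turn tabular data into named tensors: each column name maps to a
    (values, shape) pair, where values are reshaped to (n_rows, 1)."""
    tensors = {}
    for name, values in columns.items():
        tensors[name] = ([[v] for v in values], (len(values), 1))
    return tensors
```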
Serving Runtime
Think generic: what are the common inputs & outputs?

Inputs (list of named tensors) → Evaluator (ONNX / TensorFlow / PMML) → Outputs (list of named tensors)
Serving Runtime
Web API
How do we convert an HTTP query into named tensors?
How do we convert named tensors into an HTTP response?
Use the Content-Type header to decode/encode the message body.
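Dispatching on the Content-Type header can be sketched as a pair of codec functions. The supported media types and column-oriented JSON layout below are assumptions for illustration, not the runtime's documented wire format:

```python
import csv
import io
import json

def decode_body(content_type: str, body: str) -> dict:
    """Decode an HTTP message body into named tensors,
    dispatching on the Content-Type header."""
    if content_type == "application/json":
        # {"x": [1, 2]} maps directly onto named tensors
        data = json.loads(body)
        return {k: v if isinstance(v, list) else [v] for k, v in data.items()}
    if content_type == "text/csv":
        # each CSV column becomes one named tensor
        rows = list(csv.DictReader(io.StringIO(body)))
        return {name: [row[name] for row in rows] for name in rows[0]}
    raise ValueError(f"unsupported Content-Type: {content_type}")

def encode_body(accept: str, tensors: dict) -> str:
    """Encode named tensors back into an HTTP body (JSON shown here;
    the CSV direction would be symmetric)."""
    if accept == "application/json":
        return json.dumps(tensors)
    raise ValueError(f"unsupported Accept: {accept}")
```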