Model Inferencing use cases are becoming a requirement for models moving into the next phase of production deployments. More and more users are now encountering use cases around canary deployments, scale-to-zero or serverless characteristics. And then there are also advanced use cases coming around model explainability, including A/B tests, ensemble models, multi-armed bandits, etc.
In this talk, the speakers are going to detail how to handle these use cases using Kubeflow Serving and the native Kubernetes stack which is Istio and Knative. Knative and Istio help with autoscaling, scale-to-zero, canary deployments to be implemented, and scenarios where traffic is optimized to the best performing models. This can be combined with KNative eventing, Istio observability stack, KFServing Transformer to handle pre/post-processing and payload logging which consequentially can enable drift and outlier detection to be deployed. We will demonstrate where currently KFServing is, and where it's heading towards.