Building and Deploying
Scalable NLP Model Services
September 22, 2022
Agenda
● Overview of Kubernetes (a pedestrian perspective…)
● Overview of Seldon Core
● Building & Deploying a Seldon Core Model
● More Complex Inference Graphs
Setting Expectations
● What we’re shooting for:
○ A high-level understanding of k8s and Seldon
○ The ability to deploy a single model using this tooling
○ A template/stub to get started with deployments
● Out of scope:
○ k8s expertise
○ The full Seldon.ai ecosystem (we’ll focus narrowly)
Setting Expectations
What we’re going to build:
REST → redaction
Setting Expectations
What we’re going to build:
REST → redaction → punctuation
Setting Expectations
What we’re going to build (time permitting):
REST → redaction → (punctuation, sentiment) → combine
whoami
Before We Jump In
If you’d like to follow along, please make sure to:
git clone https://github.com/zak-s-brown/seldon_sl2022.git
cd seldon_sl2022
make init
Overview of Kubernetes
(A pedestrian perspective…)
What is Kubernetes?
“Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.”
What is Kubernetes?
Allows us to deploy and manage resilient, scalable services:
● Automated rollout/rollback
● Self-healing
● Automatic bin packing
● Storage orchestration
● Secrets/config management
What is Kubernetes?
Affectionately referred to as k8s (“k-eights” or “kates”) by most folks, as “kubernetes” is a bit cumbersome.
Anatomy of a k8s Cluster
Cluster: platform for managing containerized workloads and services
Anatomy of a k8s Cluster
Node: worker machine in k8s
Anatomy of a k8s Cluster
Pod: set of running containers in your cluster
Anatomy of a k8s Cluster
Container: lightweight* and portable executable image that contains all software and dependencies
This is usually (part of) the deliverable artifact for an MLE.
Anatomy of a k8s Cluster
Clusters typically contain a mix of services with varying resource requirements (e.g., pods with lower vs. higher resource requirements).
Anatomy of a k8s Cluster
Pods can also specify a node group to be deployed on, allowing hardware optimization for heterogeneous workloads (e.g., general-purpose vs. compute-optimized node groups).
Anatomy of a k8s Cluster
K8s supports (naive) fixed replication as well as horizontal pod autoscaling (HPA), which leverages pod group metrics to trigger scaling events.
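For example, a minimal sketch of both approaches using kubectl (the deployment name and thresholds here are illustrative):

# fixed replication: pin a deployment at 3 replicas
kubectl scale deployment mymodel --replicas=3

# HPA: scale between 2 and 10 replicas based on CPU utilization
kubectl autoscale deployment mymodel --min=2 --max=10 --cpu-percent=80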
Kubernetes Tooling
kubectl is the primary command-line tool for interacting with a Kubernetes cluster:
kubectl get/describe nodes/pods
kubectl apply/delete -f my-deployment.yml
Kubernetes Tooling
k9s is an alternative tool offering much of the same functionality as kubectl in a terminal-based UI.
Overview of Seldon Core
What is Seldon?
“Seldon Core makes it easier and faster to deploy your machine learning models and experiments at scale on Kubernetes. Seldon Core serves models built in any open-source or commercial model building framework.”
Out of the Box Support
● Prepackaged model servers for common frameworks:
○ scikit-learn
○ XGBoost
○ TensorFlow
○ MLflow
● Language wrappers:
○ Python
○ Java (incubating)
○ R, Node, Go (alpha)
What Comes “Out of the Box”
● Seldon deployments come with:
○ REST and gRPC endpoints
○ Swagger documentation*
○ Integration with k8s metrics and monitoring (Grafana)
A consistent framework for deploying models across diverse organizations.
Inference Graph Components
The Seldon Python SDK supports a variety of inference graph components, accommodating a wide array of use cases:
● Models: a model deployment, (minimally) with a predict method
● Transformers: custom input/output transformations
● Routers: logically direct requests to child components
● Combiners: combine responses from multiple models
Building & Deploying a Seldon Core Model
Using the Python Wrapper
Define Class → Containerize → Deploy
Defining a Custom Model Class
To create a new model, we need to define a model class in a file with the same name as the class (e.g. MySeldonModel.py).
The (minimal) definition of a model with the Python SDK requires:
● __init__
● predict
● (optional) load
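A minimal sketch, assuming a hypothetical EchoModel class in EchoModel.py (the hands-on uses SpacyScrubber instead):

# EchoModel.py — minimal Seldon Python SDK model class (illustrative)
class EchoModel:
    def __init__(self):
        # keep construction light; defer heavy artifact loading to load()
        self.ready = False

    def load(self):
        # optional: called once at startup, e.g. to load model weights
        self.ready = True

    def predict(self, X, features_names=None):
        # X arrives as a numpy ndarray by default; return the model output
        return X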
SpacyScrubber Model Def Hands On
Using the Python Wrapper
Define Class → Containerize → Deploy
Containerizing a Seldon Model Class
There are two primary options for containerizing models created with the Seldon Python SDK:
● OpenShift source-to-image (s2i)
● Docker
Containerizing a Seldon Model Class
To create a container image, your Dockerfile should:
● Reference the model class file in the root of the Docker build context (class/file name only)
● Add Seldon-specific env vars
● Invoke seldon-core-microservice
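A minimal sketch along these lines (the base image, model name, and requirements file are assumptions):

# Dockerfile (illustrative)
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Seldon-specific env vars: model class/file name and component type
ENV MODEL_NAME EchoModel
ENV SERVICE_TYPE MODEL
# invoke the Seldon microservice wrapper at container start
CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE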
Containerizing a Seldon Model Class
Once we’ve built our model container, we need to make it available to the k8s cluster via a container registry:
docker build -t mymodel:latest .
docker tag mymodel:latest localhost:5001/mymodel:latest
docker push localhost:5001/mymodel:latest
SpacyScrubber Dockerfile Hands On
Using the Python Wrapper
Define Class → Containerize → Deploy
Deploying a Containerized Model
A k8s deployment defines the full configuration for the model pod(s), as in the sketch after this list:
● Seldon version/type info
● Container definition
● Graph definition
● Replication/scaling config
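A minimal SeldonDeployment manifest sketch (names, image, and replica count are illustrative):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: mymodel
spec:
  predictors:
    - name: default
      replicas: 1
      componentSpecs:
        - spec:
            containers:
              - name: mymodel
                image: localhost:5001/mymodel:latest
      graph:
        name: mymodel   # must match the container name
        type: MODEL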
Deploying a Containerized Model
Once we define the deployment, we can push it to our k8s cluster using kubectl:

# push deployment to cluster
kubectl apply -f mymodel-deploy.yml

# remove (destroy) deployment
kubectl delete -f mymodel-deploy.yml
SpacyScrubber Deployment Hands On
Testing Your Model
Seldon REST endpoints by default expect a numpy.ndarray as input, with a request payload of the form:

{
  "data": {
    "ndarray": [<input>]
  }
}
Testing Your Model
When coupled with Istio for ingress, Seldon automatically wires up new models according to the following routing pattern:
http://<ingress>/seldon/<ns>/<service>/api/v1.0/predictions
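For example, a hypothetical local test (the ingress host/port, namespace, and service name are assumptions):

curl -X POST http://localhost:8080/seldon/default/spacy-scrubber/api/v1.0/predictions \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": ["Call me at 555-0199"]}}'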
SpacyScrubber Endpoint Test
More Complex Inference Graphs
Now that we’ve run through the basics of a single model, let’s take a look at a slightly more complex inference graph:
REST → redaction → punctuation
Serial Model Deployment
Assuming we already have the component services deployed, we can define a new deployment similar to the following:
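A hedged sketch of one plausible shape (names and images are illustrative; a MODEL node’s output flows on to its child):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: redact-punctuate
spec:
  predictors:
    - name: default
      componentSpecs:
        - spec:
            containers:
              - name: redaction
                image: localhost:5001/redaction:latest
              - name: punctuation
                image: localhost:5001/punctuation:latest
      graph:
        name: redaction
        type: MODEL
        children:
          - name: punctuation
            type: MODEL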
Serial Inference Graph Hands On
Going one step further, we can create even more complex inference graphs:
REST → redaction → (punctuation, sentiment) → combine
Complex Model Deployment
Again, assuming we already have the component services deployed, we can define a new deployment similar to the following:
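Again hedged, one plausible graph section for the branching case (a COMBINER node fans the request out to its children and merges their responses via an aggregate method; all names are illustrative):

graph:
  name: redaction
  type: MODEL
  children:
    - name: combine
      type: COMBINER
      children:
        - name: punctuation
          type: MODEL
        - name: sentiment
          type: MODEL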
Complex Inference Graph Hands On
