Building and Deploying
Scalable NLP Model Services
September 22, 2022
Agenda
● Overview of Kubernetes (a pedestrian perspective…)
● Overview of Seldon Core
● Building & Deploying a Seldon Core Model
● More Complex Inference Graphs
Setting Expectations
● What we’re shooting for:
○ A high-level understanding of k8s and Seldon
○ The ability to deploy a single model using this tooling
○ A template/stub to get started with deployments
● Out of scope:
○ k8s expertise
○ The full Seldon.ai ecosystem (we’ll focus narrowly)
Setting Expectations
What we’re going to build:
REST → redaction
Setting Expectations
What we’re going to build:
REST → redaction → punctuation
Setting Expectations
What we’re going to build (time permitting):
REST → redaction → (punctuation, sentiment) → combine
whoami
Before We Jump In
If you’d like to follow along, please make sure to:
git clone https://github.com/zak-s-brown/seldon_sl2022.git
cd seldon_sl2022
make init
Overview of Kubernetes
(A pedestrian perspective…)
What is Kubernetes?
“Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.”
What is Kubernetes?
Allows us to deploy and manage resilient, scalable services:
● Automated rollout/rollback
● Self-healing
● Automatic bin packing
● Storage orchestration
● Secrets/config management
What is Kubernetes?
Affectionately referred to as k8s (“k-eights” or “kates”) by most folks, as “kubernetes” is a bit cumbersome.
Anatomy of a k8s Cluster
Cluster: platform for managing containerized workloads and services
Anatomy of a k8s Cluster
Node: worker machine in k8s
Anatomy of a k8s Cluster
Pod: set of running containers in your cluster
Anatomy of a k8s Cluster
Container: lightweight* and portable executable image that contains all software and dependencies
This is usually (part of) the deliverable artifact for an MLE.
Anatomy of a k8s Cluster
Clusters typically contain a mix of services with varying resource requirements (e.g., pods with lower vs. higher resource requirements).
Anatomy of a k8s Cluster
Pods can also specify a node group to be deployed on, allowing hardware optimization for heterogeneous workloads (e.g., general-purpose vs. compute-optimized node groups).
Anatomy of a k8s Cluster
K8s supports (naive) fixed replication as well as horizontal pod autoscaling (HPA), which leverages pod group metrics to trigger scaling events.
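For example, a minimal sketch of both approaches using kubectl (the deployment name and thresholds here are illustrative):

# fixed replication: pin a deployment at 3 replicas
kubectl scale deployment mymodel --replicas=3

# HPA: scale between 2 and 10 replicas based on CPU utilization
kubectl autoscale deployment mymodel --min=2 --max=10 --cpu-percent=80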
Kubernetes Tooling
kubectl is the primary command-line tool for interacting with a Kubernetes cluster:
kubectl get/describe nodes/pods
kubectl apply/delete -f my-deployment.yml
Kubernetes Tooling
k9s is an alternative tool offering much of the same functionality as kubectl in a terminal-based UI.
Overview of Seldon Core
What is Seldon?
“Seldon Core makes it easier and faster to deploy your machine learning models and experiments at scale on Kubernetes. Seldon Core serves models built in any open-source or commercial model building framework.”
Out of the Box Support
● Prepackaged model servers for common frameworks:
○ scikit-learn
○ XGBoost
○ TensorFlow
○ MLflow
● Language wrappers:
○ Python
○ Java (incubating)
○ R, Node, Go (alpha)
What Comes “Out of the Box”
● Seldon deployments come with:
○ REST and gRPC endpoints
○ Swagger documentation*
○ Integration with k8s metrics and monitoring (Grafana)
A consistent framework for deploying models across diverse organizations.
Inference Graph Components
The Seldon Python SDK supports a variety of inference graph components, accommodating a wide array of use cases:
● Models: a model deployment, (minimally) with a predict method
● Transformers: custom input/output transformations
● Routers: logically direct requests to child components
● Combiners: combine responses from multiple models
Building & Deploying a Seldon Core Model
Using the Python Wrapper
Define Class → Containerize → Deploy
Defining a Custom Model Class
To create a new model, we need to define a model class in a file with the same name as the class (e.g. MySeldonModel.py).
The (minimal) definition of a model with the Python SDK requires:
● __init__
● predict
● (optional) load
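A minimal sketch, assuming a hypothetical EchoModel class in EchoModel.py (the hands-on uses SpacyScrubber instead):

# EchoModel.py — minimal Seldon Python SDK model class (illustrative)
class EchoModel:
    def __init__(self):
        # keep construction light; defer heavy artifact loading to load()
        self.ready = False

    def load(self):
        # optional: called once at startup, e.g. to load model weights
        self.ready = True

    def predict(self, X, features_names=None):
        # X arrives as a numpy ndarray by default; return the model output
        return X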
SpacyScrubber Model Def Hands On
Using the Python Wrapper
Define Class → Containerize → Deploy
Containerizing a Seldon Model Class
There are two primary options for containerizing models created with the Seldon Python SDK:
● OpenShift source-to-image (s2i)
● Docker
Containerizing a Seldon Model Class
To create a container image, your Dockerfile should:
● Reference the model class file in the root of the Docker build context (class/file name only)
● Add Seldon-specific env vars
● Invoke seldon-core-microservice
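A minimal sketch along these lines (the base image, model name, and requirements file are assumptions):

# Dockerfile (illustrative)
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Seldon-specific env vars: model class/file name and component type
ENV MODEL_NAME EchoModel
ENV SERVICE_TYPE MODEL
# invoke the Seldon microservice wrapper at container start
CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE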
Containerizing a Seldon Model Class
Once we’ve built our model container, we need to make it available to the k8s cluster via a container registry:
docker build -t mymodel:latest .
docker tag mymodel:latest localhost:5001/mymodel:latest
docker push localhost:5001/mymodel:latest
SpacyScrubber Dockerfile Hands On
Using the Python Wrapper
Define Class → Containerize → Deploy
Deploying a Containerized Model
A k8s deployment defines the full configuration for the model pod(s), as in the sketch after this list:
● Seldon version/type info
● Container definition
● Graph definition
● Replication/scaling config
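A minimal SeldonDeployment manifest sketch (names, image, and replica count are illustrative):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: mymodel
spec:
  predictors:
    - name: default
      replicas: 1
      componentSpecs:
        - spec:
            containers:
              - name: mymodel
                image: localhost:5001/mymodel:latest
      graph:
        name: mymodel   # must match the container name
        type: MODEL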
Deploying a Containerized Model
Once we define the deployment, we can push it to our k8s cluster using kubectl:

# push deployment to cluster
kubectl apply -f mymodel-deploy.yml

# remove (destroy) deployment
kubectl delete -f mymodel-deploy.yml
SpacyScrubber Deployment Hands On
Testing Your Model
Seldon REST endpoints by default expect a numpy.ndarray as input, with a request payload of the form:

{
  "data": {
    "ndarray": [<input>]
  }
}
Testing Your Model
When coupled with Istio for ingress, Seldon automatically wires up new models according to the following routing pattern:
http://<ingress>/seldon/<ns>/<service>/api/v1.0/predictions
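For example, a hypothetical local test (the ingress host/port, namespace, and service name are assumptions):

curl -X POST http://localhost:8080/seldon/default/spacy-scrubber/api/v1.0/predictions \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": ["Call me at 555-0199"]}}'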
SpacyScrubber Endpoint Test
More Complex Inference Graphs
Now that we’ve run through the basics of a single model, let’s take a look at a slightly more complex inference graph:
REST → redaction → punctuation
Serial Model Deployment
Assuming we already have the component services deployed, we can define a new deployment similar to the following:
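A hedged sketch of one plausible shape (names and images are illustrative; a MODEL node’s output flows on to its child):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: redact-punctuate
spec:
  predictors:
    - name: default
      componentSpecs:
        - spec:
            containers:
              - name: redaction
                image: localhost:5001/redaction:latest
              - name: punctuation
                image: localhost:5001/punctuation:latest
      graph:
        name: redaction
        type: MODEL
        children:
          - name: punctuation
            type: MODEL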
Serial Inference Graph Hands On
Going one step further, we can create even more complex inference graphs:
REST → redaction → (punctuation, sentiment) → combine
Complex Model Deployment
Again, assuming we already have the component services deployed, we can define a new deployment similar to the following:
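Again hedged, one plausible graph section for the branching case (a COMBINER node fans the request out to its children and merges their responses via an aggregate method; all names are illustrative):

graph:
  name: redaction
  type: MODEL
  children:
    - name: combine
      type: COMBINER
      children:
        - name: punctuation
          type: MODEL
        - name: sentiment
          type: MODEL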
Complex Inference Graph Hands On
