1. Triton as NLP Model Inference Back-end
Ko Ko, Microsoft AI MVP
2022/07/30 COSCUP @NTUST
2. About Ko Ko
• Just call me Ko Ko.
• Microsoft AI MVP.
• Speaker at large conferences such as COSCUP, .NET Conf, ModernWeb, and so on.
• https://www.linkedin.com/in/ko-ko-b12a3474/
3. Contents
Overview of Triton
Structure of Triton
Other Features in Triton
Types of Model and Model Repo Structure
Config for Served Model
Config for Ensemble Model
Start Triton Inference Server
Practical Example of NLP Model Deployment
4. What are the problems of AI inference?
AI inference services on the server are getting heavier and heavier.
Concurrency is still a big issue in AI back-ends.
More and more models are being integrated into a single service.
Many AI models are still run straight from Jupyter notebooks.
6. Overview of Triton
1. Born for deployment of AI models.
2. BSD license.
3. Speeds up AI model inference.
4. Supports multiple AI model frameworks (TensorRT, TensorFlow, PyTorch, ONNX, and more).
5. Supports gRPC and HTTP (see the client sketch after this list).
6. Supports CPU, single GPU, and multiple GPUs.
7. Model management: load, unload, and update.
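Since Triton exposes both HTTP and gRPC endpoints, a thin client is enough to call a served model. Below is a minimal sketch using the official tritonclient Python package (pip install tritonclient[http]); the model name my_model and the tensor names INPUT__0 / OUTPUT__0 are hypothetical placeholders for whatever the model's config.pbtxt declares.

import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton (default HTTP port is 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# One FP32 input tensor; name, shape, and dtype must match the model config.
data = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
infer_input = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

# Request the named output tensor and run inference.
result = client.infer(
    model_name="my_model",  # hypothetical model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
)
print(result.as_numpy("OUTPUT__0"))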
10. Other Features in Triton
● Model Analyzer (see the command sketch after this list)
○ Performance analysis
○ Memory analysis
● NGC
○ Just like Docker Hub, but for NVIDIA solutions
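As a rough idea of how Model Analyzer is invoked, the command below profiles one model from a local repository. This is a sketch based on the Model Analyzer CLI; flag spellings can differ between versions, and my_model is a placeholder.

$ model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models my_model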
11. Types of model
● Stateless
○ CV-related models
● Stateful
○ Predicts results based on previous results
○ Some NLP models
● Ensemble
○ A pipeline of models (see the config sketch below)
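An ensemble is itself declared with a config.pbtxt. The sketch below wires a hypothetical tokenizer model into a hypothetical bert_classifier; every model and tensor name here is a placeholder, and each input_map key must match the corresponding step model's own input name.

name: "nlp_pipeline"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_TEXT" data_type: TYPE_STRING dims: [ 1 ] }
]
output [
  { name: "SCORES" data_type: TYPE_FP32 dims: [ 2 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "tokenizer"        # hypothetical preprocessing model
      model_version: -1              # -1 means the latest version
      input_map { key: "TEXT" value: "RAW_TEXT" }
      output_map { key: "IDS" value: "token_ids" }
    },
    {
      model_name: "bert_classifier"  # hypothetical classifier model
      model_version: -1
      input_map { key: "INPUT_IDS" value: "token_ids" }
      output_map { key: "LOGITS" value: "SCORES" }
    }
  ]
}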
13. Model Repo Structure
Models must follow a fixed structure of files and folders (see the layout sketch below).
config.pbtxt is not required for TensorRT, TensorFlow SavedModel, and ONNX models, since Triton can auto-complete their configuration.
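For example, a minimal repository holding a single ONNX model (the name my_model is a placeholder) looks like this:

model_repository/
└── my_model/
    ├── config.pbtxt    (optional for ONNX)
    └── 1/              (version folder)
        └── model.onnx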
$ tritonserver --model-repository=<model-repository-path>
If your model repo is in the cloud, such as Azure Blob Storage:
$ tritonserver --model-repository=as://account_name/container_name/path/to/model/repository
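For completeness, a minimal config.pbtxt for the hypothetical ONNX model above could look like the sketch below; the tensor names, shapes, and types are assumptions and must match the actual model.

name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  { name: "INPUT__0" data_type: TYPE_FP32 dims: [ 3 ] }
]
output [
  { name: "OUTPUT__0" data_type: TYPE_FP32 dims: [ 2 ] }
]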
24. Triton on Azure Machine Learning
https://docs.microsoft.com/zh-tw/azure/machine-learning/how-to-deploy-with-triton?tabs=endpoint
YAML file:

name: densenet-onnx-model
version: 1
path: ./models
type: triton_model
description: Registering my Triton format model.
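Assuming the YAML above is saved as model.yml, registering it with the Azure ML CLI v2 would look roughly like this (the resource group and workspace names are placeholders):

$ az ml model create --file model.yml \
    --resource-group my-rg --workspace-name my-ws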
25. Recap
Overview of Triton
Structure of Triton
Other Features in Triton
Types of Model and Model Repo Structure
Config for Served Model
Config for Ensemble Model
Start Triton Inference Server
Practical Example of NLP Model Deployment