Confidential do not distribute 1
Generative AI Automation
for private Enterprise LLMs
Part 1: LM-Controller
Enterprise AI workloads
● AI models and applications are the new class of Kubernetes workloads
● We start by tackling LLMs
● Enterprises have already invested in CPU-based Kubernetes clusters
Why Weave AI?
● AI application developers shouldn’t worry about the complexity of
model deployment.
● Platform teams: LLMs become platform components
○ Security and governance: signing and verification
○ RBAC and tenancy
○ Standardization across organizations
○ Available to dev teams via self-service portals
Why Weave AI?
● Day 0 - Out-of-the-box experience
○ weave-ai install
○ weave-ai run zephyr-7b-beta
● Day 1 - Integrate models into your DevOps / GitOps pipelines
○ weave-ai install --export
● Day 2 - Build and maintain a model catalog for the dev teams
○ flux commands
○ Fine-tuning models / RAG data pipelines
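A minimal sketch of the Day 0 and Day 1 flows above. The two commands come from the slides; the `--export` redirect target is an assumption, on the convention that `--export` prints YAML manifests to stdout (as `flux install --export` does) so they can be committed to a Git repository and reconciled by Flux:

```sh
# Day 0: one-shot install and model run (commands from the slides)
weave-ai install
weave-ai run zephyr-7b-beta

# Day 1: emit the install manifests instead of applying them, so they
# can be committed to Git and reconciled by Flux.
# (Hypothetical output path; YAML output format is assumed.)
weave-ai install --export > clusters/my-cluster/weave-ai.yaml
```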
What is LM Controller?
● The first controller released as part of the Weave AI Controllers
● LM Controller is a Flux controller that helps deploy Large
Language Models on Kubernetes
● It supports LLMs packaged in the Flux OCI artifact format
● It uses the Flux Source Controller as the in-cluster model cache
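Because the Flux Source Controller is used as the model cache, the Flux side of this can be sketched with a standard `OCIRepository` source. `OCIRepository` is a real Flux API; the registry URL, namespace, and tag below are hypothetical, and the LM Controller's own custom resource is not shown in the slides, so only the source half is sketched:

```yaml
# A Flux OCIRepository that the Source Controller reconciles and caches
# in-cluster; LM Controller can then consume the cached model artifact.
# Registry URL, namespace, and tag are hypothetical placeholders.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: zephyr-7b-beta
  namespace: weave-ai
spec:
  interval: 10m
  url: oci://ghcr.io/example-org/models/zephyr-7b-beta
  ref:
    tag: latest
```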
LLMs are snowflakes
LLM as Flux OCI
[Diagram: compatible models from Hugging Face are fine-tuned with your data and packaged by CI in GitHub / GitLab; the packaged models are stored as Flux OCI artifacts, then pulled and deployed to LLM serving on CPU or GPU, on cloud or on-prem; the serving layer is managed and provides context to your app.]
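The "packaged" and "store" steps of the diagram map onto the standard Flux CLI workflow for publishing OCI artifacts. `flux push artifact` and its `--path`, `--source`, and `--revision` flags are real Flux CLI features; the registry path, local model directory, and source URL below are hypothetical placeholders:

```sh
# Package a local model directory as a Flux OCI artifact and push it
# to a registry (hypothetical registry path and directory), e.g. from
# a CI job in GitHub / GitLab.
flux push artifact oci://ghcr.io/example-org/models/zephyr-7b-beta:v1 \
  --path=./zephyr-7b-beta \
  --source="https://huggingface.co/HuggingFaceH4/zephyr-7b-beta" \
  --revision="main"
```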
Why use LM Controller?
[Diagram: LM Controller injects all required information into the deployment units that serve the LLMs.]
What Weave AI provides so far
● A curated LLM catalog
○ In Flux’s OCI format
● Flux’s Source Controller as an in-cluster model cache
○ No PVC required
● A controller that takes care of the many LLM-specific parameters for you
● A set of pre-built OpenAI-API-compatible engines
○ No-AVX, AVX, AVX2, AVX512, and more to come
● An easy-to-use CLI
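Since the pre-built engines expose an OpenAI-compatible API, applications can talk to a deployed model with any OpenAI-style client. A minimal sketch with `curl`, assuming a hypothetical in-cluster service name and port (the request body follows the standard OpenAI chat-completions schema):

```sh
# Query an OpenAI-API-compatible engine from inside the cluster.
# Service name, namespace, and port are hypothetical placeholders.
curl http://zephyr-7b-beta.weave-ai.svc:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "zephyr-7b-beta",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```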
It’s Demo Time

Weave AI Controllers (Weave GitOps Office Hours)