Join Shreya Rajpal, CEO of Guardrails AI, and Travis Addair, CTO of Predibase, in this exclusive webinar to learn all about leveraging the part of AI that constitutes your IP – your data – to build a defensible AI strategy for the future!
9. Why is this the case?
Root Cause
Machine learning is fundamentally non-deterministic.
Symptom
“My LLM application worked while prototyping, but failed the moment I handed it off to someone else.”
12. Always getting ‘correct’ outputs is hard
Some common issues:
● Hallucinations
● Falsehoods
● Lack of correct structure
● Prompt injection
Only tool available to devs is the prompt
13. Use of LLMs is limited when “correctness” is critical.
How do we add correctness guarantees to LLMs?
18. What Guardrails AI does
Guardrails AI is an open-source AI-verification framework that provides:
✅ Framework for creating custom validators
✅ Orchestration of prompting → verification → re-prompting
✅ Library of commonly used validators for multiple use cases
✅ Specification language for communicating requirements to LLM
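The prompting → verification → re-prompting loop orchestrated by the framework can be sketched in a few lines of Python. This is an illustrative sketch of the pattern only, not the actual Guardrails AI API; the function names and the validator shape (a callable returning `None` on pass or a failure message on fail) are invented for the example.

```python
# Sketch of the prompt -> verify -> re-prompt loop. Hypothetical helpers,
# not the Guardrails AI API.
import json


def validate(output: str, validators) -> list[str]:
    """Run every validator against the output; collect failure messages."""
    return [msg for check in validators if (msg := check(output))]


def guarded_call(llm, prompt: str, validators, max_retries: int = 2) -> str:
    """Call the LLM, verify its output, and re-prompt with any failures."""
    for _ in range(max_retries + 1):
        output = llm(prompt)
        failures = validate(output, validators)
        if not failures:
            return output
        # Re-prompt: feed the violations back so the model can correct itself.
        prompt = (prompt + "\n\nYour last answer was rejected because:\n"
                  + "\n".join(f"- {f}" for f in failures) + "\nTry again.")
    raise ValueError(f"No valid output after {max_retries} retries: {failures}")


# Example custom validator: require the output to be valid JSON.
def must_be_json(output: str):
    try:
        json.loads(output)
    except ValueError:
        return "output is not valid JSON"
```

A stub LLM that first answers badly and then correctly will succeed on the second attempt, because the failure message is folded back into the prompt.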
20. Case Study: Internal chatbot with “correct” responses
Problem
Build a chatbot over the help center articles of your mobile application
Correctness Criteria
Don’t hallucinate
Don’t use foul language
Don’t mention my competitors
21. How do I prevent LLM hallucinations?
Provenance Guardrails
Every LLM utterance should have a source of truth.
https://docs.guardrailsai.com/api_reference/validators/#guardrails.validators.ProvenanceV1
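The core idea behind a provenance check can be illustrated with a toy sketch: split the response into sentences and flag any sentence that is not sufficiently grounded in the source documents. Production validators such as ProvenanceV1 typically rely on embedding similarity or an LLM evaluator; the word-overlap heuristic below is only a stand-in for illustration.

```python
# Toy provenance check: flag response sentences not grounded in the sources.
# Word overlap is a crude stand-in for embedding similarity.
import re


def grounded(sentence: str, sources: list[str], threshold: float = 0.5) -> bool:
    """True if enough of the sentence's words appear in some source passage."""
    words = set(re.findall(r"\w+", sentence.lower()))
    if not words:
        return True
    return any(
        len(words & set(re.findall(r"\w+", src.lower()))) / len(words) >= threshold
        for src in sources
    )


def ungrounded_sentences(response: str, sources: list[str]) -> list[str]:
    """Return the sentences that have no supporting source passage."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    return [s for s in sentences if not grounded(s, sources)]
```

Run against a help-center article, an unsupported sentence (a likely hallucination) is surfaced while grounded sentences pass.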
27. How do I change my password?
Example: Validating “correctness”
32. More examples of validations
● Make sure my code is executable
● Never give financial or healthcare advice
● Don’t ask private questions
● Don’t mention competitors
● Ensure each sentence is from a verified source and is accurate
● No profanity is mentioned in text
● Prompt injection protection
● Never expose prompt or sources
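Several of the validations above (no competitor mentions, no profanity) reduce to checking the output against a banned-terms list. A minimal sketch, with placeholder term lists invented for the example:

```python
# Minimal "don't mention competitors / no profanity" validator sketch.
# The term lists are placeholders, not a real deny list.
import re

BANNED = {
    "competitor": ["acme corp", "globex"],
    "profanity": ["damn"],
}


def banned_terms(output: str) -> list[str]:
    """Return the banned terms found in the output (empty list = pass)."""
    text = output.lower()
    return [term for terms in BANNED.values() for term in terms
            if re.search(rf"\b{re.escape(term)}\b", text)]
```

Word-boundary matching avoids false positives on substrings; a real validator would also handle misspellings and paraphrases, which is where model-based checks come in.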
33. In Summary,
Guardrails AI is an open-source AI-verification framework that provides:
✅ Framework for creating custom validators
✅ Orchestration of prompting → verification → re-prompting
✅ Library of commonly used validators for multiple use cases
✅ Specification language for communicating requirements to LLM
34. Learn more
● Github: github.com/guardrails-ai/guardrails
● Website: guardrailsai.com
● Twitter: @ShreyaR or @guardrails_ai
36. The future is fine-tuned
Smaller, task-specific models will be the real AI wave. How do we pull the future forward?
37. The LLM Deep Learning Revolution is here
Rapid advancements in general intelligence
[Timeline figure, 2000 → 2025+]
Pre-Deep Learning (predictive analytics, classification): XGBoost (100 trees), Catboost (1,000 trees)
The Deep Learning and LLM Era (generative and conversational AI): AlexNet (60M parameters), BERT (300M parameters), T5 (11B parameters), GPT3 (175B parameters), LLaMa-2 (70B parameters), GPT4 (1.4–1.8T parameters), the next big OSS LLM (XT parameters)
39. Graduating from OpenAI to open-source
Commercial LLMs (e.g., GPT-4) are a good starting point
Great for rapid experimentation, but…
✘ Lack model ownership
✘ Need to give up access to data
✘ Too slow, expensive & overkill for most tasks
40. Graduating from OpenAI to open-source
But the future is fine-tuned and open-source
[Diagram: tasks moving off GPT-4 to smaller task-specific models — BERT, Mistral-7B, Llama2-70B — for determining customer sentiment, prioritizing customer support tickets, and a customer service chatbot]
Benefits of smaller task-specific LLMs
✓ Own your models
✓ Don’t share data
✓ Smaller and faster
✓ Control the output
41. Better performance and 250x smaller
Fine-tuned models outperform their larger, more expensive commercial alternatives
43. CASE STUDY
Distilling OSS models for content moderation
Pilot with Fine-Tuning
● Tested end to end on 4 large-scale datasets (billions of rows)
● Tabular classification and regression tasks, highly imbalanced
● Compared against OS AutoML and Google AutoML Tables
Outcomes
● Improved performance by +5% to +20%
● Performance gap increased with data size
● Very fast training and prediction (25m–55m)
44. Productionizing LLMs is harder than it seems
The main challenges engineering teams face in productionizing LLMs
Complex Training
Flaky distributed training, frequent OOMs, and GPU shortages
Costly Model Serving
Serving each model requires expensive GPUs, and serving needs to autoscale and be production-grade
Keeping Up
Best practices in research are introduced by the week. How does your team stay updated?
46. Simplify Training with Ludwig
An open-source declarative ML framework started at Uber

Easy to start:
input_features:
  - name: sentence
    type: text
output_features:
  - name: intent
    type: category

Expert-level control:
input_features:
  - name: sentence
    type: text
    encoder: bert
output_features:
  - name: intent
    type: category
trainer:
  regularize: 0.1
  dropout: 0.05

Generative AI:
model_type: llm
name: llama-v2-13b
input_features:
  - name: question
output_features:
  - name: answer
trainer:
  type: finetune
  peft: qlora

● From months to days
● No ML code required
● Readable & reproducible
● Easy to iterate
● Extensible
● Latest OSS models
● Efficient fine-tuning
● Retrieval augmentation
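Declarative configs like the ones above map directly to plain Python dicts, which makes them easy to validate before a (potentially expensive) training job launches. A minimal sketch of such a pre-flight check, using a hypothetical helper rather than anything from Ludwig itself:

```python
# Pre-flight check for a declarative training config (hypothetical helper,
# not part of Ludwig). Mirrors the "Generative AI" config from the slide.
config = {
    "model_type": "llm",
    "name": "llama-v2-13b",
    "input_features": [{"name": "question"}],
    "output_features": [{"name": "answer"}],
    "trainer": {"type": "finetune", "peft": "qlora"},
}


def missing_sections(cfg: dict) -> list[str]:
    """Return the required sections that are absent or empty."""
    required = ["input_features", "output_features"]
    return [key for key in required if not cfg.get(key)]
```

Failing fast on an incomplete config is far cheaper than discovering the problem after a GPU job has been scheduled.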
48. Parameter Efficient Fine-Tuning
[Diagram: Traditional Fine-Tuning vs. Parameter Efficient Fine-Tuning]
Traditional Fine-Tuning: a dataset is used to update the pretrained LLM directly. All parameters are updated, which is very expensive.
Parameter Efficient Fine-Tuning: the pretrained LLM stays fixed and additional trainable parameters are attached. Only a small set of new, task-specific parameters are updated; the fine-tuned LLM includes these new additional parameters.
49. Low-Rank Adaptation (LoRA)
Compress “fine-tunable” parameters to 0.5%–10% of total parameters in your LLM
https://arxiv.org/abs/2106.09685
Let W have a shape of 1024 x 1024. LoRA matrices A and B have shapes 1024 x 8 and 8 x 1024 respectively. Multiplying A and B gives the same shape as W but only using 1024 * 8 * 2 parameters (~1.5% of the weights in W).

adapter:
  type: lora
  r: 32

adapter:
  type: lora
quantization:
  bits: 4
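The slide's arithmetic is easy to verify: freezing a d × d weight matrix W and training only the low-rank factors A (d × r) and B (r × d) replaces d·d trainable parameters with 2·d·r.

```python
# Reproducing the LoRA parameter-count arithmetic from the slide:
# W is d x d and frozen; only A (d x r) and B (r x d) are trained.
def lora_fraction(d: int, r: int) -> float:
    full = d * d       # trainable parameters in full fine-tuning of W
    lora = 2 * d * r   # trainable parameters in A and B combined
    return lora / full


d, r = 1024, 8
print(2 * d * r)                      # 16384 trainable parameters
print(f"{lora_fraction(d, r):.1%}")   # 1.6% (the slide rounds to ~1.5%)
```

Raising the rank (e.g., `r: 32` as in the adapter config above) trades a larger trainable footprint for more adaptation capacity, while staying far below full fine-tuning.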
51. But how much will this cost in the cloud?
GPU       Tier        $/hr (AWS)  VRAM (GiB)
H100      Enterprise  12.29       80
A100      Enterprise  5.12        80
V100      Enterprise  3.90        32
A10G      Enterprise  1.21        24
T4        Enterprise  0.98        16
RTX 4080  Consumer    N/A         16
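A back-of-the-envelope estimate shows why quantization matters for the cheaper tiers in this table: model weights alone need roughly (parameter count × bytes per parameter) of VRAM, before activations, optimizer state, or KV cache. All numbers below are rough estimates, not measurements.

```python
# Rough VRAM needed just to hold model weights, ignoring activations,
# optimizer state, and KV cache.
def weight_gib(params_billion: float, bits: int) -> float:
    """GiB of memory for the weights of a model of the given size."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 2**30


# A 13B model in fp16 vs. 4-bit quantization (as in the QLoRA config above):
fp16 = weight_gib(13, 16)  # ~24.2 GiB: weights alone overflow a 24 GiB A10G
q4 = weight_gib(13, 4)     # ~6.1 GiB: fits with headroom even on a 16 GiB T4
```

This is why 4-bit fine-tuning can move a workload from a $5.12/hr A100 down toward the $0.98/hr T4 tier, though real jobs need extra memory beyond the weights.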