Join Shreya Rajpal, CEO of Guardrails AI, and Travis Addair, CTO of Predibase, in this exclusive webinar to learn all about leveraging the part of AI that constitutes your IP – your data – to build a defensible AI strategy for the future!
9. Why is this the case?
Root Cause
Machine learning is fundamentally non-deterministic.
Symptom
“My LLM application worked while prototyping, but failed the moment I handed it off to someone else.”
12. Always getting ‘correct’ outputs is hard
Some common issues:
● Hallucinations
● Falsehoods
● Lack of correct structure
● Prompt injection
Only tool available to devs is the prompt
13. Use of LLMs is limited when “correctness” is critical.
How do we add correctness guarantees to LLMs?
18. What Guardrails AI does
Guardrails AI is an open-source AI-verification framework that provides:
✅ Framework for creating custom validators
✅ Orchestration of prompting → verification → re-prompting
✅ Library of commonly used validators for multiple use cases
✅ Specification language for communicating requirements to LLM
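The prompting → verification → re-prompting loop orchestrated by the framework can be sketched in a few lines of Python. This is an illustrative sketch of the pattern only, not the actual Guardrails AI API; the function names and the validator shape (a callable returning `None` on pass or a failure message on fail) are invented for the example.

```python
# Sketch of the prompt -> verify -> re-prompt loop. Hypothetical helpers,
# not the Guardrails AI API.
import json


def validate(output: str, validators) -> list[str]:
    """Run every validator against the output; collect failure messages."""
    return [msg for check in validators if (msg := check(output))]


def guarded_call(llm, prompt: str, validators, max_retries: int = 2) -> str:
    """Call the LLM, verify its output, and re-prompt with any failures."""
    for _ in range(max_retries + 1):
        output = llm(prompt)
        failures = validate(output, validators)
        if not failures:
            return output
        # Re-prompt: feed the violations back so the model can correct itself.
        prompt = (prompt + "\n\nYour last answer was rejected because:\n"
                  + "\n".join(f"- {f}" for f in failures) + "\nTry again.")
    raise ValueError(f"No valid output after {max_retries} retries: {failures}")


# Example custom validator: require the output to be valid JSON.
def must_be_json(output: str):
    try:
        json.loads(output)
    except ValueError:
        return "output is not valid JSON"
```

A stub LLM that first answers badly and then correctly will succeed on the second attempt, because the failure message is folded back into the prompt.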
20. Case Study: Internal chatbot with “correct” responses
Problem
Build a chatbot over the help center articles of your mobile application
Correctness Criteria
Don’t hallucinate
Don’t use foul language
Don’t mention my competitors
21. How do I prevent LLM hallucinations?
Provenance Guardrails
Every LLM utterance should have a source of truth.
https://docs.guardrailsai.com/api_reference/validators/#guardrails.validators.ProvenanceV1
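The core idea behind a provenance check can be illustrated with a toy sketch: split the response into sentences and flag any sentence that is not sufficiently grounded in the source documents. Production validators such as ProvenanceV1 typically rely on embedding similarity or an LLM evaluator; the word-overlap heuristic below is only a stand-in for illustration.

```python
# Toy provenance check: flag response sentences not grounded in the sources.
# Word overlap is a crude stand-in for embedding similarity.
import re


def grounded(sentence: str, sources: list[str], threshold: float = 0.5) -> bool:
    """True if enough of the sentence's words appear in some source passage."""
    words = set(re.findall(r"\w+", sentence.lower()))
    if not words:
        return True
    return any(
        len(words & set(re.findall(r"\w+", src.lower()))) / len(words) >= threshold
        for src in sources
    )


def ungrounded_sentences(response: str, sources: list[str]) -> list[str]:
    """Return the sentences that have no supporting source passage."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    return [s for s in sentences if not grounded(s, sources)]
```

Run against a help-center article, an unsupported sentence (a likely hallucination) is surfaced while grounded sentences pass.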
27. How do I change my password?
Example: Validating “correctness”
32. More examples of validations
● Make sure my code is executable
● Never give financial or healthcare advice
● Don’t ask private questions
● Don’t mention competitors
● Ensure each sentence is from a verified source and is accurate
● No profanity is mentioned in text
● Prompt injection protection
● Never expose prompt or sources
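Several of the validations above (no competitor mentions, no profanity) reduce to checking the output against a banned-terms list. A minimal sketch, with placeholder term lists invented for the example:

```python
# Minimal "don't mention competitors / no profanity" validator sketch.
# The term lists are placeholders, not a real deny list.
import re

BANNED = {
    "competitor": ["acme corp", "globex"],
    "profanity": ["damn"],
}


def banned_terms(output: str) -> list[str]:
    """Return the banned terms found in the output (empty list = pass)."""
    text = output.lower()
    return [term for terms in BANNED.values() for term in terms
            if re.search(rf"\b{re.escape(term)}\b", text)]
```

Word-boundary matching avoids false positives on substrings; a real validator would also handle misspellings and paraphrases, which is where model-based checks come in.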
33. In Summary,
Guardrails AI is an open-source AI-verification framework that provides:
✅ Framework for creating custom validators
✅ Orchestration of prompting → verification → re-prompting
✅ Library of commonly used validators for multiple use cases
✅ Specification language for communicating requirements to LLM
34. Learn more
● Github: github.com/guardrails-ai/guardrails
● Website: guardrailsai.com
● Twitter: @ShreyaR or @guardrails_ai
36. The future is fine-tuned
Smaller, task-specific models will be the real AI wave. How do we pull the future forward?
37. The LLM Deep Learning Revolution is here
Rapid advancements in general intelligence
[Timeline figure, 2000 → 2025+]
Pre-Deep Learning (predictive analytics, classification): XGBoost (100 trees), Catboost (1,000 trees)
The Deep Learning and LLM Era (generative and conversational AI): AlexNet (60M parameters), BERT (300M parameters), T5 (11B parameters), GPT3 (175B parameters), LLaMa-2 (70B parameters), GPT4 (1.4–1.8T parameters), the next big OSS LLM (XT parameters)
39. Graduating from OpenAI to open-source
Commercial LLMs (e.g., GPT-4) are a good starting point
Great for rapid experimentation, but…
✘ Lack model ownership
✘ Need to give up access to data
✘ Too slow, expensive & overkill for most tasks
40. Graduating from OpenAI to open-source
But the future is fine-tuned and open-source
[Diagram: tasks moving off GPT-4 to smaller task-specific models — BERT, Mistral-7B, Llama2-70B — for determining customer sentiment, prioritizing customer support tickets, and a customer service chatbot]
Benefits of smaller task-specific LLMs
✓ Own your models
✓ Don’t share data
✓ Smaller and faster
✓ Control the output
41. Better performance and 250x smaller
Fine-tuned models outperform their larger, more expensive commercial alternatives
43. CASE STUDY
Distilling OSS models for content moderation
Pilot with Fine-Tuning
● Tested end to end on 4 large-scale datasets (billions of rows)
● Tabular classification and regression tasks, highly imbalanced
● Compared against OS AutoML and Google AutoML Tables
Outcomes
● Improved performance by +5% to +20%
● Performance gap increased with data size
● Very fast training and prediction (25m–55m)
44. Productionizing LLMs is harder than it seems
The main challenges engineering teams face in productionizing LLMs
Complex Training
Flaky distributed training, frequent OOMs, and GPU shortages
Costly Model Serving
Serving each model requires expensive GPUs, and serving needs to autoscale and be production-grade
Keeping Up
Best practices in research are introduced by the week. How does your team stay updated?
46. Simplify Training with Ludwig
An open-source declarative ML framework started at Uber

Easy to start:
input_features:
  - name: sentence
    type: text
output_features:
  - name: intent
    type: category

Expert-level control:
input_features:
  - name: sentence
    type: text
    encoder: bert
output_features:
  - name: intent
    type: category
trainer:
  regularize: 0.1
  dropout: 0.05

Generative AI:
model_type: llm
name: llama-v2-13b
input_features:
  - name: question
output_features:
  - name: answer
trainer:
  type: finetune
  peft: qlora

● From months to days
● No ML code required
● Readable & reproducible
● Easy to iterate
● Extensible
● Latest OSS models
● Efficient fine-tuning
● Retrieval augmentation
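Declarative configs like the ones above map directly to plain Python dicts, which makes them easy to validate before a (potentially expensive) training job launches. A minimal sketch of such a pre-flight check, using a hypothetical helper rather than anything from Ludwig itself:

```python
# Pre-flight check for a declarative training config (hypothetical helper,
# not part of Ludwig). Mirrors the "Generative AI" config from the slide.
config = {
    "model_type": "llm",
    "name": "llama-v2-13b",
    "input_features": [{"name": "question"}],
    "output_features": [{"name": "answer"}],
    "trainer": {"type": "finetune", "peft": "qlora"},
}


def missing_sections(cfg: dict) -> list[str]:
    """Return the required sections that are absent or empty."""
    required = ["input_features", "output_features"]
    return [key for key in required if not cfg.get(key)]
```

Failing fast on an incomplete config is far cheaper than discovering the problem after a GPU job has been scheduled.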
48. Parameter Efficient Fine-Tuning
[Diagram: Traditional Fine-Tuning vs. Parameter Efficient Fine-Tuning]
Traditional Fine-Tuning: a dataset is used to update the pretrained LLM directly. All parameters are updated, which is very expensive.
Parameter Efficient Fine-Tuning: the pretrained LLM stays fixed and additional trainable parameters are attached. Only a small set of new, task-specific parameters are updated; the fine-tuned LLM includes these new additional parameters.
49. Low-Rank Adaptation (LoRA)
Compress “fine-tunable” parameters to 0.5%–10% of total parameters in your LLM
https://arxiv.org/abs/2106.09685
Let W have a shape of 1024 x 1024. LoRA matrices A and B have shapes 1024 x 8 and 8 x 1024 respectively. Multiplying A and B gives the same shape as W but only using 1024 * 8 * 2 parameters (~1.5% of the weights in W).

adapter:
  type: lora
  r: 32

adapter:
  type: lora
quantization:
  bits: 4
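The slide's arithmetic is easy to verify: freezing a d × d weight matrix W and training only the low-rank factors A (d × r) and B (r × d) replaces d·d trainable parameters with 2·d·r.

```python
# Reproducing the LoRA parameter-count arithmetic from the slide:
# W is d x d and frozen; only A (d x r) and B (r x d) are trained.
def lora_fraction(d: int, r: int) -> float:
    full = d * d       # trainable parameters in full fine-tuning of W
    lora = 2 * d * r   # trainable parameters in A and B combined
    return lora / full


d, r = 1024, 8
print(2 * d * r)                      # 16384 trainable parameters
print(f"{lora_fraction(d, r):.1%}")   # 1.6% (the slide rounds to ~1.5%)
```

Raising the rank (e.g., `r: 32` as in the adapter config above) trades a larger trainable footprint for more adaptation capacity, while staying far below full fine-tuning.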
51. But how much will this cost in the cloud?
GPU       Tier        $/hr (AWS)  VRAM (GiB)
H100      Enterprise  12.29       80
A100      Enterprise  5.12        80
V100      Enterprise  3.90        32
A10G      Enterprise  1.21        24
T4        Enterprise  0.98        16
RTX 4080  Consumer    N/A         16
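A back-of-the-envelope estimate shows why quantization matters for the cheaper tiers in this table: model weights alone need roughly (parameter count × bytes per parameter) of VRAM, before activations, optimizer state, or KV cache. All numbers below are rough estimates, not measurements.

```python
# Rough VRAM needed just to hold model weights, ignoring activations,
# optimizer state, and KV cache.
def weight_gib(params_billion: float, bits: int) -> float:
    """GiB of memory for the weights of a model of the given size."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 2**30


# A 13B model in fp16 vs. 4-bit quantization (as in the QLoRA config above):
fp16 = weight_gib(13, 16)  # ~24.2 GiB: weights alone overflow a 24 GiB A10G
q4 = weight_gib(13, 4)     # ~6.1 GiB: fits with headroom even on a 16 GiB T4
```

This is why 4-bit fine-tuning can move a workload from a $5.12/hr A100 down toward the $0.98/hr T4 tier, though real jobs need extra memory beyond the weights.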