Trust, but Verify
Making GenAI applications production-ready
Shreya Rajpal
Guardrails AI
About me
Current: CEO & Cofounder @ Guardrails AI
Past: ML infra lead @ an MLOps company, ML @ self-driving cars, classical AI & deep learning research
We’re seeing a Cambrian explosion of applications in AI
Interest in Artificial Intelligence over time
The reality
Source: https://www.sequoiacap.com/article/generative-ai-act-two/
Why is this the case?
Root Cause
Machine learning is fundamentally non-deterministic.
Symptom
“My LLM application worked while prototyping, but failed the moment I handed it off to someone else.”
Software APIs are deterministic…
…but ML Model APIs are not
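For example (a minimal sketch using the OpenAI Python client; the model name and prompt are placeholders), calling the same endpoint twice with a non-zero temperature can return two different answers:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    # temperature > 0 means the model samples, so repeated calls are non-deterministic
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return resp.choices[0].message.content

print(ask("Summarize our refund policy in one sentence."))
print(ask("Summarize our refund policy in one sentence."))  # often differs from the first call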
Consistently getting ‘correct’ outputs is hard
Some common issues:
● Hallucinations
● Falsehoods
● Lack of correct structure
● Prompt injection
The only tool available to developers is the prompt
Use of LLMs is limited
when “correctness” is critical.
How do we add correctness guarantees to LLMs?
https://github.com/guardrails-ai/guardrails
Guardrails AI acts as a safety firewall around your LLMs
Guardrails AI under-the-hood
What Guardrails AI does
Guardrails AI is an open-source AI verification framework that provides:
✅ A framework for creating custom validators
✅ Orchestration of prompting → verification → re-prompting
✅ A library of commonly used validators for multiple use cases
✅ A specification language for communicating requirements to the LLM
Implementing Guardrails
Guardrails combine several kinds of checks:
● Grounding via external systems
● Rules-based heuristics
● Traditional ML methods
● High-precision DL classifiers
● LLM self-reflection
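As a sketch of what a rules-based heuristic looks like when packaged as a custom validator (assuming the open-source Guardrails 0.x Python API; the validator name and competitor list are hypothetical):

from guardrails.validators import (
    FailResult,
    PassResult,
    ValidationResult,
    Validator,
    register_validator,
)

@register_validator(name="no-competitor-mentions", data_type="string")
class NoCompetitorMentions(Validator):
    # Fails validation if the LLM output mentions a competitor (hypothetical list).
    COMPETITORS = {"acme corp", "globex"}

    def validate(self, value: str, metadata: dict) -> ValidationResult:
        mentioned = [c for c in self.COMPETITORS if c in value.lower()]
        if mentioned:
            return FailResult(error_message=f"Output mentions competitors: {mentioned}")
        return PassResult()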
Case Study: Internal chatbot with “correct” responses
Problem: build a chatbot over the help center articles of your mobile application
Correctness criteria:
● Don’t hallucinate
● Don’t use foul language
● Don’t mention my competitors
How do I prevent LLM hallucinations?
Provenance Guardrails
Every LLM utterance should have a source of truth.
https://docs.guardrailsai.com/api_reference/validators/#guardrails.validators.ProvenanceV1
Provenance Guardrails under-the-hood
Provenance guardrails can be implemented via:
● Embedding similarity
● A classifier built on an NLI model
● LLM self-reflection
Configure Guard
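A minimal sketch of configuring a guard and validating an LLM answer with the open-source Python package (the ProvenanceV1 parameters and metadata keys are illustrative and vary by version; the answer and source passage are placeholders):

from guardrails import Guard
from guardrails.validators import ProvenanceV1

# Guard that checks each sentence of the answer against retrieved help-center sources.
guard = Guard.from_string(
    validators=[ProvenanceV1(validation_method="sentence", on_fail="fix")],
    description="Answers to help-center questions, grounded in the articles.",
)

llm_answer = "Open Settings > Account > Change password, then confirm via email."  # from your LLM call
outcome = guard.parse(
    llm_output=llm_answer,
    metadata={
        # ProvenanceV1 needs a way to look up supporting sources; source passages (or a
        # query function) are supplied via metadata. Key names here are illustrative.
        "sources": ["To change your password, open Settings > Account > Change password."],
    },
)
print(outcome)  # validated (possibly fixed) output plus validation details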
Example: Validating “correctness” for the user query “How do I change my password?”
More examples of validations
● Make sure my code is executable
● Never give financial or healthcare advice
● Don’t ask private questions
● Don’t mention competitors
● Ensure each sentence is from a verified source and is accurate
● No profanity is mentioned in text
● Prompt injection protection
● Never expose prompt or sources
In summary
Guardrails AI is an open-source AI verification framework that provides:
✅ A framework for creating custom validators
✅ Orchestration of prompting → verification → re-prompting
✅ A library of commonly used validators for multiple use cases
✅ A specification language for communicating requirements to the LLM
Learn more
● GitHub: github.com/guardrails-ai/guardrails
● Website: guardrailsai.com
● Twitter: @ShreyaR or @guardrails_ai
The future is fine-tuned
Smaller, task-specific models will be the real AI wave. How do we pull that future forward?
The LLM Deep Learning Revolution is here
Rapid advancements in general intelligence
A rough timeline, 2000 → 2025+:
● Predictive analytics (pre-deep learning): XGBoost, CatBoost (100–1,000 trees)
● Classification (the deep learning era): AlexNet (60M parameters), BERT (300M parameters)
● Generative and conversational AI (the LLM era): T5 (11B parameters), GPT-3 (175B parameters), LLaMA-2 (70B parameters), GPT-4 (1.4–1.8T parameters), and the next big OSS LLM (X T parameters)
Bigger isn’t always better
General intelligence is great. But I don’t need my point-of-sale system to recite French poetry.
Graduating from OpenAI to open-source
Commercial LLMs like GPT-4 are a good starting point. Great for rapid experimentation, but…
✘ Lack of model ownership
✘ Need to give up access to data
✘ Too slow, expensive & overkill for most tasks
Graduating from OpenAI to open-source
But the future is fine-tuned and open-source: smaller open models such as BERT, Mistral-7B, and Llama-2-70B can take over tasks like determining customer sentiment, prioritizing customer support tickets, and running a customer service chatbot that GPT-4 handles today.
Benefits of smaller, task-specific LLMs:
✓ Own your models
✓ Don’t share data
✓ Smaller and faster
✓ Control the output
Better performance and 250x smaller
Fine-tuned models outperform their larger, more expensive commercial alternatives
Cold Start: where do I get training data?
CASE STUDY
Distilling OSS models for content moderation
Pilot with fine-tuning:
● Tested end to end on 4 large-scale datasets (billions of rows)
● Tabular classification and regression tasks, highly imbalanced
● Compared against open-source AutoML and Google AutoML Tables
Outcomes:
● Improved performance by +5% to +20%
● Performance gap increased with data size
● Very fast training and prediction (25–55 minutes)
Productionizing LLMs is harder than it seems
The main challenges engineering teams face in productionizing LLMs:
● Complex training: flaky distributed training, frequent OOMs, and GPU shortages
● Costly model serving: each model requires expensive GPUs and needs to autoscale and be production-grade
● Keeping up: best practices in research are introduced by the week; how does your team stay updated?
The reality of fine-tuning training
Simplify Training with Ludwig
An open-source declarative ML framework started at Uber.

Easy to start:
input_features:
  - name: sentence
    type: text
output_features:
  - name: intent
    type: category

Expert-level control:
input_features:
  - name: sentence
    type: text
    encoder: bert
output_features:
  - name: intent
    type: category
trainer:
  regularize: 0.1
  dropout: 0.05

Generative AI:
model_type: llm
name: llama-v2-13b
input_features:
  - name: question
output_features:
  - name: answer
trainer:
  type: finetune
  peft: qlora

From months to days:
● No ML code required
● Readable & reproducible
● Easy to iterate
● Extensible
● Latest OSS models
● Efficient fine-tuning
● Retrieval augmentation
Declaratively Fine-Tune LLMs
model_type: llm
base_model: Llama-2-7b-hf
input_features:
  - name: input
    type: text
output_features:
  - name: output
    type: text
trainer:
  type: finetune
  learning_rate: 0.0003
  batch_size: 1
  gradient_accumulation_steps: 8
  epochs: 3
from ludwig.api import LudwigModel  # the config above can be passed as a dict or a YAML file path
llm = LudwigModel(config)
results = llm.train(dataset=df)  # df: pandas DataFrame of input/output pairs
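Once training finishes, the same object can be used for inference; a quick sketch (test_df is assumed to be a pandas DataFrame with the same input column as the config):

predictions, _ = llm.predict(test_df)  # returns a DataFrame of generated outputs plus an output directory
print(predictions.head())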
Parameter-Efficient Fine-Tuning
Traditional fine-tuning: pretrained LLM + dataset → fine-tuned LLM, where all parameters are updated, which is very expensive.
Parameter-efficient fine-tuning: pretrained LLM + dataset → a small set of new, task-specific trainable parameters; only those additional parameters are updated, and the fine-tuned LLM is the pretrained model plus these new parameters.
Low-Rank Adaptation (LoRA)
Compress the “fine-tunable” parameters to 0.5%–10% of the total parameters in your LLM.
https://arxiv.org/abs/2106.09685
Example: let W have a shape of 1024 x 1024. LoRA matrices A and B have shapes 1024 x 8 and 8 x 1024 respectively. Multiplying A and B gives the same shape as W, but uses only 1024 * 8 * 2 parameters (~1.5% of the weights in W).
LoRA in the Ludwig config:
adapter:
  type: lora
  r: 32
LoRA with 4-bit quantization (QLoRA):
adapter:
  type: lora
quantization:
  bits: 4
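A quick back-of-envelope check of those numbers (the shapes come from the example above; the snippet is just illustrative arithmetic):

d, r = 1024, 8
full_params = d * d               # 1,048,576 parameters in W
lora_params = d * r + r * d       # A (1024 x 8) + B (8 x 1024) = 16,384 parameters
print(lora_params / full_params)  # ~0.0156, i.e. ~1.5% of the weights in W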
But how much will this cost in the cloud?
GPU | Tier | $/hr (AWS) | VRAM (GiB)
H100 | Enterprise | 12.29 | 80
A100 | Enterprise | 5.12 | 80
V100 | Enterprise | 3.90 | 32
A10G | Enterprise | 1.21 | 24
T4 | Enterprise | 0.98 | 16
RTX 4080 | Consumer | N/A | 16
Cost per Month on A10Gs in AWS
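For reference, at the on-demand rate above a single A10G costs roughly $1.21/hr × 24 × 30 ≈ $870 per month, before accounting for autoscaling or multi-GPU serving.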
Deploying Fine-Tuned LLMs
Deploying Multiple Fine-Tuned LLMs
llm1 = ft_model1.deploy()
llm2 = ft_model2.deploy()
llm3 = ft_model3.deploy()
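With LoRAX, an alternative to three separate deployments is to serve one shared base model and load a LoRA adapter per request; a minimal sketch using the lorax-client package (the server URL and adapter IDs are placeholders):

from lorax import Client

client = Client("http://127.0.0.1:8080")  # a running LoRAX server hosting the shared base model

# Each request dynamically applies a different fine-tuned adapter on the same deployment.
for adapter_id in ["acme/sentiment-lora", "acme/ticket-priority-lora", "acme/support-chat-lora"]:
    resp = client.generate(
        "Customer message: my order arrived damaged.",
        adapter_id=adapter_id,
        max_new_tokens=64,
    )
    print(adapter_id, "->", resp.generated_text)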
LoRAX cost
Use Case 1: Chatbot Model
Customer: Security Company (Chatbot)
Assumptions
Request Volume (Inference)
● 800k requests / day
● 100 tokens / request (Input)
● 300 tokens / request (Output)
Dataset Size (Fine-Tuning)
● 10k rows (1M tokens) of Conversations
Annual Total Cost of Ownership (TCO)
Predibase: $87,563 ($64 / day)
OpenAI: $2,880,000 ($8K inference / day)
Use Case 2: Email Generation
Customer: Tech Company (Automated Email Generation)
Assumptions
Request Volume (Inference)
● 100k requests / day
● 200 tokens / request (Input)
● 500 tokens / request (Output)
Dataset Size (Fine-Tuning)
● 5k rows (2.5M tokens) of emails
Annual Total Cost of Ownership (TCO)
Predibase: $87,563 ($64 / day)
OpenAI: $612,000 ($1700 inference / day)
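As a back-of-envelope check of the OpenAI inference figures in both use cases (the per-token prices below are our assumption, roughly GPT-4 Turbo list pricing, not numbers from the slides):

def openai_cost_per_day(requests, in_tokens, out_tokens, in_price=0.01, out_price=0.03):
    # Prices are assumed $ per 1K tokens; daily cost = requests * per-request token cost.
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1000

print(openai_cost_per_day(800_000, 100, 300))  # ~$8,000/day -> ~$2.9M/year (Use Case 1)
print(openai_cost_per_day(100_000, 200, 500))  # ~$1,700/day -> ~$0.6M/year (Use Case 2)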
Ludwig (www.ludwig.ai):
● 10,300+ ★ on GitHub
● 145+ contributors
● 7,000+ downloads/month
● ~80 commits/month
LoRAX (https://github.com/predibase/lorax):
● 621 ★ on GitHub
● 10 contributors
● 1,000+ downloads/month
● ~100 commits/month