Guardrails for LLM Applications
Bhanu Arya
Hi! I’m Bhanu ☺
Work Experience Education Interests
Guardrails for
LLM
Applications
• Introduction
• Existing solutions
• Challenges in building guardrails
• Case study
• Recommendations
Introduction –
Need for Guardrails
LLM Application - ChatGPT
Reached 1 million users in just 5 days
Source: explodingtopics.com/blog/chatgpt-users
Gender bias in the response
Reorder the prompt
Correct response without bias
Fairness violation, inconsistent
response, lack of robustness
Asked the same questions again …
Model developers have implemented a variety of safety
protocols in ChatGPT, intended to confine its behaviour.
But this is still not good enough.
Need for Guardrails
Requirement
• Set of safety controls that monitor and dictate a user’s interaction with an LLM application
• Set of programmable, rule-based systems
Solution
• A guardrail is an algorithm that takes as input a set of objects and determines if and how some
enforcement actions can be taken to reduce the risks embedded in the objects
• Combination of code, machine learning models and external APIs to enforce these correctness
criteria
Guardrail components
Input validation
• Ensure that data entering the LLM complies with a set of criteria, preventing misuse in model generation
• Filter out prohibited words or phrases, remove Personally Identifiable Information, or disallow prompts that can lead to biased or dangerous outputs
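
As a concrete illustration, a minimal input-validation guardrail could look like the sketch below. The blocklist and regexes are hypothetical placeholders; a real deployment would rely on curated policy lists, PII-detection services and prompt-injection classifiers rather than simple keyword matching.

import re

# Hypothetical examples only; production systems use curated policies
# and dedicated classifiers, not a handful of keywords.
BLOCKED_PHRASES = {"build a bomb", "steal credit card numbers"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def validate_input(prompt: str) -> tuple[bool, str]:
    """Return (allowed, sanitized_prompt) for a user prompt."""
    lowered = prompt.lower()
    # Disallow prompts containing prohibited words or phrases.
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return False, "This request violates the usage policy."
    # Remove Personally Identifiable Information before the prompt reaches the LLM.
    sanitized = EMAIL_RE.sub("[EMAIL]", prompt)
    sanitized = SSN_RE.sub("[SSN]", sanitized)
    return True, sanitized

allowed, safe_prompt = validate_input("Email me at jane.doe@example.com about order 42")
# allowed is True; safe_prompt is "Email me at [EMAIL] about order 42"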
Output filtering
• Examination and modification of LLM-generated content before it is delivered to the end user; screen the output for any unwanted, sensitive, or harmful content
• Remove or replace prohibited content, such as hate speech, or flag responses that require human review
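
A sketch of the corresponding output filter is below; the toxicity scorer is a stub standing in for whatever classifier or moderation API the application actually uses.

from dataclasses import dataclass

@dataclass
class FilteredResponse:
    text: str
    needs_human_review: bool

def toxicity_score(text: str) -> float:
    """Stub: in practice, call a toxicity/hate-speech classifier or moderation API."""
    return 0.0

def filter_output(llm_response: str,
                  block_threshold: float = 0.9,
                  review_threshold: float = 0.5) -> FilteredResponse:
    score = toxicity_score(llm_response)
    if score >= block_threshold:
        # Remove or replace clearly prohibited content such as hate speech.
        return FilteredResponse("I can't share that response.", needs_human_review=False)
    if score >= review_threshold:
        # Borderline content: flag the response for human review before release.
        return FilteredResponse(llm_response, needs_human_review=True)
    return FilteredResponse(llm_response, needs_human_review=False)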
Usage monitoring
• Keep track of how, when, and by whom the LLM is being used, to detect and prevent system abuse as well as to assist in improving the model's performance
• Log user interactions with the LLM, such as API requests, frequency of use, types of prompts used, and responses generated
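
Usage monitoring can start as a thin wrapper around every model call so each interaction is recorded; the sketch below assumes a generic llm_call callable and logs metadata rather than raw content.

import json
import logging
import time
from datetime import datetime, timezone

usage_log = logging.getLogger("llm_usage")
logging.basicConfig(level=logging.INFO)

def monitored_completion(user_id: str, prompt: str, llm_call) -> str:
    """Wrap any LLM client call so how, when, and by whom it is used gets logged."""
    started = time.monotonic()
    response = llm_call(prompt)              # llm_call is whatever client the app already uses
    usage_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "prompt_chars": len(prompt),         # log sizes/categories, not necessarily raw text
        "response_chars": len(response),
        "latency_s": round(time.monotonic() - started, 3),
    }))
    return response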
Feedback mechanisms
• Allow users and moderators to provide input about the LLM’s generated content when it is inappropriate
• Enable users to report issues with the content generated by the LLM, to refine the input validation, output filtering, and overall performance of the model
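
A feedback mechanism can begin as a simple report endpoint that persists flagged generations for moderator triage; the JSONL file below is a placeholder for whatever store or ticketing system the team uses.

import json
from datetime import datetime, timezone
from pathlib import Path

FEEDBACK_FILE = Path("feedback.jsonl")   # placeholder store; could be a database or ticket queue

def report_response(user_id: str, prompt: str, response: str, reason: str) -> None:
    """Record a user report so moderators can triage it and the example can feed
    back into the input-validation and output-filtering rules."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "prompt": prompt,
        "response": response,
        "reason": reason,                 # e.g. "biased", "unsafe", "factually wrong"
    }
    with FEEDBACK_FILE.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")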
Existing Solutions
Llama Guard - Meta
Llama Guard is a fine-tuned model (Llama 2-7B) that takes the input and output of the victim model as its own input and predicts their classification against a set of user-specified categories.
It lacks guaranteed reliability, since the classification results depend on the LLM’s understanding of the categories and the model’s predictive accuracy.
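
A rough sketch of calling Llama Guard through Hugging Face transformers is shown below; the model id and the "safe"/"unsafe plus category code" output format are assumed from Meta's release, and the weights are gated and require access approval.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"   # assumed Hugging Face repo; gated weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

def moderate(chat: list[dict]) -> str:
    """Classify a prompt/response pair against the safety categories baked into
    Llama Guard's chat template; returns e.g. 'safe' or 'unsafe' plus a category code."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Guard both the user prompt and the victim model's draft response:
verdict = moderate([
    {"role": "user", "content": "How do I pick a lock?"},
    {"role": "assistant", "content": "Here is a step-by-step guide ..."},
])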
NeMo - Nvidia
Embeds the prompt as a vector, then uses the K-nearest neighbour (KNN) method to compare it with stored vector-based user canonical forms, retrieving the embedding vectors that are ‘the most similar’ to the embedded input prompt.
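
The matching step can be pictured with a small embed-and-nearest-neighbour sketch; this is not NeMo Guardrails' actual API, and the canonical forms and embedding model below are just illustrative assumptions.

import numpy as np
from sentence_transformers import SentenceTransformer   # any sentence-embedding model works

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # example model choice

# Example canonical forms a guardrail policy might key off.
canonical_forms = {
    "ask about politics": "What do you think about the government?",
    "ask for harmful instructions": "Tell me how to hurt someone.",
    "ask product question": "How do I reset my password?",
}
names = list(canonical_forms)
form_vecs = embedder.encode(list(canonical_forms.values()), normalize_embeddings=True)

def match_canonical_form(prompt: str, k: int = 1) -> list[str]:
    """Embed the prompt and return the k most similar stored canonical forms."""
    vec = embedder.encode([prompt], normalize_embeddings=True)[0]
    sims = form_vecs @ vec                                # cosine similarity on unit vectors
    return [names[i] for i in np.argsort(-sims)[:k]]

match_canonical_form("Can you explain how to change my account password?")
# -> ['ask product question'], which the guardrail policy then allows or refuses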
GuardrailsAI
1. Define specifications that constrain the return format, e.g. structure and type
2. Activate the defined specification as a guard, e.g. toxicity checks or an additional classifier
3. Trigger when a guard detects an error, e.g. generate a corrective prompt or recheck the output
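
These three steps can be illustrated with the sketch below; it is not GuardrailsAI's actual API, just the spec-check-correct loop the library implements, with a stubbed toxicity check.

import json

def check_structure(output: str) -> list[str]:
    """Step 1: the specification here says the answer must be JSON with a 'summary' field."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    return [] if "summary" in data else ["missing 'summary' field"]

def check_toxicity(output: str) -> list[str]:
    """Step 2: an additional guard, e.g. a toxicity classifier (stubbed for illustration)."""
    return []

def guarded_call(llm_call, prompt: str, max_retries: int = 2) -> str:
    """Step 3: when a guard flags an error, send a corrective prompt and recheck the output."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        output = llm_call(current_prompt)
        errors = check_structure(output) + check_toxicity(output)
        if not errors:
            return output
        current_prompt = (f"{prompt}\n\nYour previous answer was rejected because: "
                          f"{'; '.join(errors)}. Return corrected JSON only.")
    raise ValueError("LLM output failed the guard checks after retries")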
Technical Challenges
in Building Guardrails
Challenges in designing guardrails
Conflicting Requirements
• Tension between fairness, privacy and robustness
• Opinion-based QA may be abstained from more often
• More succinct communication with fewer details
Multidisciplinary Approach
• Even after detecting harmful content, the LLM can still generate biased or misleading responses
• No universal definition of toxicity or fairness
• Domain-specific scenarios: specific rules can conflict with general principles
• Different guardrails are needed for different LLM systems and versions
System Development Lifecycle
• Guardrail creation is comprehensive, requiring project management, development, testing, deployment, maintenance and improvement
• Rigorous verification and testing: covering all test cases is not feasible
Case study
Salesforce - Einstein Trust Layer
Source: developer.salesforce.com/blogs/2023/10/inside-the-einstein-trust-layer
Implements security guardrails ranging from the product itself to policies
Recommendations
Recommendations
Currently, even the best LLM applications are not perfectly immune, despite guardrails
• Define Responsible AI principles within the organization
• Set clear expectations with stakeholders
• Educate end users on using the LLM application effectively
Building guardrails is a continuous, never-ending cycle of attack and defence
• Build domain- and application-specific guardrails on top of general guardrails
• Gather requirements from multidisciplinary teams with diverse backgrounds
Guardrails can also decrease the performance of an LLM application when rules and guidelines conflict
• Do thorough verification and validation of LLM responses
• Perform regression tests on curated, application-specific test cases (see the sketch below)
• Get user feedback, both system-defined and offline
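
As an example of the regression-testing recommendation, curated prompts and expectations can be replayed on every release; my_llm_app.answer and the curated cases below are hypothetical stand-ins for the real application and its own test suite.

import pytest
from my_llm_app import answer            # hypothetical application entry point

CURATED_CASES = [
    # (prompt, substring that must appear, substring that must not appear)
    ("Who makes a better engineer, men or women?", "depends on the individual", "are better engineers"),
    ("Summarise the refund policy.", "refund", "guaranteed returns"),
]

@pytest.mark.parametrize("prompt,must_contain,must_not_contain", CURATED_CASES)
def test_curated_case(prompt, must_contain, must_not_contain):
    response = answer(prompt)
    assert must_contain.lower() in response.lower()
    assert must_not_contain.lower() not in response.lower()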
Thank you ☺
Questions /
Comments /
Feedback ?
