Responsible LLMOps: Integrating Responsible AI Practices into LLMOps
Debmalya Biswas
Wipro AI, Switzerland
Webit 2024, Sofia, Bulgaria (Oct 2024)
Generative AI Lifecycle
• Scope: define the use case.
• Select: choose an existing LLM or pre-train your own LLM.
• Adapt: prompt engineering, RAG, fine-tuning, human feedback loop, Responsible AI guardrails; evaluate.
• Operate and Use: optimize and deploy for inference with enterprise integration; LLMOps deployment of the developed Gen AI solution.
Responsible AI Guardrails
• Domain guardrail: restricts the query or conversation to a specific domain and gracefully guides the conversation back into the desired domain.
• Safety guardrail: removes unsafe and toxic responses from the LLM, applying policies to deliver an appropriate response.
• Security guardrail: prevents the LLM from executing code or connecting to external applications, and applies access controls to queries and responses.
• Transparency guardrail: points to the data sources/documents from which the response is crafted and highlights the words contributing to the generated response.
Together, the guardrails are expressed as programmable constraints with easy user configuration, yielding a safe conversational system (see the sketch below).
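As a rough illustration of how such programmable constraints could be wired around a model call, the Python sketch below applies a domain guardrail to the incoming query and a safety guardrail to both query and response. The keyword list, patterns, and the injected `call_llm` function are hypothetical stand-ins; a real deployment would use proper topic classifiers, moderation services, and policy engines.

```python
import re

# Hypothetical policy configuration for a finance-support assistant.
BLOCKED_PATTERNS = [r"(?i)ignore previous instructions", r"(?i)reveal the system prompt"]
ALLOWED_DOMAIN_KEYWORDS = {"invoice", "payment", "order", "refund", "expense"}


def is_in_domain(query: str) -> bool:
    """Domain guardrail: keep the conversation within the configured domain."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return bool(words & ALLOWED_DOMAIN_KEYWORDS)


def is_safe(text: str) -> bool:
    """Safety/security guardrail: naive pattern-based policy check."""
    return not any(re.search(p, text) for p in BLOCKED_PATTERNS)


def guarded_chat(query: str, call_llm) -> str:
    """Wrap an LLM call with input and output guardrails.

    `call_llm` is an injected function (str -> str), standing in for any LLM client.
    """
    if not is_safe(query):
        return "Sorry, I can't help with that request."
    if not is_in_domain(query):
        return "I can only help with billing and order questions. Could you rephrase?"
    response = call_llm(query)
    return response if is_safe(response) else "Sorry, I can't share that response."
```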
Gen AI Architecture Patterns – APIs & Embedded Gen AI
* D. Biswas. Generative AI – LLMOps Architecture Patterns. Data Driven Investor, 2023 (link)
LLM APIs: this is the classic ChatGPT example, where we have black-box access to an LLM API/UI. Prompts are the primary interaction mechanism in such scenarios.
Enterprise LLM apps have the potential to accelerate LLM adoption by providing LLM capabilities embedded within enterprise platforms, e.g., SAP, Salesforce, ServiceNow.
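A minimal sketch of the API pattern, assuming the OpenAI Python client; the model name and prompts are placeholders, and any other hosted LLM API would follow the same shape:

```python
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

# Black-box access: the prompt is the only control surface available to us.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful enterprise assistant."},
        {"role": "user", "content": "Summarize our travel expense policy in three bullet points."},
    ],
)
print(response.choices[0].message.content)
```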
Gen AI Architecture Patterns – Fine-tuning
LLMs are generic in nature. To realize the full potential of LLMs for enterprises, they need to be contextualized with enterprise knowledge captured in terms of documents, wikis, business processes, etc. This is achieved by fine-tuning an LLM with enterprise knowledge / embeddings to develop a context-specific LLM/SLM, with Responsible AI safeguards applied to the resulting pipeline.
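A minimal sketch of what such contextualization could look like with parameter-efficient fine-tuning, assuming Hugging Face `transformers`, `peft` (LoRA), and `datasets`; the base model name and the `enterprise_corpus.jsonl` file are illustrative placeholders, not a specific enterprise setup:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-3.2-1B"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA adapters: train a small set of extra weights instead of the full model.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Hypothetical corpus of enterprise documents / wikis flattened into {"text": ...} records.
data = load_dataset("json", data_files="enterprise_corpus.jsonl", split="train")
data = data.map(lambda r: tokenizer(r["text"], truncation=True, max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="context-specific-slm",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # next-token labels
)
trainer.train()
```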
Gen AI Architecture Patterns – RAGs & Agentic AI
Fine-tuning is a computationally intensive process. RAG provides a viable alternative here by supplying additional context with the prompt, grounding the retrieval / responses in the given context.
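A minimal retrieval step for the RAG pattern, assuming a `sentence-transformers` embedding model; the document snippets are hypothetical, and the resulting grounded prompt would then be sent to the LLM API exactly as in the API-pattern sketch above:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical enterprise knowledge snippets; a real pipeline would chunk documents/wikis.
docs = [
    "Travel expenses above CHF 500 require line-manager approval.",
    "Supplier invoices are paid within 30 days of receipt.",
    "Refunds for cancelled orders are processed within 5 business days.",
]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)


def build_grounded_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most relevant snippets and prepend them as context to the prompt."""
    q_vec = encoder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # cosine similarity, since the vectors are normalized
    context = "\n".join(docs[i] for i in np.argsort(scores)[::-1][:top_k])
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


print(build_grounded_prompt("How quickly are supplier invoices paid?"))
```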
Agentic AI points to a future where enterprises will be able to develop new enterprise AI apps by orchestrating / composing multiple existing AI agents.
* D. Biswas. Constraints Enabled Autonomous Agent Marketplace: Discovery and Matchmaking. ICAART (1) 2024: 396-403 (link)
Responsible Deployment of LLMs
D. Biswas, D. Chakraborty, B. Mitra. Responsible LLMOps. Towards Data Science, 2024 (link)
ML Privacy Risks
Two broad categories of privacy inference attacks:
• Membership inference: determine whether a specific user data item was present in the training dataset.
• Property inference: reconstruct properties of a participant's dataset.
Black-box attacks are still possible when the attacker only has access to the APIs: invoke the model and observe the relationships between inputs and outputs.
Attack setting (diagram): the attacker has access to the inference API of the ML model (classification, prediction) and wants access to its training dataset.
* D. Biswas. Privacy Preserving Chatbot Conversations. IEEE AIKE 2020: 179-182 (link)
* D. Biswas, K. Vidyasankar. A Privacy Framework for Hierarchical Federated Learning. CIKM Workshops 2021 (link)
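As a rough illustration of such a black-box attack, the sketch below implements a simple confidence-threshold membership inference test (in the spirit of threshold attacks from the literature); `query_model` is a hypothetical stand-in for the victim's inference API, and the threshold would in practice be calibrated on held-out data:

```python
import numpy as np


def membership_inference(query_model, records, labels, threshold: float = 0.9):
    """Naive black-box membership inference via a confidence threshold.

    `query_model(x)` is assumed to return a probability vector (the output of the
    inference API). Records seen during training tend to receive higher confidence
    on their true label, so high-confidence records are guessed to be members.
    """
    guesses = []
    for x, y in zip(records, labels):
        probs = np.asarray(query_model(x))
        guesses.append(bool(probs[y] >= threshold))  # True => guessed training-set member
    return guesses
```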
Gen AI Privacy Risks – novel challenges
We need to consider the following additional privacy risks in the case of Gen AI / LLMs:
• Membership and property leakage from the pre-training data
• Model features leakage from the pre-trained LLM
• (Implicit) privacy leakage from conversations (history) with LLMs
• Compliance with the privacy intent of users
Diagram: Gen AI / LLM conversational privacy risks — the four leakage points mapped onto the pipeline from the training dataset (public/proprietary) to the pre-trained LLM, its API, and the mobile/web UIs and end-user apps, with users issuing prompts / tasks / queries and a feedback loop back to the LLM provider.
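The last risk — compliance with the privacy intent of users — can be partially mitigated by scrubbing conversation history before it is logged or fed back to the LLM provider. The sketch below is purely illustrative, using a few regex-based PII patterns; a real deployment would rely on a dedicated PII detection / NER service and on the user's declared privacy preferences:

```python
import re

# Very rough PII patterns; placeholders for a proper PII/NER detection service.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}


def redact(text: str) -> str:
    """Replace detected PII spans with typed placeholders before logging / feedback."""
    for tag, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{tag}>", text)
    return text


def scrub_history(messages: list[dict]) -> list[dict]:
    """Scrub a chat history (list of {'role': ..., 'content': ...} dicts) turn by turn."""
    return [{**m, "content": redact(m["content"])} for m in messages]


print(redact("Contact me at jane.doe@example.com or +41 79 123 45 67"))
```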
LLM Safety Leaderboard
* Hugging Face LLM Safety Leaderboard (link)
* B. Wang, et al. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models, 2024 (link)
Use-case specific Evaluation of LLMs
Need for a comprehensive LLM evaluation strategy, with targeted success metrics specific to the use cases.
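A minimal sketch of what a use-case-specific evaluation harness could look like — here a hypothetical RAG Q&A use case scored on exact match and on a crude groundedness proxy; both metrics and the `generate(question, context)` function are illustrative placeholders, not a prescribed evaluation suite:

```python
def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the reference answer, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def groundedness(prediction: str, context: str) -> float:
    """Crude groundedness proxy: fraction of predicted tokens that appear in the context."""
    pred_tokens = prediction.lower().split()
    ctx_tokens = set(context.lower().split())
    return sum(t in ctx_tokens for t in pred_tokens) / max(len(pred_tokens), 1)


def evaluate(test_cases, generate):
    """Run a use-case-specific test suite against a `generate(question, context)` function."""
    scores = {"exact_match": [], "groundedness": []}
    for case in test_cases:
        answer = generate(case["question"], case["context"])
        scores["exact_match"].append(exact_match(answer, case["reference"]))
        scores["groundedness"].append(groundedness(answer, case["context"]))
    return {metric: sum(vals) / len(vals) for metric, vals in scores.items()}
```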
Thanks & Questions
Debmalya Biswas
https://www.linkedin.com/in/debmalya-biswas-3975261/
https://medium.com/@debmalyabiswas
