Roman Kyslyi: Using and Building LLM Agents (UA)
AI & BigData Online Day 2024 Spring
Website – www.aiconf.com.ua
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/aiconf
5. Unsupervised learning: data scraped from the Internet – think clickbait, misinformation, propaganda, conspiracy theories, or attacks against certain demographics.
Supervised learning: higher-quality data – think StackOverflow, Quora, or human annotations – which makes it somewhat socially acceptable.
RLHF: polished using RL to make it customer-appropriate.
7. Supervised fine-tuning (SFT)
How to do that? We know that a model mimics its training data. During SFT, we show our language model examples of how to appropriately respond to prompts for different use cases (e.g. question answering, summarization, translation). The examples follow the format (prompt, response) and are called demonstration data. OpenAI calls supervised fine-tuning behavior cloning: you demonstrate how the model should behave, and the model clones this behavior.
OpenAI’s 40 labelers created around 13,000 (prompt, response) pairs for InstructGPT.
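As a sketch of what demonstration data looks like in practice, the (prompt, response) pairs are usually serialized into a single training sequence, with the loss masked to the response tokens. The template below is illustrative only – it is not OpenAI's actual format:

```python
# Minimal sketch of preparing SFT demonstration data.
# The prompt/response template here is a hypothetical example.

demonstrations = [
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat."},
    {"prompt": "Translate to French: Hello", "response": "Bonjour"},
]

def to_training_example(pair):
    """Serialize one (prompt, response) pair into a training string.

    During SFT the loss is typically computed only on the response
    tokens; here we just mark the prompt/response boundary."""
    return f"### Prompt:\n{pair['prompt']}\n### Response:\n{pair['response']}"

examples = [to_training_example(p) for p in demonstrations]
print(examples[0])
```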
8. Reward model (RM)
● Training data: high-quality data in the format (prompt, winning_response, losing_response)
● Data scale: 100K – 1M examples
● rθ: the reward model being trained, parameterized by θ. The goal of training is to find the θ for which the loss is minimized.
● Training data format: x – prompt; yw – winning response; yl – losing response
● For each training sample:
  a. reward model's score for the winning response: sw = rθ(x, yw)
  b. reward model's score for the losing response: sl = rθ(x, yl)
● Goal: find θ to minimize the expected loss over all training samples
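The loss minimized above is, in InstructGPT, the pairwise log-sigmoid loss −log σ(sw − sl). A toy numeric version (a real rθ would be an LLM-based scorer, not these hand-fed scalars):

```python
import math

def pairwise_rm_loss(s_w: float, s_l: float) -> float:
    """InstructGPT-style reward-model loss: -log(sigmoid(s_w - s_l)).

    Small when the winning response scores well above the losing one;
    large when the ordering is reversed."""
    return -math.log(1.0 / (1.0 + math.exp(-(s_w - s_l))))

# The loss shrinks as the margin s_w - s_l grows:
print(pairwise_rm_loss(2.0, 0.0))  # correct ordering, clear margin
print(pairwise_rm_loss(0.0, 2.0))  # reversed ordering, heavily penalized
```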
16. A range of AI agents is possible
General Data Agents
● Access to more than one tool
● Can accomplish a wider range of tasks
Specialized Data Agents
● Similar to retrieval from a vector store
● But with access to real-time information
Agents that can take action in the real world
● Book plane tickets
● Schedule appointments
● Order DoorDash
● …
17. Data Agents – LLM-powered workers
[Diagram: a Data Agent reads the latest emails (Email), retrieves context (Knowledge Base), analyzes a file (Analysis Agent), and sends an update (Slack).]
● Perform automated search and retrieval over different types of data: unstructured, semi-structured, and structured.
● Call any external service API in a structured fashion. They can either process the response immediately, or index/cache this data for future use.
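The "call an API, then process or cache the response" pattern above can be sketched as follows; the tool registry, cache policy, and the weather API are all illustrative stand-ins:

```python
from typing import Callable, Dict

class DataAgent:
    """Toy data agent: calls named tools and caches structured responses."""

    def __init__(self, tools: Dict[str, Callable[[str], dict]]):
        self.tools = tools
        self.cache: Dict[str, dict] = {}   # index/cache responses for reuse

    def call(self, tool_name: str, query: str) -> dict:
        key = f"{tool_name}:{query}"
        if key in self.cache:              # serve a repeated call from cache
            return self.cache[key]
        response = self.tools[tool_name](query)
        self.cache[key] = response
        return response

# A stand-in external API that returns structured data.
def weather_api(city: str) -> dict:
    return {"city": city, "temp_c": 21}

agent = DataAgent({"weather": weather_api})
print(agent.call("weather", "Lviv"))   # first call hits the "API"
print(agent.call("weather", "Lviv"))   # second call is served from cache
```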
19. How to use agents?
Use our query engines as "data tools" for your agent:
● Semantic search
● Summarization
● Text-to-SQL
● Document comparisons
● Combining structured data with unstructured
"Simple" interface – all the agent has to infer is a query string!
Example notebook:
● OpenAI Agent + query engines (as tools)
● Analyzing structured + unstructured data
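The "simple interface" idea – each query engine exposed to the agent as a function of a single query string – can be sketched generically. The engine functions below are placeholders, not the LlamaIndex API:

```python
from typing import Callable, Dict

# Each "query engine" is exposed to the agent as a one-string function.
def semantic_search(query: str) -> str:
    return f"[semantic results for: {query}]"

def summarize(query: str) -> str:
    return f"[summary for: {query}]"

tools: Dict[str, Callable[[str], str]] = {
    "semantic_search": semantic_search,
    "summarize": summarize,
}

def run_tool(tool_name: str, query: str) -> str:
    """All the agent has to infer: which tool, and a query string."""
    return tools[tool_name](query)

print(run_tool("summarize", "Q1 financial report"))
```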
20. How to handle large responses from tools?
LoadAndSearchToolSpec
OnDemandLoaderTool
21. How to handle a large number of tools?
● Build an index over your tools, and retrieve the most relevant ones to pass to your agent.
● Example Notebook
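The tool-retrieval idea above can be sketched with a crude keyword-overlap index; a real implementation would embed the tool descriptions and do vector retrieval, and all names here are illustrative:

```python
from typing import Dict, List

# Tool descriptions indexed for retrieval (lowercase for matching).
tool_index: Dict[str, str] = {
    "flight_search": "book a flight or search plane tickets",
    "calendar": "schedule an appointment or meeting on a calendar",
    "food_delivery": "order food delivery from doordash",
}

def retrieve_tools(query: str, top_k: int = 1) -> List[str]:
    """Score each tool by word overlap with the query; return the best top_k.

    Only the retrieved tools are then passed to the agent, keeping its
    prompt small even when the full tool registry is large."""
    q = set(query.lower().split())
    scored = sorted(
        tool_index,
        key=lambda name: len(q & set(tool_index[name].split())),
        reverse=True,
    )
    return scored[:top_k]

print(retrieve_tools("book a flight to Kyiv"))
```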
22. Metrics
Source: https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/concept-model-monitoring-generative-ai-evaluation-metrics?view=azureml-api-2
Groundedness: evaluates how well the model's generated answers align with information from the input source.
Answers are verified as claims against the context in the user-defined ground-truth source: even if an answer is true (factually correct), it is scored as ungrounded (on a scale from 1 to 5) when it cannot be verified against the source text.
Relevance: measures the extent to which the model's generated responses are pertinent and directly related to the given questions.
Similarity: quantifies the similarity between a ground-truth sentence (or document) and the prediction sentence generated by an AI model.
Problem: an LLM is used to score each result between 0 and 10, and the values are then normalized.
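The normalization step for the 0–10 judge scores is not specified in the slide, so the min-max scaling below is one plausible scheme, not the documented one:

```python
def normalize_scores(scores, lo=0.0, hi=10.0):
    """Min-max normalize LLM-judge scores from [lo, hi] to [0, 1]."""
    return [(s - lo) / (hi - lo) for s in scores]

print(normalize_scores([0, 5, 10]))  # [0.0, 0.5, 1.0]
```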
23. Agent Failure Modes
[Diagram: User Input → Reasoning Agent → API Tool → Tool Output; the agent also fetches and writes Conversation History. Failure points marked: wrong tool selection/input, rogue paths.]
24. Agent Failure Modes (continued)
[Diagram: the same User Input → Agent → API Tool flow with Conversation History, now annotated with more failure modes: wrong tool selection/input, hallucination, failed API calls, rogue paths, infinite loops.]
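A common mitigation for the infinite-loop failure mode listed above is a hard cap on reasoning steps. A minimal sketch, where the agent's step function is a placeholder:

```python
class MaxIterationsExceeded(Exception):
    """Raised when the agent loops without producing a final answer."""

def run_agent(step, max_steps: int = 5):
    """Run an agent step function until it returns a final answer,
    aborting after max_steps to guard against infinite loops."""
    for _ in range(max_steps):
        result = step()
        if result is not None:        # agent produced a final answer
            return result
    raise MaxIterationsExceeded(f"no final answer after {max_steps} steps")

# A step function that never terminates on its own:
try:
    run_agent(lambda: None, max_steps=3)
except MaxIterationsExceeded as err:
    print("aborted:", err)
```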
25. Testing RAGs for Hallucinations
The RAG Triad (Query – Context – Response):
● Context Relevance: is the retrieved context relevant to the query?
● Groundedness: is the response supported by the context?
● Answer Relevance: is the answer relevant to the query?
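The triad can be operationalized as three pairwise judgments. The scoring function below is a crude word-overlap stand-in for an LLM judge (as used in, e.g., TruLens-style evaluation); the function names are illustrative:

```python
def judge_overlap(a: str, b: str) -> float:
    """Stand-in for an LLM judge: Jaccard word-overlap score in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def rag_triad(query: str, context: str, response: str) -> dict:
    """Score the three edges of the RAG triad."""
    return {
        "context_relevance": judge_overlap(query, context),
        "groundedness": judge_overlap(context, response),
        "answer_relevance": judge_overlap(query, response),
    }

scores = rag_triad(
    query="capital of France",
    context="Paris is the capital of France",
    response="The capital of France is Paris",
)
print(scores)
```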
26. Testing Agents for Hallucinations
The Agent Quad (Query – Context – Response – Tool Selection) extends the RAG Triad with the agent itself:
● Context Relevance
● Groundedness
● Answer Relevance
● Tool Selection
29. Experimenting with data agents
● Data agents give more certainty to evaluation by enabling testing throughout the application
● Thorough testing of LLM apps ensures groundedness
Try it yourself:
https://colab.research.google.com/drive/12oWmUfrPc1tC_C4ds8LS1sLrB0ikqneH?usp=sharing