Roman Kyslyi: Using and Building LLM Agents (UA)
AI & BigData Online Day 2024 Spring
Website – www.aiconf.com.ua
Youtube – https://www.youtube.com/startuplviv
FB – https://www.facebook.com/aiconf
5. Unsupervised learning: data scraped from the Internet – think clickbait, misinformation, propaganda, conspiracy theories, or attacks against certain demographics.
Supervised learning: higher-quality data – think StackOverflow, Quora, or human annotations – which makes it somewhat socially acceptable.
RLHF: polished using RL to make it customer-appropriate.
7. Supervised fine-tuning (SFT)
How to do that? We know that a model mimics its training data. During SFT, we show our language model examples of how to appropriately respond to prompts for different use cases (e.g. question answering, summarization, translation). The examples follow the format (prompt, response) and are called demonstration data. OpenAI calls supervised fine-tuning behavior cloning: you demonstrate how the model should behave, and the model clones this behavior.
OpenAI’s 40 labelers created around 13,000 (prompt, response) pairs for InstructGPT.
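As a sketch of what demonstration data looks like in practice, the (prompt, response) pairs are usually serialized into a single training sequence, with the loss masked to the response tokens. The template below is illustrative only – it is not OpenAI's actual format:

```python
# Minimal sketch of preparing SFT demonstration data.
# The prompt/response template here is a hypothetical example.

demonstrations = [
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat."},
    {"prompt": "Translate to French: Hello", "response": "Bonjour"},
]

def to_training_example(pair):
    """Serialize one (prompt, response) pair into a training string.

    During SFT the loss is typically computed only on the response
    tokens; here we just mark the prompt/response boundary."""
    return f"### Prompt:\n{pair['prompt']}\n### Response:\n{pair['response']}"

examples = [to_training_example(p) for p in demonstrations]
print(examples[0])
```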
8. Reward model (RM)
● Training data: high-quality data in the format (prompt, winning_response, losing_response)
● Data scale: 100K – 1M examples
● rθ: the reward model being trained, parameterized by θ. The goal of training is to find the θ for which the loss is minimized.
● Training data format: x – prompt; yw – winning response; yl – losing response
● For each training sample:
  a. reward model's score for the winning response: sw = rθ(x, yw)
  b. reward model's score for the losing response: sl = rθ(x, yl)
● Goal: find θ to minimize the expected loss over all training samples
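The loss minimized above is, in InstructGPT, the pairwise log-sigmoid loss −log σ(sw − sl). A toy numeric version (a real rθ would be an LLM-based scorer, not these hand-fed scalars):

```python
import math

def pairwise_rm_loss(s_w: float, s_l: float) -> float:
    """InstructGPT-style reward-model loss: -log(sigmoid(s_w - s_l)).

    Small when the winning response scores well above the losing one;
    large when the ordering is reversed."""
    return -math.log(1.0 / (1.0 + math.exp(-(s_w - s_l))))

# The loss shrinks as the margin s_w - s_l grows:
print(pairwise_rm_loss(2.0, 0.0))  # correct ordering, clear margin
print(pairwise_rm_loss(0.0, 2.0))  # reversed ordering, heavily penalized
```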
16. A range of AI agents is possible
General Data Agents
● Access to more than one tool
● Can accomplish a wider range of tasks
Specialized Data Agents
● Similar to retrieval from a vector store
● But with access to real-time information
Agents that can take action in the real world
● Book plane tickets
● Schedule appointments
● Order DoorDash
● …
17. Data Agents – LLM-powered workers
[Diagram: a Data Agent reads the latest emails (Email), retrieves context (Knowledge Base), analyzes a file (Analysis Agent), and sends an update (Slack).]
● Perform automated search and retrieval over different types of data: unstructured, semi-structured, and structured.
● Call any external service API in a structured fashion. They can either process the response immediately, or index/cache this data for future use.
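The "call an API, then process or cache the response" pattern above can be sketched as follows; the tool registry, cache policy, and the weather API are all illustrative stand-ins:

```python
from typing import Callable, Dict

class DataAgent:
    """Toy data agent: calls named tools and caches structured responses."""

    def __init__(self, tools: Dict[str, Callable[[str], dict]]):
        self.tools = tools
        self.cache: Dict[str, dict] = {}   # index/cache responses for reuse

    def call(self, tool_name: str, query: str) -> dict:
        key = f"{tool_name}:{query}"
        if key in self.cache:              # serve a repeated call from cache
            return self.cache[key]
        response = self.tools[tool_name](query)
        self.cache[key] = response
        return response

# A stand-in external API that returns structured data.
def weather_api(city: str) -> dict:
    return {"city": city, "temp_c": 21}

agent = DataAgent({"weather": weather_api})
print(agent.call("weather", "Lviv"))   # first call hits the "API"
print(agent.call("weather", "Lviv"))   # second call is served from cache
```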
19. How to use agents?
Use our query engines as "data tools" for your agent:
● Semantic search
● Summarization
● Text-to-SQL
● Document comparisons
● Combining structured data with unstructured
"Simple" interface – all the agent has to infer is a query string!
Example notebook:
● OpenAI Agent + query engines (as tools)
● Analyzing structured + unstructured data
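The "simple interface" idea – each query engine exposed to the agent as a function of a single query string – can be sketched generically. The engine functions below are placeholders, not the LlamaIndex API:

```python
from typing import Callable, Dict

# Each "query engine" is exposed to the agent as a one-string function.
def semantic_search(query: str) -> str:
    return f"[semantic results for: {query}]"

def summarize(query: str) -> str:
    return f"[summary for: {query}]"

tools: Dict[str, Callable[[str], str]] = {
    "semantic_search": semantic_search,
    "summarize": summarize,
}

def run_tool(tool_name: str, query: str) -> str:
    """All the agent has to infer: which tool, and a query string."""
    return tools[tool_name](query)

print(run_tool("summarize", "Q1 financial report"))
```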
20. How to handle large responses from tools?
LoadAndSearchToolSpec
OnDemandLoaderTool
21. How to handle a large number of tools?
● Build an index over your tools, and retrieve the most relevant ones to pass to your agent.
● Example Notebook
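The tool-retrieval idea above can be sketched with a crude keyword-overlap index; a real implementation would embed the tool descriptions and do vector retrieval, and all names here are illustrative:

```python
from typing import Dict, List

# Tool descriptions indexed for retrieval (lowercase for matching).
tool_index: Dict[str, str] = {
    "flight_search": "book a flight or search plane tickets",
    "calendar": "schedule an appointment or meeting on a calendar",
    "food_delivery": "order food delivery from doordash",
}

def retrieve_tools(query: str, top_k: int = 1) -> List[str]:
    """Score each tool by word overlap with the query; return the best top_k.

    Only the retrieved tools are then passed to the agent, keeping its
    prompt small even when the full tool registry is large."""
    q = set(query.lower().split())
    scored = sorted(
        tool_index,
        key=lambda name: len(q & set(tool_index[name].split())),
        reverse=True,
    )
    return scored[:top_k]

print(retrieve_tools("book a flight to Kyiv"))
```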
22. Metrics
Source: https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/concept-model-monitoring-generative-ai-evaluation-metrics?view=azureml-api-2
Groundedness: evaluates how well the model's generated answers align with information from the input source.
Answers are verified as claims against the context in the user-defined ground-truth source: even if an answer is true (factually correct), it is scored as ungrounded (on a scale from 1 to 5) when it cannot be verified against the source text.
Relevance: measures the extent to which the model's generated responses are pertinent and directly related to the given questions.
Similarity: quantifies the similarity between a ground-truth sentence (or document) and the prediction sentence generated by an AI model.
Problem: an LLM is used to score each result between 0 and 10, and the values are then normalized.
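The normalization step for the 0–10 judge scores is not specified in the slide, so the min-max scaling below is one plausible scheme, not the documented one:

```python
def normalize_scores(scores, lo=0.0, hi=10.0):
    """Min-max normalize LLM-judge scores from [lo, hi] to [0, 1]."""
    return [(s - lo) / (hi - lo) for s in scores]

print(normalize_scores([0, 5, 10]))  # [0.0, 0.5, 1.0]
```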
23. Agent Failure Modes
[Diagram: User Input → Reasoning Agent → API Tool → Tool Output; the agent also fetches and writes Conversation History. Failure points marked: wrong tool selection/input, rogue paths.]
24. Agent Failure Modes (continued)
[Diagram: the same User Input → Agent → API Tool flow with Conversation History, now annotated with more failure modes: wrong tool selection/input, hallucination, failed API calls, rogue paths, infinite loops.]
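A common mitigation for the infinite-loop failure mode listed above is a hard cap on reasoning steps. A minimal sketch, where the agent's step function is a placeholder:

```python
class MaxIterationsExceeded(Exception):
    """Raised when the agent loops without producing a final answer."""

def run_agent(step, max_steps: int = 5):
    """Run an agent step function until it returns a final answer,
    aborting after max_steps to guard against infinite loops."""
    for _ in range(max_steps):
        result = step()
        if result is not None:        # agent produced a final answer
            return result
    raise MaxIterationsExceeded(f"no final answer after {max_steps} steps")

# A step function that never terminates on its own:
try:
    run_agent(lambda: None, max_steps=3)
except MaxIterationsExceeded as err:
    print("aborted:", err)
```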
25. Testing RAGs for Hallucinations
The RAG Triad (Query – Context – Response):
● Context Relevance: is the retrieved context relevant to the query?
● Groundedness: is the response supported by the context?
● Answer Relevance: is the answer relevant to the query?
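The triad can be operationalized as three pairwise judgments. The scoring function below is a crude word-overlap stand-in for an LLM judge (as used in, e.g., TruLens-style evaluation); the function names are illustrative:

```python
def judge_overlap(a: str, b: str) -> float:
    """Stand-in for an LLM judge: Jaccard word-overlap score in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def rag_triad(query: str, context: str, response: str) -> dict:
    """Score the three edges of the RAG triad."""
    return {
        "context_relevance": judge_overlap(query, context),
        "groundedness": judge_overlap(context, response),
        "answer_relevance": judge_overlap(query, response),
    }

scores = rag_triad(
    query="capital of France",
    context="Paris is the capital of France",
    response="The capital of France is Paris",
)
print(scores)
```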
26. Testing Agents for Hallucinations
The Agent Quad (Query – Context – Response – Tool Selection) extends the RAG Triad with the agent itself:
● Context Relevance
● Groundedness
● Answer Relevance
● Tool Selection
29. Experimenting with data agents
● Data agents give more certainty to evaluation by enabling testing throughout the application
● Thorough testing of LLM apps ensures groundedness
Try it yourself:
https://colab.research.google.com/drive/12oWmUfrPc1tC_C4ds8LS1sLrB0ikqneH?usp=sharing