SlideShare a Scribd company logo
1 of 30
LLM Agents
Hallucinations
Hallucinations
Types of Hallucinations
Unsupervised learning:
data scraped from the Internet: think
clickbait, misinformation,
propaganda, conspiracy theories, or
attacks against certain
demographics.
Supervised learning: higher quality
data – think StackOverflow, Quora,
or human annotations – which
makes it somewhat socially
acceptable
RLHF: polished using RL to make it
customer-appropriate
Supervised
fine tuning (SFT)
How to do that? We know that a model mimics its training data. During SFT, we show our language model
examples of how to appropriately respond to prompts of different use cases (e.g. question answering,
summarization, translation). The examples follow the format (prompt, response) and are called demonstration
data. OpenAI calls supervised fine tuning behavior cloning: you demonstrate how the model should behave,
and the model clones this behavior.
OpenAI’s 40 labelers created around 13,000 (prompt, response) pairs for InstructGPT.
Reward model (RM)
● Training data: high-quality data in the format of (prompt, winning_response, losing_response)
● Data scale: 100K - 1M examples
● rθ:: the reward model being trained, parameterized by θ. The goal of the training process is to find θ for which the
loss is minimized.
● Training data format: x- prompt; yw - winning response; yl - losing response
● For each training sample:
a. reward model’s score for the winning response: sw=rθ(x,yw)
b. reward model’s score for the losing response: sl=rθ(x,yl)
● Goal: find θ to minimize the expected loss for all training samples
RAG
● Retrieval Augmented Generation
● MuRAG: Multimodal Retrieval-Augmented Generator
● Ensemble of RAG
● HyDE: (Hypothetical Document Embeddings)
Chain of Thoughts
Chain of Thoughts: https://arxiv.org/abs/2201.11903
Compressing information (why gzip will not work)
LLM
RDBMS
Data Lake
Keyword search
Vector Search
Prompt
ReAct: Reasoning + Acting with LLMs
Source: https://react-lm.github.io/
ChatGPT plugins
ReAct: HotpotQA example
Source: https://react-lm.github.io/
Range of AI agents are possible
General Data Agents
● Access to more than
one tool
● Can accomplish a wider
range of tasks
Specialized Data Agents
● Similar to retrieval from
vector store
● But with access to real-
time information
Agents that can take
action in real world
● Book plane tickets
● Scheduling appointment
● Order doordash
● …
Data Agents - LLM-powered workers
Email
Read latest
emails
Knowledge
Base
Retrieve
context
Analysis
Agent
Analyze
file
Slack
Send
update
Data
Agent
● Perform automated search and
retrieval over different types of
data — unstructured, semi-
structured, and structured.
● Calling any external service API in
a structured fashion. They can
either process the response
immediately, or index/cache this
data for future use.
Data Agents - Core Components
Agent Reasoning Loop
● ReAct Agent (any LLM)
● OpenAI Agent (only OAI)
Tools
Query Engine Tools (RAG
pipeline)
LlamaHub Tools
● Code interpreter
● Slack
● Notion
● Zapier
● … (15+ tools)
How to use agents?
Use our query engines as “data tools” over your
agent:
● Semantic search
● Summarization
● Text-to-SQL
● Document comparisons
● Combining Structured Data w/
Unstructured
“Simple” Interface - all agent has to infer is a
query string!
Example Notebook:
● OpenAI Agent + query engines (as tools)
● Analyzing structured + unstructured data
How to handle large responses from tools?
LoadAndSearchToolSpec
OnDemandLoaderTool
How to handle large number of tools?
● Build an index over your tools, and retrieve the most relevant ones to pass to your agent.
● Example Notebook
Metrics
Source: https://learn.microsoft.com/en-us/azure/machine-learning/prompt-
flow/concept-model-monitoring-generative-ai-evaluation-metrics?view=azureml-api-
2
Groundedness: evaluates how well the model's generated answers align with information from the input source.
Answers are verified as claims against context in the user-defined ground truth source: even if answers are true (factually correct),
if not verifiable against the source text, then it's scored as ungrounded (from 1 to 5).
Relevance: measures the extent to which the model's generated responses are pertinent and directly related to the given
questions
Similarity: quantifies the similarity between a ground truth sentence (or document) and the prediction sentence generated by
an AI model.
Problem: LLM is used to score each result between 0 and 10. Then the values are normalized.
User Input
API Tool
Tool Output
Reasoning
Agent
Conversation
History
Fetch History
Write History
Agent Failure Modes
Wrong tool
selection/input
Rogue Paths
User Input
API Tool
Wrong tool
selection/input
Hallucination Conversation
History
Fetch History
Write History
Failed API
Calls
Rogue Paths
Infinite Loops
Agent Failure Modes
25
Testing RAGs for Hallucinations
Context Relevance
Is the retrieved context
relevant to the query?
Groundedness
Is the response supported
by the context?
Answer Relevance
Is the answer relevant to
the query?
Query
Context
Response
The RAG Triad
26
Testing Agents for Hallucinations
Query
Context
Response
Agent
Tool Selection
The Agent Quad
Context Relevance
Groundedness
Answer Relevance
Blog Post: https://blog.llamaindex.ai/building-
better-tools-for-llm-agents-f8c5a6714f11
● Writing useful tool prompts
● Make tools tolerant of partial/faulty inputs
● Prompt engineering error messages
● Returning the right prompts for “POST”
requests
● Don’t overload agent with tools
● Try hierarchical agent modeling
Best practices for Agents
Question Answer Relevance
● Is the app’s response helpful?
Experimenting with data agents
● Data agents give more certainty to eval by testing throughout the application
● Thorough testing of LLM apps ensures groundedness
Try yourself:
https://colab.research.google.com/drive/12oWmUfrPc1tC_C4ds8LS1sLrB0ikqneH?usp=sharing
Example

More Related Content

Similar to Roman Kyslyi: Використання та побудова LLM агентів (UA)

Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfAnastasiaSteele10
 
Security of LLM APIs by Ankita Gupta, Akto.io
Security of LLM APIs by Ankita Gupta, Akto.ioSecurity of LLM APIs by Ankita Gupta, Akto.io
Security of LLM APIs by Ankita Gupta, Akto.ioNordic APIs
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureAggregage
 
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位eLearning Consortium 電子學習聯盟
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...ssuser4edc93
 
machinecanthink-160226155704.pdf
machinecanthink-160226155704.pdfmachinecanthink-160226155704.pdf
machinecanthink-160226155704.pdfPranavPatil822557
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningGovind Mudumbai
 
5 Algorithms Every Web Developer Can Use and Understand
5 Algorithms Every Web Developer Can Use and Understand5 Algorithms Every Web Developer Can Use and Understand
5 Algorithms Every Web Developer Can Use and UnderstandMatt Kiser
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Qualitative Content Analysis
Qualitative Content AnalysisQualitative Content Analysis
Qualitative Content AnalysisRicky Bilakhia
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?Samet KILICTAS
 
Generative AI in CSharp with Semantic Kernel.pptx
Generative AI in CSharp with Semantic Kernel.pptxGenerative AI in CSharp with Semantic Kernel.pptx
Generative AI in CSharp with Semantic Kernel.pptxAlon Fliess
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorialThengo Kim
 
Recent Trends in Semantic Search Technologies
Recent Trends in Semantic Search TechnologiesRecent Trends in Semantic Search Technologies
Recent Trends in Semantic Search TechnologiesThanh Tran
 
Machine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWERMachine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWERGanesan Narayanasamy
 

Similar to Roman Kyslyi: Використання та побудова LLM агентів (UA) (20)

Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
 
Security of LLM APIs by Ankita Gupta, Akto.io
Security of LLM APIs by Ankita Gupta, Akto.ioSecurity of LLM APIs by Ankita Gupta, Akto.io
Security of LLM APIs by Ankita Gupta, Akto.io
 
TechDayPakistan-Slides RAG with Cosmos DB.pptx
TechDayPakistan-Slides RAG with Cosmos DB.pptxTechDayPakistan-Slides RAG with Cosmos DB.pptx
TechDayPakistan-Slides RAG with Cosmos DB.pptx
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
 
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
AI生成工具的新衝擊 - MS Bing & Google Bard 能否挑戰ChatGPT-4領導地位
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
machinecanthink-160226155704.pdf
machinecanthink-160226155704.pdfmachinecanthink-160226155704.pdf
machinecanthink-160226155704.pdf
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
5 Algorithms Every Web Developer Can Use and Understand
5 Algorithms Every Web Developer Can Use and Understand5 Algorithms Every Web Developer Can Use and Understand
5 Algorithms Every Web Developer Can Use and Understand
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Qualitative Content Analysis
Qualitative Content AnalysisQualitative Content Analysis
Qualitative Content Analysis
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
Generative AI in CSharp with Semantic Kernel.pptx
Generative AI in CSharp with Semantic Kernel.pptxGenerative AI in CSharp with Semantic Kernel.pptx
Generative AI in CSharp with Semantic Kernel.pptx
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorial
 
Recent Trends in Semantic Search Technologies
Recent Trends in Semantic Search TechnologiesRecent Trends in Semantic Search Technologies
Recent Trends in Semantic Search Technologies
 
Machine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWERMachine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWER
 

More from Lviv Startup Club

Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Lviv Startup Club
 
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Lviv Startup Club
 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Lviv Startup Club
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Lviv Startup Club
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Lviv Startup Club
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Lviv Startup Club
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Lviv Startup Club
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Lviv Startup Club
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Lviv Startup Club
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Lviv Startup Club
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Lviv Startup Club
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Lviv Startup Club
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Lviv Startup Club
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Lviv Startup Club
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Lviv Startup Club
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Lviv Startup Club
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Lviv Startup Club
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Lviv Startup Club
 
Ihor Pavlenko: PMO Resource Management (UA)
Ihor Pavlenko: PMO Resource Management (UA)Ihor Pavlenko: PMO Resource Management (UA)
Ihor Pavlenko: PMO Resource Management (UA)Lviv Startup Club
 

More from Lviv Startup Club (20)

Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
 
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
 
Ihor Pavlenko: PMO Resource Management (UA)
Ihor Pavlenko: PMO Resource Management (UA)Ihor Pavlenko: PMO Resource Management (UA)
Ihor Pavlenko: PMO Resource Management (UA)
 

Recently uploaded

Non Text Magic Studio Magic Design for Presentations L&P.pptx
Non Text Magic Studio Magic Design for Presentations L&P.pptxNon Text Magic Studio Magic Design for Presentations L&P.pptx
Non Text Magic Studio Magic Design for Presentations L&P.pptxAbhayThakur200703
 
BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailAriel592675
 
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...lizamodels9
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...lizamodels9
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...lizamodels9
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Timedelhimodelshub1
 
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCRsoniya singh
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
Investment analysis and portfolio management
Investment analysis and portfolio managementInvestment analysis and portfolio management
Investment analysis and portfolio managementJunaidKhan750825
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdfOrient Homes
 
RE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechRE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechNewman George Leech
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncrdollysharma2066
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...lizamodels9
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfpollardmorgan
 
Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...
Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...
Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...lizamodels9
 

Recently uploaded (20)

Non Text Magic Studio Magic Design for Presentations L&P.pptx
Non Text Magic Studio Magic Design for Presentations L&P.pptxNon Text Magic Studio Magic Design for Presentations L&P.pptx
Non Text Magic Studio Magic Design for Presentations L&P.pptx
 
BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In BELLMONT HOTEL ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detail
 
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
Call Girls In Radisson Blu Hotel New Delhi Paschim Vihar ❤️8860477959 Escorts...
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Time
 
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
(8264348440) 🔝 Call Girls In Mahipalpur 🔝 Delhi NCR
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Investment analysis and portfolio management
Investment analysis and portfolio managementInvestment analysis and portfolio management
Investment analysis and portfolio management
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdf
 
RE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman LeechRE Capital's Visionary Leadership under Newman Leech
RE Capital's Visionary Leadership under Newman Leech
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...
Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...
Call Girls In Kishangarh Delhi ❤️8860477959 Good Looking Escorts In 24/7 Delh...
 

Roman Kyslyi: Використання та побудова LLM агентів (UA)

  • 4.
  • 5. Unsupervised learning: data scraped from the Internet: think clickbait, misinformation, propaganda, conspiracy theories, or attacks against certain demographics. Supervised learning: higher quality data – think StackOverflow, Quora, or human annotations – which makes it somewhat socially acceptable RLHF: polished using RL to make it customer-appropriate
  • 6.
  • 7. Supervised fine tuning (SFT) How to do that? We know that a model mimics its training data. During SFT, we show our language model examples of how to appropriately respond to prompts of different use cases (e.g. question answering, summarization, translation). The examples follow the format (prompt, response) and are called demonstration data. OpenAI calls supervised fine tuning behavior cloning: you demonstrate how the model should behave, and the model clones this behavior. OpenAI’s 40 labelers created around 13,000 (prompt, response) pairs for InstructGPT.
  • 8. Reward model (RM) ● Training data: high-quality data in the format of (prompt, winning_response, losing_response) ● Data scale: 100K - 1M examples ● rθ:: the reward model being trained, parameterized by θ. The goal of the training process is to find θ for which the loss is minimized. ● Training data format: x- prompt; yw - winning response; yl - losing response ● For each training sample: a. reward model’s score for the winning response: sw=rθ(x,yw) b. reward model’s score for the losing response: sl=rθ(x,yl) ● Goal: find θ to minimize the expected loss for all training samples
  • 9. RAG ● Retrieval Augmented Generation ● MuRAG: Multimodal Retrieval-Augmented Generator ● Ensemble of RAG ● HyDE: (Hypothetical Document Embeddings)
  • 10. Chain of Thoughts Chain of Thoughts: https://arxiv.org/abs/2201.11903
  • 11. Compressing information (why gzip will not work)
  • 13. ReAct: Reasoning + Acting with LLMs Source: https://react-lm.github.io/
  • 15. ReAct: HotpotQA example Source: https://react-lm.github.io/
  • 16. Range of AI agents are possible General Data Agents ● Access to more than one tool ● Can accomplish a wider range of tasks Specialized Data Agents ● Similar to retrieval from vector store ● But with access to real- time information Agents that can take action in real world ● Book plane tickets ● Scheduling appointment ● Order doordash ● …
  • 17. Data Agents - LLM-powered workers Email Read latest emails Knowledge Base Retrieve context Analysis Agent Analyze file Slack Send update Data Agent ● Perform automated search and retrieval over different types of data — unstructured, semi- structured, and structured. ● Calling any external service API in a structured fashion. They can either process the response immediately, or index/cache this data for future use.
  • 18. Data Agents - Core Components Agent Reasoning Loop ● ReAct Agent (any LLM) ● OpenAI Agent (only OAI) Tools Query Engine Tools (RAG pipeline) LlamaHub Tools ● Code interpreter ● Slack ● Notion ● Zapier ● … (15+ tools)
  • 19. How to use agents? Use our query engines as “data tools” over your agent: ● Semantic search ● Summarization ● Text-to-SQL ● Document comparisons ● Combining Structured Data w/ Unstructured “Simple” Interface - all agent has to infer is a query string! Example Notebook: ● OpenAI Agent + query engines (as tools) ● Analyzing structured + unstructured data
  • 20. How to handle large responses from tools? LoadAndSearchToolSpec OnDemandLoaderTool
  • 21. How to handle large number of tools? ● Build an index over your tools, and retrieve the most relevant ones to pass to your agent. ● Example Notebook
  • 22. Metrics Source: https://learn.microsoft.com/en-us/azure/machine-learning/prompt- flow/concept-model-monitoring-generative-ai-evaluation-metrics?view=azureml-api- 2 Groundedness: evaluates how well the model's generated answers align with information from the input source. Answers are verified as claims against context in the user-defined ground truth source: even if answers are true (factually correct), if not verifiable against the source text, then it's scored as ungrounded (from 1 to 5). Relevance: measures the extent to which the model's generated responses are pertinent and directly related to the given questions Similarity: quantifies the similarity between a ground truth sentence (or document) and the prediction sentence generated by an AI model. Problem: LLM is used to score each result between 0 and 10. Then the values are normalized.
  • 23. User Input API Tool Tool Output Reasoning Agent Conversation History Fetch History Write History Agent Failure Modes Wrong tool selection/input Rogue Paths
  • 24. User Input API Tool Wrong tool selection/input Hallucination Conversation History Fetch History Write History Failed API Calls Rogue Paths Infinite Loops Agent Failure Modes
  • 25. 25 Testing RAGs for Hallucinations Context Relevance Is the retrieved context relevant to the query? Groundedness Is the response supported by the context? Answer Relevance Is the answer relevant to the query? Query Context Response The RAG Triad
  • 26. 26 Testing Agents for Hallucinations Query Context Response Agent Tool Selection The Agent Quad Context Relevance Groundedness Answer Relevance
  • 27. Blog Post: https://blog.llamaindex.ai/building- better-tools-for-llm-agents-f8c5a6714f11 ● Writing useful tool prompts ● Make tools tolerant of partial/faulty inputs ● Prompt engineering error messages ● Returning the right prompts for “POST” requests ● Don’t overload agent with tools ● Try hierarchical agent modeling Best practices for Agents
  • 28. Question Answer Relevance ● Is the app’s response helpful?
  • 29. Experimenting with data agents ● Data agents give more certainty to eval by testing throughout the application ● Thorough testing of LLM apps ensures groundedness Try yourself: https://colab.research.google.com/drive/12oWmUfrPc1tC_C4ds8LS1sLrB0ikqneH?usp=sharing