LLM Agents
Hallucinations
Types of Hallucinations
Unsupervised learning: data scraped from the Internet (think clickbait, misinformation, propaganda, conspiracy theories, or attacks against certain demographics).
Supervised learning: higher-quality data (think StackOverflow, Quora, or human annotations), which makes the model somewhat socially acceptable.
RLHF: the model is polished using RL to make it customer-appropriate.
Supervised fine-tuning (SFT)
How is this done? A model mimics its training data. During SFT, we show the language model examples of how to respond appropriately to prompts for different use cases (e.g. question answering, summarization, translation). The examples follow the format (prompt, response) and are called demonstration data. OpenAI calls supervised fine-tuning behavior cloning: you demonstrate how the model should behave, and the model clones this behavior.
OpenAI's 40 labelers created around 13,000 (prompt, response) pairs for InstructGPT.
Reward model (RM)
● Training data: high-quality data in the format of (prompt, winning_response, losing_response)
● Data scale: 100K - 1M examples
● r_θ: the reward model being trained, parameterized by θ. The goal of the training process is to find the θ for which the loss is minimized.
● Training data format: x = prompt; y_w = winning response; y_l = losing response
● For each training sample:
a. reward model's score for the winning response: s_w = r_θ(x, y_w)
b. reward model's score for the losing response: s_l = r_θ(x, y_l)
● Goal: find θ to minimize the expected loss over all training samples
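The slide refers to a loss without showing it. A minimal sketch, assuming the pairwise form used in the InstructGPT setup, loss = -log(σ(s_w - s_l)): the loss shrinks as the winning response is scored further above the losing one. The function names are illustrative only, and the scores are plain floats standing in for r_θ outputs.

```python
import math

def pairwise_loss(s_w: float, s_l: float) -> float:
    """-log(sigmoid(s_w - s_l)), written as log(1 + exp(-(s_w - s_l))) in a stable form."""
    d = s_w - s_l
    # For d >= 0 this is log1p(exp(-d)); for d < 0 it is -d + log1p(exp(d)).
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))

def expected_loss(score_pairs):
    """Average loss over (s_w, s_l) pairs; training would minimize this over theta."""
    return sum(pairwise_loss(s_w, s_l) for s_w, s_l in score_pairs) / len(score_pairs)

if __name__ == "__main__":
    print(pairwise_loss(2.0, 0.0))   # winner scored higher: small loss
    print(pairwise_loss(0.0, 2.0))   # winner scored lower: large loss
```

When both responses get the same score, the loss is log 2, which is the value an untrained reward model would start near.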
RAG
● Retrieval-Augmented Generation
● MuRAG: Multimodal Retrieval-Augmented Generator
● Ensemble of RAG
● HyDE: Hypothetical Document Embeddings
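As a rough illustration of the HyDE idea from the list above: instead of embedding the user query, a model first drafts a hypothetical answer document, and that draft is embedded and matched against the real documents. Everything in this sketch is a stand-in: `embed` is a toy bag-of-words vector rather than a real embedding model, and `fake_generate` replaces the LLM call.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query, docs, generate, k=1):
    hypothetical = generate(query)   # step 1: draft an answer-like document
    q_vec = embed(hypothetical)      # step 2: embed the draft, not the query
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    return ranked[:k]                # step 3: return the nearest real documents

def fake_generate(query):
    """Hypothetical LLM stub that always drafts the same answer."""
    return "the capital of france is paris"

if __name__ == "__main__":
    docs = ["paris is the capital city of france", "bananas are rich in potassium"]
    print(hyde_retrieve("Which city hosts the French government?", docs, fake_generate))
```

The draft may be factually wrong; it is only used to land near relevant documents in embedding space, and the final answer is still generated from the retrieved text.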
Chain of Thought
Chain-of-Thought prompting: https://arxiv.org/abs/2201.11903
Compressing information (why gzip will not work)
[Diagram labels: LLM, RDBMS, Data Lake, Keyword search, Vector Search, Prompt]
ReAct: Reasoning + Acting with LLMs
Source: https://react-lm.github.io/
ChatGPT plugins
ReAct: HotpotQA example
Source: https://react-lm.github.io/
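The ReAct pattern interleaves reasoning text with tool calls: the model emits a thought and an action, the runtime executes the action and appends an observation, and the loop repeats until a final answer appears. The sketch below is a minimal loop under stated assumptions: `scripted_llm` is a stand-in for a real model, `get_weather` is a hypothetical tool with made-up data, and the `Action: name[arg]` text format is one common convention rather than a fixed standard.

```python
import re

def get_weather(city):
    """Hypothetical tool backed by toy, made-up data."""
    return {"lviv": "sunny"}.get(city.lower(), "unknown")

TOOLS = {"get_weather": get_weather}

def scripted_llm(transcript):
    """Stand-in for a model: acts first, answers once it has seen an observation."""
    if "Observation:" not in transcript:
        return "Thought: I need the weather.\nAction: get_weather[Lviv]"
    return "Thought: I have what I need.\nFinal Answer: It is sunny in Lviv."

def react(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):            # step budget guards against endless loops
        step = llm(transcript)
        transcript += "\n" + step
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip(), transcript
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name!r}"
        else:
            observation = "no action found; reply with Action: or Final Answer:"
        transcript += f"\nObservation: {observation}"
    return None, transcript               # budget exhausted without an answer

if __name__ == "__main__":
    answer, log = react("What is the weather in Lviv?", scripted_llm, TOOLS)
    print(answer)
    print(log)
```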
A range of AI agents is possible
General Data Agents
● Access to more than one tool
● Can accomplish a wider range of tasks
Specialized Data Agents
● Similar to retrieval from a vector store
● But with access to real-time information
Agents that can take action in the real world
● Book plane tickets
● Schedule appointments
● Order DoorDash
● …
Data Agents - LLM-powered workers
Email: read latest emails
Knowledge Base: retrieve context
Analysis Agent: analyze file
Slack: send update
Data Agent
● Perform automated search and retrieval over different types of data: unstructured, semi-structured, and structured.
● Call any external service API in a structured fashion. The agent can either process the response immediately, or index/cache this data for future use.
Data Agents - Core Components
Agent Reasoning Loop
● ReAct Agent (any LLM)
● OpenAI Agent (OpenAI models only)
Tools
Query Engine Tools (RAG pipeline)
LlamaHub Tools
● Code interpreter
● Slack
● Notion
● Zapier
● … (15+ tools)
How to use agents?
Use our query engines as "data tools" for your agent:
● Semantic search
● Summarization
● Text-to-SQL
● Document comparisons
● Combining structured data with unstructured data
"Simple" interface: all the agent has to infer is a query string!
Example Notebook:
● OpenAI Agent + query engines (as tools)
● Analyzing structured + unstructured data
How to handle large responses from tools?
LoadAndSearchToolSpec
OnDemandLoaderTool
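The two LlamaIndex utilities named above address the same problem: a tool's raw output can be too large to paste into the prompt. The general pattern, as I understand it, is to split one tool into a "load" step that stores the output in an index and a "search" step that returns only the relevant pieces. The sketch below shows that pattern in plain Python and does not use the LlamaIndex API; `LoadAndSearch`, `chunk_lines`, and the word-overlap ranking are all illustrative assumptions.

```python
def chunk_lines(text, size=50):
    """Split text into chunks of `size` lines each."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

class LoadAndSearch:
    """Wraps a tool so its large output is stored, then queried piece by piece."""

    def __init__(self, tool, chunk_size=50):
        self.tool = tool
        self.chunk_size = chunk_size
        self.chunks = []

    def load(self, *args):
        output = self.tool(*args)                 # call the underlying tool once
        new = chunk_lines(output, self.chunk_size)
        self.chunks.extend(new)                   # keep the output out of the prompt
        return f"Loaded {len(new)} chunks; call search(query) to read them."

    def search(self, query, k=1):
        terms = set(query.lower().split())
        # Toy relevance: count of query words present in the chunk.
        ranked = sorted(self.chunks,
                        key=lambda c: len(terms & set(c.lower().split())),
                        reverse=True)
        return ranked[:k]

def big_tool():
    """Hypothetical tool that returns about a thousand lines of text."""
    rows = [f"row {i}: filler" for i in range(1000)]
    rows.append("row 1000: refund policy is 30 days")
    return "\n".join(rows)

if __name__ == "__main__":
    wrapped = LoadAndSearch(big_tool)
    print(wrapped.load())
    print(wrapped.search("refund policy"))
```

The agent sees only the short status message from `load` and the few chunks returned by `search`, so the context window stays small regardless of how much the tool returned.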
How to handle a large number of tools?
● Build an index over your tools, and retrieve the most relevant ones to pass to your agent.
● Example Notebook
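A minimal sketch of retrieving tools before handing them to the agent: each tool gets a text description, the descriptions are scored against the user query, and only the top-k tools are passed on. The tool names, descriptions, and word-overlap scoring are assumptions for illustration; a real setup would more likely use embedding similarity over an index.

```python
def score(query, description):
    """Toy relevance score: number of distinct words shared by query and description."""
    return len(set(query.lower().split()) & set(description.lower().split()))

def retrieve_tools(query, tools, k=2):
    """Return the names of the k tools whose descriptions best match the query."""
    ranked = sorted(tools, key=lambda name: score(query, tools[name]), reverse=True)
    return ranked[:k]

# Hypothetical tool registry: name -> description shown to the retriever.
TOOLS = {
    "send_slack": "send a message to a slack channel",
    "query_sql": "run a sql query against the sales database",
    "read_email": "read the latest email messages from the inbox",
    "search_notion": "search notion pages for a keyword",
}

if __name__ == "__main__":
    print(retrieve_tools("run a query on the sales database", TOOLS, k=1))
```

Only the selected tools' descriptions then go into the agent prompt, which keeps the prompt short and reduces the chance of the agent picking an unrelated tool.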
Metrics
Source: https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/concept-model-monitoring-generative-ai-evaluation-metrics?view=azureml-api-2
Groundedness: evaluates how well the model's generated answers align with information from the input source. Answers are verified as claims against context in the user-defined ground truth source: even if an answer is factually correct, it is scored as ungrounded if it cannot be verified against the source text. Scored from 1 to 5.
Relevance: measures the extent to which the model's generated responses are pertinent and directly related to the given questions.
Similarity: quantifies the similarity between a ground truth sentence (or document) and the prediction sentence generated by an AI model.
Problem: an LLM is used to score each result between 0 and 10; the values are then normalized.
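One practical consequence of LLM-based scoring is that the judge replies in free text, so the score has to be parsed and clamped before it is normalized. A small sketch, assuming the judge was asked for a number between 0 and 10; `normalize_score` is a hypothetical helper, not part of any evaluation library.

```python
import re

def normalize_score(raw, lo=0.0, hi=10.0):
    """Extract the first number from the judge's reply and map it to [0, 1].

    Returns None when the reply contains no number at all.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", raw)
    if match is None:
        return None
    value = min(max(float(match.group()), lo), hi)   # clamp out-of-range replies
    return (value - lo) / (hi - lo)

if __name__ == "__main__":
    for reply in ["Score: 7", "12", "I would say 3.5 out of 10", "n/a"]:
        print(repr(reply), normalize_score(reply))
```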
Agent Failure Modes
[Diagram components: User Input, Reasoning Agent, API Tool, Tool Output, Conversation History (Fetch History, Write History)]
● Wrong tool selection/input
● Hallucination
● Failed API calls
● Rogue paths
● Infinite loops
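Two of these failure modes, failed API calls and infinite loops, can be contained with simple guards in the agent runtime. The helpers below are a sketch under assumptions: the names are hypothetical, the retry has no backoff, and "looping" is defined narrowly as repeating the same (tool, input) pair several times in a row.

```python
def call_with_retry(tool, arg, attempts=3):
    """Call a tool, retrying on any exception; return an error string if all attempts fail."""
    last_error = None
    for _ in range(attempts):
        try:
            return tool(arg)
        except Exception as exc:   # broad on purpose: any tool failure becomes an observation
            last_error = exc
    return f"tool failed after {attempts} attempts: {last_error}"

def is_looping(actions, window=3):
    """True when the last `window` actions are identical (tool name and input)."""
    return len(actions) >= window and len(set(actions[-window:])) == 1

if __name__ == "__main__":
    history = [("search", "llm agents")] * 3
    print(is_looping(history))

    def always_down(arg):
        raise RuntimeError("503")

    print(call_with_retry(always_down, "x", attempts=2))
```

Returning the failure as text, rather than raising, lets the agent read the error as an observation and choose a different tool or give up gracefully.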
Testing RAGs for Hallucinations
The RAG Triad (Query, Context, Response)
● Context Relevance: Is the retrieved context relevant to the query?
● Groundedness: Is the response supported by the context?
● Answer Relevance: Is the answer relevant to the query?
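The three checks can be wired up as three scoring calls over the (query, context, response) triple. In practice each score would come from an LLM judge or a dedicated evaluator; the sketch below substitutes a toy word-overlap function purely to show the shape of the computation, so the numbers it produces are not meaningful quality scores.

```python
def overlap(a, b):
    """Fraction of a's distinct words that also appear in b (toy stand-in for a judge)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def rag_triad(query, context, response):
    return {
        "context_relevance": overlap(query, context),    # query vs. retrieved context
        "groundedness": overlap(response, context),      # response vs. retrieved context
        "answer_relevance": overlap(query, response),    # query vs. response
    }

if __name__ == "__main__":
    context = "the refund window is 30 days"
    print(rag_triad("refund window days", context, "the refund window is 30 days"))
    print(rag_triad("refund window days", context, "the refund window is 90 days"))
```

In the second call the response contradicts the context on one word, and only the groundedness score drops, which is the signal this check is meant to isolate.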
Testing Agents for Hallucinations
The Agent Quad (Query, Agent, Context, Response)
● Tool Selection
● Context Relevance
● Groundedness
● Answer Relevance
Best practices for Agents
Blog Post: https://blog.llamaindex.ai/building-better-tools-for-llm-agents-f8c5a6714f11
● Writing useful tool prompts
● Make tools tolerant of partial/faulty inputs
● Prompt engineering error messages
● Returning the right prompts for "POST" requests
● Don't overload agent with tools
● Try hierarchical agent modeling
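Two of these practices, tolerating partial or faulty inputs and writing error messages as prompts, can be illustrated with a single tool function. `book_meeting` is a hypothetical tool invented for this sketch: it fills in a default for a missing title, and when the date is missing or malformed it returns an instruction the agent can act on instead of raising.

```python
from datetime import date

def book_meeting(day: str = "", title: str = "") -> str:
    """Hypothetical agent tool: never raises, always returns text the agent can read."""
    if not day.strip():
        return "Error: missing 'day'. Call again with day in YYYY-MM-DD format, e.g. 2024-06-01."
    try:
        parsed = date.fromisoformat(day.strip())     # tolerate surrounding whitespace
    except ValueError:
        return (f"Error: could not parse day={day!r}. "
                "Use YYYY-MM-DD format, e.g. 2024-06-01, and call again.")
    title = title.strip() or "Untitled meeting"      # tolerate a missing title
    return f"Booked '{title}' on {parsed.isoformat()}."

if __name__ == "__main__":
    print(book_meeting(" 2024-06-01 "))
    print(book_meeting("tomorrow", "Sync"))
    print(book_meeting())
```

Because the error text states exactly what to change, the agent's next step can be a corrected call rather than a retry of the same faulty input.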
Question Answer Relevance
● Is the app’s response helpful?
Experimenting with data agents
● Data agents give more certainty to eval by testing throughout the application
● Thorough testing of LLM apps ensures groundedness
Try it yourself:
https://colab.research.google.com/drive/12oWmUfrPc1tC_C4ds8LS1sLrB0ikqneH?usp=sharing
Example

Roman Kyslyi: Using and Building LLM Agents (UA)
