Shuheng You
06/05/2024
Hallucination of LLMs
Paper Discussion
Background
Hallucination
“Generated content that appears factual but is ungrounded”
• We want to look at the possible underlying mechanism leading to the problem
Background
Heuristic Solutions
Chain-of-Verification: use LLMs to generate verification questions
Chain-of-Verification Reduces Hallucination in Large Language Models. https://arxiv.org/abs/2309.11495
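To make the idea concrete, here is a minimal sketch of a draft, verify, revise loop. The function name `chain_of_verification`, the prompt wording, and the `fake_llm` stub are all assumptions made for illustration; they are not the prompts or code from the paper, and any text-in, text-out LLM call could be passed as `llm`.

```python
from typing import Callable, Dict, List, Tuple


def chain_of_verification(question: str, llm: Callable[[str], str]) -> Dict[str, object]:
    """Simplified draft -> verification questions -> revised answer loop."""
    # 1. Draft an initial answer.
    draft = llm(f"Answer the question.\nQ: {question}\nA:")
    # 2. Ask the model for verification questions, one per line.
    plan = llm(
        "List questions that would verify this answer, one per line.\n"
        f"Q: {question}\nA: {draft}"
    )
    checks: List[str] = [line.strip() for line in plan.splitlines() if line.strip()]
    # 3. Answer each verification question without showing the draft,
    #    so that errors in the draft are less likely to be repeated.
    evidence: List[Tuple[str, str]] = [(c, llm(f"Q: {c}\nA:")) for c in checks]
    # 4. Revise the draft in light of the verification answers.
    notes = "\n".join(f"- {c} -> {a}" for c, a in evidence)
    final = llm(
        f"Original question: {question}\nDraft: {draft}\n"
        f"Verification:\n{notes}\nRevised answer:"
    )
    return {"draft": draft, "checks": checks, "evidence": evidence, "final": final}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, used only to exercise the loop."""
    if prompt.startswith("List questions"):
        return "Was X born in Paris?\nWas X born in 1900?"
    if prompt.startswith("Original question"):
        return "revised answer"
    return "stub answer"


result = chain_of_verification("Where and when was X born?", fake_llm)
```

Answering the verification questions in separate calls is the design choice that matters here: the draft is kept out of those prompts on purpose.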
LMvLM: use another LLM to cross-examine the first one and find inconsistencies in its claims
LM vs LM: Detecting Factual Errors via Cross Examination. https://aclanthology.org/2023.emnlp-main.778/
Do LLMs Know What They Know?
Introduction of P(True)
‣ P(True): the probability a model assigns to a specific sample being the correct answer to a question
‣ Obtained by asking an LLM whether its own answer to a question is correct (few-shot)
Language Models (Mostly) Know What They Know. https://arxiv.org/abs/2207.05221
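As a rough illustration of how P(True) can be read off a model, the sketch below builds a self-evaluation prompt and turns the logits of the two answer options into a probability. The prompt wording, the function names, and the choice to renormalize over only the two options are assumptions for this example; the logits themselves would come from whatever model is being evaluated.

```python
import math


def build_p_true_prompt(question: str, proposed_answer: str) -> str:
    """Format a self-evaluation prompt (wording is an assumption)."""
    return (
        f"Question: {question}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer:\n (A) True\n (B) False\n"
        "The proposed answer is:"
    )


def p_true(logit_true: float, logit_false: float) -> float:
    """Softmax over the two option logits; returns the mass on 'True'."""
    m = max(logit_true, logit_false)  # subtract the max for numerical stability
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


prompt = build_p_true_prompt("What is the capital of France?", "Paris")
# Hypothetical next-token logits for the "True" and "False" options.
score = p_true(logit_true=2.0, logit_false=0.0)
```

In the few-shot setting, solved examples in the same format would be prepended to `prompt` before it is sent to the model.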
Experiment on P(True)
‣ Models can self-evaluate their own samples with reasonable accuracy
Introduction of P(IK)
‣ P(IK): the probability a model assigns to "I know", i.e. to being able to answer a given question correctly
‣ Input: the question itself
‣ Output: the probability, produced by an additional binary classification head on top of the model
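A binary classification head of this kind can be sketched as a single logistic unit applied to one hidden-state vector of the model. The function name `p_ik`, the use of a single linear layer, and the toy numbers are assumptions for illustration; the slides do not specify the head's architecture or which hidden state it reads.

```python
import math
from typing import Sequence


def p_ik(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic head: sigmoid(w . h + b), read as the probability of 'I know'."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    # Numerically stable sigmoid that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Toy hidden state for a question and hypothetical trained head parameters.
hidden_state = [0.5, -1.0, 2.0]
head_weights = [1.0, 0.5, 0.75]
head_bias = -0.5
prob_i_know = p_ik(hidden_state, head_weights, head_bias)
```

Training such a head amounts to fitting `head_weights` and `head_bias` with a binary cross-entropy loss against per-question correctness labels.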
Visualization of P(IK)
P(IK) for questions about the president of Absurdistan << P(IK) for questions about the US
Experiment on P(IK)
We care about both the in-distribution and the out-of-distribution performance of P(IK)
• In-distribution performance measures how reliable P(IK) is on the task it was trained on
• Out-of-distribution performance measures how well a trained P(IK) generalizes to a new task
Ground-truth P(IK): the number of correct samples divided by the total number of generated samples for a question
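The ground-truth definition is a simple ratio, sketched below. The function name `ground_truth_p_ik`, the string-matching correctness check, and the sample answers are assumptions made for the example.

```python
from typing import Callable, Sequence


def ground_truth_p_ik(samples: Sequence[str], is_correct: Callable[[str], bool]) -> float:
    """Fraction of sampled answers to one question that are correct."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    return sum(1 for s in samples if is_correct(s)) / len(samples)


# Hypothetical answers sampled from a model for "What is the capital of France?"
sampled_answers = ["Paris", "paris", "Lyon", "Paris"]
target = ground_truth_p_ik(sampled_answers, lambda s: s.strip().lower() == "paris")
```

The resulting value serves as the training and evaluation target for the P(IK) head on that question.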
Residual Streams Across Layers
Residual Streams
Analysis of all L hidden states and the tokens that can be predicted from them
Given different prompts (some succeed and some fail to predict the correct answer)
On Large Language Models' Hallucination with Regard to Known Facts. https://arxiv.org/abs/2403.20009
[Figure: a stack of L decoder layers, each producing a hidden state]
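One common way to see which tokens a hidden state predicts is to multiply it by the model's unembedding matrix and read the per-token scores. The sketch below does this for every layer with toy numbers; the function names, the two-token vocabulary, and the omission of the final layer norm are simplifying assumptions, not the paper's exact procedure.

```python
from typing import List, Sequence


def project_to_vocab(hidden: Sequence[float], unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token: dot(unembed[v], hidden)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def token_trajectory(
    hidden_states: Sequence[Sequence[float]],
    unembed: Sequence[Sequence[float]],
    token_id: int,
) -> List[float]:
    """Score of a single token at the hidden state of every layer."""
    return [project_to_vocab(h, unembed)[token_id] for h in hidden_states]


# Toy setup: 2-token vocabulary, hidden size 2, L = 3 layers.
toy_unembed = [[1.0, 0.0], [0.0, 1.0]]
toy_hidden_states = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
correct_token_curve = token_trajectory(toy_hidden_states, toy_unembed, token_id=0)
```

With a real model, `hidden_states` would be the per-layer hidden states at the last prompt position and `unembed` the output embedding matrix.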
Dynamics of Residual Streams
Success token: the activation of the correct token when given the optimal prompt
Failed token: the activation of the correct token when given failed prompts
Hallucinated token: the activation of the incorrect token
Use the Pattern as a Classifier
The dynamics of the correct token across a model's layers
Accuracy of an SVM classifier trained on the plotted curves: [figure not preserved]
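The idea of classifying prompts by their layer-wise curves can be sketched with a small linear SVM trained by subgradient descent on the hinge loss. Everything here is an assumption for illustration: the curves are synthetic, the hyperparameters are arbitrary, and a library implementation would normally be used instead of this hand-written trainer.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    epochs: int = 500,
    lr: float = 0.1,
    reg: float = 0.001,
) -> Tuple[List[float], float]:
    """Minimize hinge loss plus an L2 penalty by subgradient descent. Labels are +1/-1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty always shrinks the weights slightly.
            w = [wi - lr * reg * wi for wi in w]
            # The hinge term contributes only when the margin is violated.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (success-like curve) or -1 (failure-like curve)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Synthetic per-layer scores of the correct token (4 layers per curve).
success_curves = [[0.0, 0.1, 0.6, 0.9], [0.1, 0.2, 0.7, 1.0], [0.0, 0.2, 0.5, 0.8]]
failed_curves = [[0.0, 0.1, 0.2, 0.2], [0.1, 0.1, 0.1, 0.3], [0.0, 0.0, 0.2, 0.1]]
train_x = success_curves + failed_curves
train_y = [1, 1, 1, -1, -1, -1]
svm_w, svm_b = train_linear_svm(train_x, train_y)
```

In this toy data the two classes differ mainly in the late layers, which is the kind of pattern the classifier is meant to pick up.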
Issues and Discussion
From the Two Papers
Issues:
• The methods are more effective on short questions (especially single-token ones), and often fail on longer ones
• Only applicable to open-source LLMs, since they require access to model internals
Discussion:
• Do you think these methods are practical in production scenarios?
• If not, what do you think are the drawbacks and potential problems?
