Active Retrieval Augmented Generation
Zhengbao Jiang, Frank F. Xu, Luyu Gao et al.
Speaker: Po-Chuan Chen
Jun 27, 2023
Table of contents
1 Introduction
2 Retrieval-Augmented Generation
3 Forward-Looking Active REtrieval Augmented Generation
4 Multi-time Retrieval Baselines
5 Experimental Setup / Results
6 Conclusion / Limitation
7 Reflection
Most existing retrieval-augmented LMs employ a
retrieve-and-generate setup that only retrieves information once based
on the input.
This is limiting, however, in more general scenarios involving
generation of long texts, where continually gathering information
throughout the generation process is essential.
This paper proposes Forward-Looking Active REtrieval augmented generation (FLARE), a generic retrieval-augmented generation method that iteratively predicts the upcoming sentence to anticipate future content. If the predicted sentence contains low-confidence tokens, it is used as a query to retrieve relevant documents, and the sentence is regenerated.
Generative language models (LMs) still tend to hallucinate and create imaginary content.
To address hallucination, one promising direction is retrieval-augmented generation, which augments parametric LMs with non-parametric retrieval components that can look up relevant information from external knowledge resources such as document corpora [5].
However, a single initial retrieval based on the topic name (e.g., Joe Biden) may not cover all aspects and details.
Therefore, it is crucial to retrieve extra information as needed during the generation process, such as when generating a certain aspect (e.g., the education history of Joe Biden) or a specific detail (e.g., when Joe Biden announced his candidacy for the 2020 presidential campaign).
Figure 1: FLARE: Starting with the user input x and initial retrieval results
Dx. Low-probability tokens (indicated with underline)
Forward-Looking Active REtrieval augmented generation (FLARE), illustrated in Figure 1, iteratively generates a temporary next sentence, uses it as the query to retrieve relevant documents if it contains low-probability tokens, and regenerates the next sentence, repeating until it reaches the end.
FLARE is applicable to any existing LM at inference time without additional training. Following the release of GPT-3.5 [6], they examine the effectiveness of their method on text-davinci-003.
Retrieval-Augmented Generation
Notations and Definitions
Given a user input $x$ and a document corpus $D = \{d_i\}_{i=1}^{|D|}$, the goal of retrieval-augmented LMs is to generate the answer $y = [s_1, s_2, \ldots, s_m] = [w_1, w_2, \ldots, w_n]$, containing $m$ sentences or $n$ tokens, by leveraging information retrieved from the corpus.
They assume a retriever that returns a list of documents $D_q = \mathrm{ret}(q)$ for a query $q$.
Following existing methods [8, 11], they prepend the retrieved documents to the user input to aid future generation, for both the baselines and FLARE, to ensure a fair comparison: $y = \mathrm{LM}([D_q, x])$, where $[\cdot, \cdot]$ denotes concatenation in the specified order.
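A minimal sketch of the single-time setup $y = \mathrm{LM}([D_q, x])$, assuming the query is simply the user input ($q = x$). The retriever and LM are passed in as plain callables; `toy_ret` (word-overlap matching) and `echo_lm` (returns its prompt unchanged) are hypothetical stand-ins, not the components used in the paper.

```python
from typing import Callable, List

# Hypothetical interfaces: a retriever maps a query to documents,
# and an LM maps a prompt string to generated text.
Retriever = Callable[[str], List[str]]
LanguageModel = Callable[[str], str]


def concat(parts: List[str]) -> str:
    """[a, b, ...]: concatenate parts in the given order, one per line."""
    return "\n".join(parts)


def single_time_rag(x: str, ret: Retriever, lm: LanguageModel) -> str:
    """Retrieve once with q = x, then generate y = LM([D_q, x])."""
    docs = ret(x)                 # D_q = ret(q)
    prompt = concat(docs + [x])   # retrieved documents come before the input
    return lm(prompt)


# Toy stand-ins so the sketch runs end to end (hypothetical, not real components).
CORPUS = ["Joe Biden was born in Scranton.", "Paris is in France."]


def toy_ret(q: str) -> List[str]:
    """Return documents sharing at least one lowercase word with the query."""
    words = set(q.lower().split())
    return [d for d in CORPUS if words & set(d.lower().split())]


def echo_lm(prompt: str) -> str:
    """A fake LM that returns its prompt, so the concatenation is visible."""
    return prompt


print(single_time_rag("Where was Joe Biden born?", toy_ret, echo_lm))
```

Because the query is fixed to $x$, everything retrieved is decided before generation starts; the active variants below change exactly this.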
Active Retrieval Augmented Generation
Unlike single-time retrieval-augmented generation [5], active retrieval augmented generation is a generic framework that actively decides when and what to retrieve throughout the generation process.
Formally, at step $t$ ($t \ge 1$), the retrieval query $q_t$ is formulated based on both the user input $x$ and the previously generated output $y_{<t}$.
The query formulation function is
$$q_t = \mathrm{qry}(x, y_{<t}),$$
where $y_{<t} = \emptyset$ when $t = 1$.
Given the retrieved documents $D_{q_t}$, the LM continually generates the answer until the next retrieval is triggered or the end is reached:
$$y_t = \mathrm{LM}([D_{q_t}, x, y_{<t}]),$$
where $y_t$ denotes the tokens generated at step $t$.
At each step, previously retrieved documents $\bigcup_{t' < t} D_{q_{t'}}$ are discarded, and only the documents retrieved at the current step are used to generate $y_t$.
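The loop below is a minimal sketch of this framework. `qry`, `ret`, and `step` are hypothetical callables standing in for query formulation, the retriever, and one LM generation step; returning `None` from `step` is an assumed end-of-generation signal, and a `None` query is treated as "no retrieval at this step". The toy run records which documents each step sees, to show that earlier retrievals are discarded.

```python
from typing import Callable, List, Optional

QueryFn = Callable[[str, List[str]], Optional[str]]
RetrieveFn = Callable[[str], List[str]]
StepFn = Callable[[List[str], str, List[str]], Optional[str]]


def active_rag(x: str, qry: QueryFn, ret: RetrieveFn, step: StepFn,
               max_steps: int = 10) -> List[str]:
    y: List[str] = []                      # y_{<t}; empty at t = 1
    for _ in range(max_steps):
        q_t = qry(x, y)                    # q_t = qry(x, y_{<t})
        docs = ret(q_t) if q_t else []     # D_{q_t}; earlier documents are not kept
        y_t = step(docs, x, y)             # y_t = LM([D_{q_t}, x, y_{<t}])
        if y_t is None:                    # assumed end-of-generation signal
            break
        y.append(y_t)
    return y


# Toy run (all hypothetical): query with the last output, or with x at t = 1.
seen: List[List[str]] = []
script = ["s1", "s2"]


def toy_qry(x: str, y: List[str]) -> Optional[str]:
    return y[-1] if y else x


def toy_ret(q: str) -> List[str]:
    return [f"doc for: {q}"]


def toy_step(docs: List[str], x: str, y: List[str]) -> Optional[str]:
    seen.append(list(docs))                # record what this step conditions on
    return script[len(y)] if len(y) < len(script) else None


print(active_rag("input", toy_qry, toy_ret, toy_step))  # ['s1', 's2']
print(seen)
```

Each entry of `seen` holds only the documents from its own step, mirroring the rule that $\bigcup_{t'<t} D_{q_{t'}}$ is dropped.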
Forward-Looking Active REtrieval Augmented Generation
FLARE follows two principles:
1 LMs should only retrieve information when they do not have the necessary knowledge, to avoid unnecessary or inappropriate retrieval.
2 The retrieval queries should reflect the intents of future generations.
The paper, also inspired by Toolformer [10], proposes two methods:
1 Prompt the LM to generate retrieval queries when necessary while generating the answer, using retrieval-encouraging instructions (FLARE_instruct).
2 Directly use the LM's generation as search queries: if uncertain tokens are present, retrieve and regenerate (FLARE_direct).
FLARE with Retrieval Instructions
A straightforward way of expressing information needs for retrieval is to generate “[Search(query)]” when additional information is needed.
Prompt 3.1: retrieval instructions
Skill 1: An instruction to guide LMs to generate search queries.
Skill 2: An instruction to guide LMs to perform a specific
downstream task (e.g., multihop QA).
An instruction to guide LMs to combine skills 1 and 2 for the test case.
When the LM generates “[Search(query)]”, it stops the generation and uses the query to retrieve relevant documents, as shown in Figure 2.
Figure 2: An illustration of FLARE_instruct
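A minimal sketch of how generated text could be split at the first “[Search(query)]” marker, assuming the marker uses ASCII brackets exactly as written in the prompt. In a real system generation would be stopped at the marker; here a finished string is post-processed instead. `split_at_search` is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional, Tuple

# Non-greedy match up to the first ")]" that closes the marker.
SEARCH_PATTERN = re.compile(r"\[Search\((.*?)\)\]")


def split_at_search(generated: str) -> Tuple[str, Optional[str]]:
    """Return (text before the first [Search(...)], query), or (text, None)."""
    match = SEARCH_PATTERN.search(generated)
    if match is None:
        return generated, None
    return generated[: match.start()], match.group(1).strip()


text = "Joe Biden attended [Search(Joe Biden university)] the University of Delaware."
prefix, query = split_at_search(text)
print(repr(prefix))  # 'Joe Biden attended '
print(query)         # Joe Biden university
```

The non-greedy pattern tolerates parentheses inside the query, but a query that itself contains ")]" would be cut short.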
Direct FLARE
Relying on the LM to generate retrieval instructions (FLARE_instruct) might not be reliable, so they propose FLARE_direct.
It has two components:
1 Confidence-based Active Retrieval
2 Confidence-based Query Formulation, via one of:
  Masked sentences as implicit queries
  Generated questions as explicit queries
Confidence-based Active Retrieval
First, it generates a temporary next sentence $\hat{s}_t = \mathrm{LM}([x, y_{<t}])$ without conditioning on retrieved documents.
Then it decides whether to trigger retrieval and formulates queries based on $\hat{s}_t$. If the LM is confident about $\hat{s}_t$, it accepts $\hat{s}_t$ without retrieving additional information.
Otherwise, it uses $\hat{s}_t$ to formulate search queries $q_t$, retrieves relevant documents, and regenerates the next sentence $s_t = \mathrm{LM}([D_{q_t}, x, y_{<t}])$.
Retrieval is triggered if any token of $\hat{s}_t$ has a probability lower than a threshold $\theta \in [0, 1]$: $\theta = 0$ means retrieval is never triggered, while $\theta = 1$ triggers retrieval for every sentence.
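A minimal sketch of one FLARE_direct step under this rule. `generate` is a hypothetical LM wrapper returning a sentence together with its per-token probabilities, and `make_query` stands in for whichever query formulation is used; the toy LM below is scripted to be unsure without documents and confident with them. Note that $\theta = 1$ only triggers retrieval on every sentence if no token has probability exactly 1.

```python
from typing import Callable, List, Sequence, Tuple

Generate = Callable[[List[str]], Tuple[str, List[float]]]


def should_retrieve(token_probs: Sequence[float], theta: float) -> bool:
    """Trigger retrieval if any token of the temporary sentence has prob < theta."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return any(p < theta for p in token_probs)


def flare_direct_step(x: str, y_prev: List[str], generate: Generate,
                      ret: Callable[[str], List[str]],
                      make_query: Callable[[str, List[float]], str],
                      theta: float) -> str:
    s_hat, probs = generate([x] + y_prev)        # ŝ_t = LM([x, y_<t]), no documents
    if not should_retrieve(probs, theta):
        return s_hat                              # confident: accept ŝ_t as is
    docs = ret(make_query(s_hat, probs))          # D_{q_t}, with q_t built from ŝ_t
    s_t, _ = generate(docs + [x] + y_prev)        # s_t = LM([D_{q_t}, x, y_<t])
    return s_t


# Toy components (hypothetical).
def toy_generate(context: List[str]) -> Tuple[str, List[float]]:
    if any(c.startswith("DOC:") for c in context):
        return "Biden was born in Scranton.", [0.95, 0.9]
    return "Biden was born in Delaware.", [0.95, 0.2]


def toy_ret(q: str) -> List[str]:
    return ["DOC: Joe Biden was born in Scranton, Pennsylvania."]


def use_sentence_as_query(s_hat: str, probs: List[float]) -> str:
    return s_hat


x = "Where was Biden born?"
print(flare_direct_step(x, [], toy_generate, toy_ret, use_sentence_as_query, 0.4))
print(flare_direct_step(x, [], toy_generate, toy_ret, use_sentence_as_query, 0.1))
```

With $\theta = 0.4$ the token at probability 0.2 triggers retrieval and the regenerated sentence is returned; with $\theta = 0.1$ the temporary sentence is accepted unchanged.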
Confidence-based Query Formulation
One way to perform retrieval is to directly use the temporary next sentence $\hat{s}_t$ as the query $q_t$; this achieves significantly better results than using the previous context.
However, it risks perpetuating errors contained in $\hat{s}_t$: using an erroneous sentence as a query could lead the retriever to return irrelevant information, which could mislead future generations.
They propose two methods to overcome this drawback:
Masked sentences as implicit queries
Generated questions as explicit queries
Implicit and explicit query formulation
Figure 3: Tokens with low probabilities are marked with underlines.
Masked sentences as implicit queries
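Queries $q_t$ are formulated based on $\hat{s}_t$ as follows:
$$q_t = \begin{cases} \emptyset & \text{if all tokens of } \hat{s}_t \text{ have probs} \ge \theta \\ \mathrm{mask}(\hat{s}_t) \text{ or } \mathrm{qgen}(\hat{s}_t) & \text{otherwise} \end{cases}$$
The first method masks out low-confidence tokens in $\hat{s}_t$ whose probabilities are below a threshold $\beta \in [0, 1]$; the higher $\beta$, the more aggressive the masking.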
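A minimal sketch of the implicit branch, $q_t = \mathrm{mask}(\hat{s}_t)$, on a whitespace-tokenized sentence with made-up probabilities. The sketch simply drops masked tokens from the query, which is an assumption about how masked positions are rendered; `formulate_query` returns `None` for the $q_t = \emptyset$ case.

```python
from typing import Optional, Sequence


def mask_query(tokens: Sequence[str], probs: Sequence[float], beta: float) -> str:
    """mask(ŝ_t): keep only tokens whose probability is at least beta."""
    return " ".join(tok for tok, p in zip(tokens, probs) if p >= beta)


def formulate_query(tokens: Sequence[str], probs: Sequence[float],
                    theta: float, beta: float) -> Optional[str]:
    """Return None (no retrieval) if every token is confident, else the masked query."""
    if all(p >= theta for p in probs):
        return None
    return mask_query(tokens, probs, beta)


tokens = "Joe Biden attended the University of Pennsylvania".split()
probs = [0.9, 0.95, 0.8, 0.9, 0.3, 0.35, 0.2]   # made-up token probabilities
print(formulate_query(tokens, probs, theta=0.8, beta=0.4))  # Joe Biden attended the
print(formulate_query(tokens, probs, theta=0.1, beta=0.4))  # None
```

Raising `beta` removes more tokens from the query, which is the "more aggressive masking" described above.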
Generated questions as explicit queries
Another method is to generate explicit questions that target the low-confidence spans in $\hat{s}_t$.
Self-ask [7] achieved this by manually inserting follow-up questions into downstream task exemplars, as shown later in Prompt 4.1, which requires task-specific annotation effort.
Instead, they developed a universal approach that generates questions for low-confidence spans without additional annotation.
It first extracts all spans from $\hat{s}_t$ with probabilities below $\beta$. For each extracted span $z$, it prompts gpt-3.5-turbo to generate a question $q_{t,z}$ that can be answered with the span, using the following prompt:
Prompt 3.2: zero-shot question generation
User input x.
Generated output so far y≤t.
Given the above passage, ask a question to which the answer is the
term/entity/phrase “z”.
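The span extraction and prompt construction can be sketched as follows. Grouping consecutive low-probability tokens into one span, and the exact line layout of the prompt string, are assumptions of this sketch; the call to gpt-3.5-turbo itself is omitted.

```python
from typing import List, Sequence


def low_confidence_spans(tokens: Sequence[str], probs: Sequence[float],
                         beta: float) -> List[str]:
    """Group consecutive tokens with probability below beta into spans z."""
    spans: List[str] = []
    current: List[str] = []
    for tok, p in zip(tokens, probs):
        if p < beta:
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def qgen_prompt(x: str, y_so_far: str, z: str) -> str:
    """Fill the zero-shot question-generation prompt for one span z."""
    return (
        f"{x}\n{y_so_far}\n"
        "Given the above passage, ask a question to which the answer is "
        f'the term/entity/phrase "{z}".'
    )


tokens = "Joe Biden attended the University of Pennsylvania where he earned a law degree".split()
probs = [0.9, 0.95, 0.8, 0.9, 0.3, 0.35, 0.2, 0.9, 0.9, 0.7, 0.9, 0.25, 0.3]
spans = low_confidence_spans(tokens, probs, beta=0.4)
print(spans)  # ['University of Pennsylvania', 'law degree']
```

Each span yields one prompt, so a sentence with two uncertain spans produces two explicit queries $q_{t,z}$.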
Implementation Details
Initial query: the user input $x$ is used as the first query, so the first sentence is generated as $\hat{s}_1 = \mathrm{LM}([D_x, x])$.
Sentence tokenization: at each step $t$, it generates 64 tokens, which is longer than most sentences, and uses the NLTK sentence tokenizer to extract the first sentence, discarding the rest.
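A rough sketch of the "keep only the first sentence" step. It uses a simple regular expression instead of NLTK, so it is only an approximation: it splits after any `.`, `!`, or `?` followed by whitespace and would mis-split abbreviations such as "Dr. Smith", which NLTK's Punkt-based `nltk.tokenize.sent_tokenize` is designed to handle. The 64-token cap is assumed to be enforced by the LM call and is not shown.

```python
import re

# Split point: whitespace that follows sentence-final punctuation.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def first_sentence(continuation: str) -> str:
    """Return the first sentence of a generated continuation and drop the rest."""
    return _SENTENCE_BREAK.split(continuation.strip(), maxsplit=1)[0]


print(first_sentence("Joe Biden was born in 1942. He grew up in Scranton, and"))
# Joe Biden was born in 1942.
```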
Document corpus and retrievers:
Wikipedia dump (document corpus) [3]
BM25 (retriever) [9]
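For intuition about the BM25 retriever, here is a minimal, self-contained scorer over whitespace-tokenized text. The parameter values $k_1 = 1.5$, $b = 0.75$ and the non-negative IDF variant $\log(1 + (N - n + 0.5)/(n + 0.5))$ are common choices assumed here; this is not the specific BM25 implementation used in the paper.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(d) for d in tokenized) / n_docs or 1.0
    df = Counter(term for d in tokenized for term in set(d))  # document frequency
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1.0 - b + b * len(d) / avgdl)  # document-length normalization
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + norm)
        scores.append(score)
    return scores


def bm25_retrieve(query: str, docs: List[str], k: int) -> List[str]:
    """Return the top-k documents, i.e. D_q = ret(q)."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]


docs = [
    "Joe Biden was born in Scranton",
    "Paris is the capital of France",
    "Biden announced his 2020 campaign",
]
print(bm25_retrieve("Biden campaign", docs, k=1))  # ['Biden announced his 2020 campaign']
```

The rarer term "campaign" gets a higher IDF than "biden", so the document containing both ranks first.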
Retrieved document formatting:
Prompt 3.3: document formatting
Search results:
[1] Document 1
[2] Document 2
. . .
The user input x
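A small helper that renders retrieved documents in the layout of Prompt 3.3: a numbered "Search results:" block followed by the user input. The function name and the choice of one line per document are assumptions for illustration.

```python
from typing import List


def format_with_documents(docs: List[str], x: str) -> str:
    """Render [D_q, x] as a 'Search results:' block followed by the user input."""
    lines = ["Search results:"]
    lines += [f"[{i}] {doc}" for i, doc in enumerate(docs, start=1)]
    lines.append(x)
    return "\n".join(lines)


print(format_with_documents(["Document 1", "Document 2"], "The user input x"))
```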
On average, retrieval is triggered for 30% to 60% of sentences, depending on the downstream task. Compared to single-time retrieval, interleaving retrieval and generation with a naive implementation does increase overhead.
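The fraction of sentences that trigger retrieval follows directly from $\theta$ and each sentence's token probabilities. The sketch below computes that fraction for made-up probabilities; it does not reproduce the paper's numbers.

```python
from typing import Sequence


def retrieval_rate(sentence_token_probs: Sequence[Sequence[float]], theta: float) -> float:
    """Fraction of sentences with at least one token probability below theta."""
    if not sentence_token_probs:
        return 0.0
    triggered = sum(1 for probs in sentence_token_probs if any(p < theta for p in probs))
    return triggered / len(sentence_token_probs)


example = [[0.9, 0.8], [0.3, 0.9], [0.95], [0.5, 0.6]]  # made-up per-token probabilities
for theta in (0.0, 0.4, 0.7, 1.0):
    print(theta, retrieval_rate(example, theta))
```

The rate grows monotonically with $\theta$, from 0 (never retrieve) to 1 (retrieve on every sentence), which is the knob the active-retrieval ablation sweeps.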
Multi-time Retrieval Baselines
They formally introduce three baseline categories based on when and
what to retrieve.
1 Previous-window
2 Previous-sentence
3 Question decomposition
Previous-window
Previous-window approaches trigger retrieval every $l$ tokens, where $l$ is the window size. Generated tokens from the previous window are used as the query:
$$q_t = y_{t-1} \ (t \ge 2), \qquad y_t = [w_{(t-1)l+1}, \ldots, w_{tl}]$$
Existing methods in this category include RETRO [1], IC-RALM [8], and KNN-LM [4].
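A minimal sketch of the previous-window schedule on an already generated token list, pairing each window $y_t$ with its query $q_t = y_{t-1}$. The formula only defines queries for $t \ge 2$, so the sketch leaves $q_1$ as `None`; what the first retrieval uses is not specified here.

```python
from typing import List, Optional, Tuple

Window = List[str]


def previous_window_schedule(tokens: List[str], l: int) -> List[Tuple[Optional[Window], Window]]:
    """Pair y_t = [w_{(t-1)l+1}, ..., w_{tl}] with q_t = y_{t-1} (None at t = 1)."""
    if l <= 0:
        raise ValueError("window size l must be positive")
    # 0-based slice [(t-1)l, tl) corresponds to the 1-based tokens w_{(t-1)l+1}..w_{tl}.
    windows = [tokens[i:i + l] for i in range(0, len(tokens), l)]
    return [(windows[t - 2] if t >= 2 else None, windows[t - 1])
            for t in range(1, len(windows) + 1)]


for q_t, y_t in previous_window_schedule(list("abcdefg"), l=3):
    print(q_t, "->", y_t)
```

The query always lags one window behind, which is exactly the backward-looking behaviour FLARE's forward-looking queries are meant to avoid.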
Previous-sentence / Question decomposition
Previous-sentence approaches trigger retrieval every sentence and use the previous sentence as the query:
$$q_t = y_{t-1} \ (t \ge 2), \qquad y_t = s_t$$
Question decomposition approaches use manually annotated task-specific exemplars to guide LMs to generate decomposed sub-questions while producing outputs.
Prompt 4.1: multihop QA with self-ask
Question: Who lived longer, Theodor Haecker or Harry Vaughan Watkins?
Are follow up questions needed here: Yes.
Follow up: How old was Theodor Haecker when he died?
Intermediate answer: Theodor Haecker was 65 years old when he died.
Follow up: How old was Harry Vaughan Watkins when he died?
Intermediate answer: Harry Vaughan Watkins was 69 years old when
he died.
So the final answer is: Harry Vaughan Watkins.
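In a self-ask style baseline, each "Follow up:" line is a natural candidate for a retrieval query. The sketch below extracts those questions from output shaped like Prompt 4.1; treating each follow-up as a query is an assumption of this sketch rather than a detail stated above.

```python
import re
from typing import List

# One follow-up question per line, as in the self-ask exemplar format.
FOLLOW_UP = re.compile(r"^Follow up:\s*(.+)$", re.MULTILINE)


def follow_up_queries(self_ask_output: str) -> List[str]:
    """Return the follow-up questions in the order they were generated."""
    return [q.strip() for q in FOLLOW_UP.findall(self_ask_output)]


output = """Are follow up questions needed here: Yes.
Follow up: How old was Theodor Haecker when he died?
Intermediate answer: Theodor Haecker was 65 years old when he died.
Follow up: How old was Harry Vaughan Watkins when he died?"""
print(follow_up_queries(output))
```

This only works because the exemplars fix the "Follow up:" format, which is the task-specific prompt engineering the drawbacks below point to.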
Notable drawbacks
1 Fixed-interval approaches use previously generated tokens as queries, which might not reflect what LMs intend to generate in the future.
2 Retrieving information at a fixed interval can be inefficient because it might occur at inappropriate points.
3 Question decomposition approaches require task-specific prompt engineering, which restricts their generalizability to new tasks.
Experimental Setup / Results
Experimental Setup
They evaluate the effectiveness of FLARE on 4 diverse knowledge-intensive tasks using few-shot in-context learning, as summarized in Table 1.
They compare FLARE with baselines under the same setting and, due to the cost of running experiments, sub-sample at most 500 examples from each dataset.
The hyperparameters of FLARE are selected based on the development set and listed in Table 2.
Tasks used in the experiments:
1 Multihop QA (2WikiMultihopQA)
2 Commonsense Reasoning (StrategyQA)
3 Long-form QA (ASQA)
4 Open-domain Summarization (WikiAsp)
Experimental Results
Figure 4: Comparison between FLARE and baselines across all tasks
Ablation Study
Importance of forward-looking retrieval.
They first validate their hypothesis that forward-looking retrieval is
indeed more powerful than past-context-based retrieval.
Figure 5: A head-to-head comparison between using the previous sentence
and the next sentence for retrieval.
Importance of active retrieval.
Next, they investigate the relationship between performance and the
active retrieval threshold 𝜃.
Figure 6: Performance (EM) of FLARE with respect to the percentage of
steps/sentences with retrieval on 2WikiMultihopQA and StrategyQA.
Effectiveness of different query formulation methods.
Last, they study implicit query formulation by masking and explicit query formulation through question generation.
Figure 7: Performance of FLARE with respect to the masking threshold 𝛽 on
Conclusion / Limitation
This paper presents a framework with forward-looking active retrieval that iteratively uses the upcoming sentence to retrieve relevant information if it contains low-confidence tokens, and then regenerates the next sentence.
FLARE did not provide significant gains on the Wizard of Wikipedia dataset. Since its outputs are relatively short, retrieving multiple disparate pieces of information might not be necessary.
From an engineering perspective, the LM needs to be invoked multiple times (once for each retrieval), and a caching-free implementation also requires recomputing the previous activations after each retrieval.
One possible solution is an architecture that encodes the retrieved documents $D_{q_t}$ and the input/generation ($x$/$y_{<t}$) independently.
Reflection
Retrieval-augmented generation (RAG) can help specialize an LLM to a particular domain and use case. In this work, the authors provide a more trustworthy way to use retrieval.
If we want a language model to generate more thoughtful (more emotionally aware) responses, it would be more interesting to fine-tune a model on a psychology chat dataset [2]. Finding a suitable dataset or prompt may therefore be a good focus, and parameter-efficient fine-tuning methods may help a lot for dialogue systems.
References
[1] Sebastian Borgeaud et al. “Improving Language Models by Retrieving from Trillions of Tokens”. In: Proceedings of the 39th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri et al. Vol. 162. Proceedings of Machine Learning Research. PMLR, July 2022, pp. 2206–2240.
[2] Minhajul Hoque. Making Chat-bot more emotional. 2023.
[3] Vladimir Karpukhin et al. “Dense Passage Retrieval for Open-Domain Question Answering”. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, Nov. 2020, pp. 6769–6781. doi: 10.18653/v1/2020.emnlp-main.550.
[4] Urvashi Khandelwal et al. Generalization through Memorization: Nearest Neighbor Language Models. 2020. arXiv: 1911.00172 [cs.CL].
[5] Patrick Lewis et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. In: Advances in Neural Information Processing Systems. Ed. by H. Larochelle et al. Vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474.
[6] Long Ouyang et al. Training language models to follow instructions with human feedback. 2022. arXiv: 2203.02155 [cs.CL].
[7] Ofir Press et al. Measuring and Narrowing the Compositionality Gap in Language Models. 2023. arXiv: 2210.03350 [cs.CL].
[8] Ori Ram et al. In-Context Retrieval-Augmented Language Models. 2023. arXiv: 2302.00083 [cs.CL].
[9] Stephen Robertson and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond”. In: Foundations and Trends® in Information Retrieval 3.4 (2009), pp. 333–389. issn: 1554-0669. doi: 10.1561/1500000019.
[10] Timo Schick et al. Toolformer: Language Models Can Teach Themselves to Use Tools. 2023. arXiv: 2302.04761 [cs.CL].
[11] Harsh Trivedi et al. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. 2022. arXiv: 2212.10509 [cs.CL].
A1: Experimental settings
Settings | 2WikiMultihopQA | StrategyQA | ASQA | WikiAsp
Dataset statistics
Task | multihop QA | commonsense QA | long-form QA | open-domain summarization
#Examples | 500 | 229 | 500 | 500
Evaluation settings
Metrics | EM, F1, Prec., Rec. | EM | EM, Disambig-F1, ROL | UniEval, entity-F1, ROUGE
Retrieval settings
Corpus | Wikipedia | Wikipedia | Wikipedia | open web
Retriever | BM25 | BM25 | BM25 | Bing
Top-k | 2 | 3 | 3 | 5
Prompt format
#Exemplars | 8 | 6 | 8 | 4
Ret. for exemplars | ✓ | ✗ | ✗ | ✗
Table 1: Statistics and experimental settings of different tasks/datasets
Dataset | θ | β | Query formulation | Combine single- & multi-time retrieval
2WikiMultihopQA | 0.8 | 0.4 | implicit | ✗
StrategyQA | 0.4 | 0.4 | implicit | ✗
ASQA & ASQA-hint | 0.8 | 0.4 | explicit | ✓
WikiAsp | 0.8 | 0.4 | explicit | ✓
Table 2: Hyperparameters of FLARE on different tasks/datasets
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...Po-Chuan Chen
On the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdfOn the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdfPo-Chuan Chen
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...Po-Chuan Chen
A Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfA Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfPo-Chuan Chen
A Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdfA Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdfPo-Chuan Chen
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdfAdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdfPo-Chuan Chen
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...Po-Chuan Chen
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfOffline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfPo-Chuan Chen
Cold_Start_Reinforcement_Learning_with_Softmax_Policy_Gradient.pdfPo-Chuan Chen
Image_to_Prompts.pdfPo-Chuan Chen
Evaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdfEvaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdfPo-Chuan Chen
Off-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdfOff-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdfPo-Chuan Chen
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdfA Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdfPo-Chuan Chen
Is Reinforcement Learning (Not) for Natural Language Processing.pdf
Is Reinforcement Learning (Not) for Natural
Language Processing.pdfIs Reinforcement Learning (Not) for Natural
Language Processing.pdf
Is Reinforcement Learning (Not) for Natural Language Processing.pdfPo-Chuan Chen
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfPo-Chuan Chen
Training language models to follow instructions with human feedback.pdf
Training language models to follow instructions
with human feedback.pdfTraining language models to follow instructions
with human feedback.pdf
Training language models to follow instructions with human feedback.pdfPo-Chuan Chen
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...Po-Chuan Chen
Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...
Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...
Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...Po-Chuan Chen
Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...
Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...
Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...Po-Chuan Chen

More from Po-Chuan Chen (20)

Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
Empathetic Dialogue Generation via Sensitive Emotion Recognition and Sensible...
On the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdfOn the Effectiveness of Offline RL for Dialogue Response Generation.pdf
On the Effectiveness of Offline RL for Dialogue Response Generation.pdf
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transfor...
A Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfA Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdf
A Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdfA Neural Corpus Indexer for Document Retrieval.pdf
A Neural Corpus Indexer for Document Retrieval.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdfAdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning.pdf
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdfOffline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Offline Reinforcement Learning for Informal Summarization in Online Domains.pdf
Evaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdfEvaluating Parameter Efficient Learning for Generation.pdf
Evaluating Parameter Efficient Learning for Generation.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdfOff-Policy Deep Reinforcement Learning without Exploration.pdf
Off-Policy Deep Reinforcement Learning without Exploration.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdfA Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
A Mixture-of-Expert Approach to RL-based Dialogue Management.pdf
Is Reinforcement Learning (Not) for Natural Language Processing.pdf
Is Reinforcement Learning (Not) for Natural
Language Processing.pdfIs Reinforcement Learning (Not) for Natural
Language Processing.pdf
Is Reinforcement Learning (Not) for Natural Language Processing.pdf
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
Training language models to follow instructions with human feedback.pdf
Training language models to follow instructions
with human feedback.pdfTraining language models to follow instructions
with human feedback.pdf
Training language models to follow instructions with human feedback.pdf
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Leveling to the Last Mile: Near-zero-cost Bit Level Wear Leveling for PCM-bas...
Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...
Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...
Beyond Write-reduction Consideration: A Wear-leveling-enabled B+-tree Indexin...
Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...
Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...
Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Fla...

Recently uploaded

pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank M.Gokilavani
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3

Recently uploaded (20)

pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture design
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx

Active Retrieval Augmented Generation.pdf

  • 1. FLARE Active Retrieval Augmented Generation Zhengbao Jiang, Frank F. Xu, Luyu Gao et al. Speaker: Po-Chuan Chen Jun 27, 2023 1 / 51
  • 2. FLARE Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 2 / 51
  • 3. FLARE Abstract Most existing retrieval-augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input. This is limiting, however, in more general scenarios involving generation of long texts, where continually gathering information throughout the generation process is essential. 3 / 51
  • 4. FLARE Abstract This paper proposes Forward-Looking Active REtrieval augmented generation (FLARE), a generic retrieval-augmented generation method that iteratively predicts the upcoming sentence to anticipate future content. If the predicted sentence contains low-confidence tokens, it is used as a query to retrieve relevant documents, and the sentence is regenerated. 4 / 51
  • 5. FLARE Introduction Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 5 / 51
  • 6. FLARE Introduction Introduction Generative language models (LMs) still tend to hallucinate and create imaginary content. To address the issue of hallucination, one promising direction is to augment generation with retrieval, which involves augmenting parametric LMs with non-parametric retrieval components that can look up relevant information from external knowledge resources such as document corpora [5]. 6 / 51
  • 7. FLARE Introduction Introduction However, initial retrieval based on the topic name (e.g., Joe Biden) may not cover all aspects and details. Therefore, it is crucial to retrieve extra information as needed during the generation process, such as when generating a certain aspect (e.g., the education history of Joe Biden) or a specific detail (e.g., when Joe Biden announced his candidacy for the 2020 presidential campaign). 7 / 51
  • 8. FLARE Introduction Figure 1: FLARE, starting with the user input $x$ and initial retrieval results $D_x$. Low-probability tokens are indicated with underlines. 8 / 51
  • 9. FLARE Introduction Contribution They propose Forward-Looking Active REtrieval augmented generation (FLARE), as illustrated in Figure 1. FLARE iteratively generates a temporary next sentence, uses it as the query to retrieve relevant documents if it contains low-probability tokens, and regenerates the next sentence, repeating until it reaches the end. FLARE can be applied to any existing LM at inference time without additional training. Motivated by the strong performance of GPT-3.5 [6], they examine the effectiveness of their methods on text-davinci-003. 9 / 51
  • 10. FLARE Retrieval-Augmented Generation Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 10 / 51
  • 11. FLARE Retrieval-Augmented Generation Notations and Definitions Given a user input $x$ and a document corpus $D = \{d_i\}_{i=1}^{|D|}$, the goal of retrieval-augmented LMs is to generate the answer $y = [s_1, s_2, \ldots, s_m] = [w_1, w_2, \ldots, w_n]$, containing $m$ sentences or $n$ tokens, by leveraging information retrieved from the corpus. A retriever returns a list of documents $D_q = \mathrm{ret}(q)$ for a query $q$. Following existing methods [8, 11], they prepend the retrieved documents before the user input to aid future generation, for both the baselines and FLARE, to ensure a fair comparison: $y = \mathrm{LM}([D_q, x])$, where $[\cdot, \cdot]$ denotes concatenation in the specified order. 11 / 51
  • 12. FLARE Retrieval-Augmented Generation Active Retrieval Augmented Generation Unlike single-time retrieval-augmented generation [5], active retrieval augmented generation is a generic framework that actively decides when and what to retrieve throughout the generation process. Formally, at step $t$ ($t \ge 1$), the retrieval query $q_t$ is formulated based on both the user input $x$ and the previously generated output $y_{<t}$. 12 / 51
  • 13. FLARE Retrieval-Augmented Generation Active Retrieval Augmented Generation The query formulation function is $q_t = \mathrm{qry}(x, y_{<t})$, where $y_{<t} = \emptyset$ when $t = 1$. Given the retrieved documents $D_{q_t}$, the LM continues generating the answer until the next retrieval is triggered or the end is reached: $y_t = \mathrm{LM}([D_{q_t}, x, y_{<t}])$. At each step, previously retrieved documents $\bigcup_{t' < t} D_{q_{t'}}$ are discarded, and only the documents retrieved at the current step are used to generate the next segment $y_t$. 13 / 51
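The generic framework above can be sketched as a short loop. This is a minimal, non-authoritative sketch: `qry`, `ret` and `lm` are hypothetical callables standing in for the query formulation function, the retriever and the LM, and having `lm` return `None` to signal the end of the answer is an assumption made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces for qry(x, y_<t), ret(q) and LM([D_q, x, y_<t]).
QueryFn = Callable[[str, List[str]], Optional[str]]
RetrieveFn = Callable[[str], List[str]]
GenerateFn = Callable[[List[str], str, List[str]], Optional[str]]


def active_retrieval_generate(x: str, qry: QueryFn, ret: RetrieveFn,
                              lm: GenerateFn, max_steps: int = 20) -> List[str]:
    """Generic active RAG loop: y_t = LM([D_{q_t}, x, y_<t])."""
    y: List[str] = []  # y_<t, empty at t = 1
    for _ in range(max_steps):
        q_t = qry(x, y)  # q_t = qry(x, y_<t); None means "do not retrieve"
        # Only the current step's documents are used; earlier D_{q_t'} are dropped.
        docs = ret(q_t) if q_t is not None else []
        y_t = lm(docs, x, y)
        if y_t is None:  # the LM signals the end of the answer
            break
        y.append(y_t)
    return y
```

Because `docs` is rebuilt from scratch at every step, the loop reflects the slide's point that earlier retrieval results are not accumulated.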
  • 14. FLARE Forward-Looking Active REtrieval Augmented Generation Table of contents I 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation FLARE with Retrieval Instructions Direct FLARE Implementation Details 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 14 / 51
  • 15. FLARE Forward-Looking Active REtrieval Augmented Generation Table of contents II 6 Conclusion / Limitation 7 Reflection 15 / 51
  • 16. FLARE Forward-Looking Active REtrieval Augmented Generation Forward-Looking Active REtrieval Augmented Generation FLARE follows two principles: 1 LMs should only retrieve information when they lack the necessary knowledge, to avoid unnecessary or inappropriate retrieval 2 The retrieval queries should reflect the intents of future generations The paper is also inspired by Toolformer [10] and implements FLARE in two ways: 1 It prompts the LM to generate retrieval queries when necessary while generating the answer, using retrieval-encouraging instructions (FLAREinstruct) 2 It directly uses the LM’s generation as search queries; if uncertain tokens are present, it retrieves and regenerates (FLAREdirect) 16 / 51
  • 17. FLARE Forward-Looking Active REtrieval Augmented Generation FLARE with Retrieval Instructions FLARE with Retrieval Instructions A straightforward way of expressing information needs for retrieval is to generate “[Search(query)]” when additional information is needed [10]. Prompt 3.1: retrieval instructions Skill 1: An instruction to guide LMs to generate search queries. Skill 2: An instruction to guide LMs to perform a specific downstream task (e.g., multihop QA). An instruction to guide LMs to combine skills 1 and 2 for the test case. 17 / 51
  • 18. FLARE Forward-Looking Active REtrieval Augmented Generation FLARE with Retrieval Instructions When the LM generates “[Search(query)]”, generation stops and the query is used to retrieve relevant documents, as shown in Figure 2. Figure 2: An illustration of FLAREinstruct 18 / 51
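Detecting the retrieval instruction can be sketched with a regular expression. This assumes the LM emits the marker literally as `[Search(query)]`; the function name `split_at_search` is hypothetical, and truncating the text at the first marker stands in for stopping generation there.

```python
import re
from typing import Optional, Tuple

# Non-greedy match so the first ")]" after "[Search(" ends the query.
SEARCH_PATTERN = re.compile(r"\[Search\((.*?)\)\]")


def split_at_search(generated: str) -> Tuple[str, Optional[str]]:
    """Return (text before the first marker, query), or (generated, None)."""
    match = SEARCH_PATTERN.search(generated)
    if match is None:
        return generated, None  # no retrieval requested in this chunk
    # Keep only the text before the marker; anything after it is discarded.
    return generated[:match.start()], match.group(1).strip()
```

The returned query would then be sent to the retriever, and generation would resume from the kept prefix with the new documents prepended.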
  • 19. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Direct FLARE Relying on the retrieval instructions generated by FLAREinstruct might not be reliable, so they propose FLAREdirect. It relies on two techniques: 1 Confidence-based Active Retrieval 2 Confidence-based Query Formulation Masked sentences as implicit queries Generated questions as explicit queries 19 / 51
  • 20. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Confidence-based Active Retrieval First, it generates a temporary next sentence $\hat{s}_t = \mathrm{LM}([x, y_{<t}])$ without conditioning on retrieved documents. It then decides whether to trigger retrieval and formulates queries based on $\hat{s}_t$. If the LM is confident about $\hat{s}_t$, it accepts it. Otherwise, it uses $\hat{s}_t$ to formulate search queries $q_t$, retrieves relevant documents, and regenerates the next sentence $s_t = \mathrm{LM}([D_{q_t}, x, y_{<t}])$. Retrieval is actively triggered if any token of $\hat{s}_t$ has a probability lower than a threshold $\theta \in [0, 1]$: $\theta = 0$ means that retrieval is never triggered, while $\theta = 1$ triggers retrieval for every sentence. 20 / 51
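The confidence check and the accept-or-regenerate step can be sketched as follows. This is an illustrative sketch, not the authors' code: the `lm` callable (returning tokens plus per-token probabilities), `ret` and `make_query` are hypothetical stand-ins, and `y_prev` holds $y_{<t}$ as a plain string.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical LM interface: (documents, x, y_<t) -> (tokens, per-token probabilities).
SentenceLM = Callable[[List[str], str, str], Tuple[List[str], List[float]]]


def needs_retrieval(token_probs: Sequence[float], theta: float) -> bool:
    """Trigger retrieval if any token of the temporary sentence has prob < theta."""
    return any(p < theta for p in token_probs)


def flare_direct_step(x: str, y_prev: str, lm: SentenceLM,
                      ret: Callable[[str], List[str]],
                      make_query: Callable[[List[str], List[float]], str],
                      theta: float) -> List[str]:
    # Temporary next sentence s_hat_t = LM([x, y_<t]), generated without documents.
    tokens, probs = lm([], x, y_prev)
    if not needs_retrieval(probs, theta):
        return tokens  # the LM is confident: accept s_hat_t as s_t
    # Otherwise build q_t from s_hat_t, retrieve D_{q_t}, and regenerate s_t.
    docs = ret(make_query(tokens, probs))
    regenerated, _ = lm(docs, x, y_prev)
    return regenerated
```

With `theta=0.0` the first branch always returns the temporary sentence, matching the slide's statement that $\theta = 0$ never triggers retrieval.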
  • 21. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Confidence-based Query Formulation One way to perform retrieval is to directly use the next sentence $\hat{s}_t$ as the query $q_t$; this achieves significantly better results than using the previous context. However, it risks perpetuating errors contained in $\hat{s}_t$: if the sentence is wrong, using it as a query could prompt the retriever to return irrelevant information, which could mislead future generations. They therefore propose two methods to overcome this drawback: Masked sentences as implicit queries Generated questions as explicit queries 21 / 51
  • 22. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Implicit and explicit query formulation Figure 3: Tokens with low probabilities are marked with underlines. 22 / 51
  • 23. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Masked sentences as implicit queries Queries $q_t$ are formulated based on $\hat{s}_t$ as follows: $q_t = \begin{cases} \emptyset & \text{if all tokens of } \hat{s}_t \text{ have probs} \ge \theta \\ \mathrm{mask}(\hat{s}_t) \text{ or } \mathrm{qgen}(\hat{s}_t) & \text{otherwise} \end{cases}$ The first method masks out low-confidence tokens in $\hat{s}_t$ whose probabilities are below a threshold $\beta \in [0, 1]$; the higher $\beta$, the more aggressive the masking. 23 / 51
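Masked-sentence query formulation can be sketched as below. The slides do not say whether masked tokens are dropped or replaced by a placeholder, so dropping them is an assumption; the tokens and probabilities in the usage example are made up.

```python
from typing import List, Sequence


def mask_low_confidence(tokens: Sequence[str], probs: Sequence[float],
                        beta: float) -> str:
    """Implicit query: drop every token whose probability is below beta."""
    kept: List[str] = [tok for tok, p in zip(tokens, probs) if p >= beta]
    return " ".join(kept)


tokens = ["Joe", "Biden", "attended", "the", "University", "of", "Pennsylvania"]
probs = [0.99, 0.99, 0.9, 0.95, 0.45, 0.35, 0.2]
print(mask_low_confidence(tokens, probs, beta=0.5))   # Joe Biden attended the
print(mask_low_confidence(tokens, probs, beta=0.92))  # Joe Biden the
```

Raising $\beta$ from 0.5 to 0.92 also removes "attended", illustrating that a higher $\beta$ masks more aggressively; the possibly wrong span "University of Pennsylvania" never reaches the retriever.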
  • 24. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Generated questions as explicit queries Another method is to generate explicit questions that target the low-confidence spans in $\hat{s}_t$. Self-ask [7] achieved this by manually inserting follow-up questions into downstream task exemplars, as shown later in Prompt 4.1, which requires task-specific annotation effort. They therefore developed a universal approach that generates questions for low-confidence spans without additional annotation. 24 / 51
  • 25. FLARE Forward-Looking Active REtrieval Augmented Generation Direct FLARE Generated questions as explicit queries It first extracts all spans from $\hat{s}_t$ with probabilities below $\beta$. For each extracted span $z$, it prompts gpt-3.5-turbo to generate a question $q_{t,z}$ that can be answered with the span, using the following prompt: Prompt 3.2: zero-shot question generation User input x. Generated output so far y≤t. Given the above passage, ask a question to which the answer is the term/entity/phrase “z”. 25 / 51
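Span extraction and the zero-shot question-generation prompt can be sketched as below. Treating each maximal run of consecutive low-probability tokens as one span is an assumption, and both helper names are hypothetical; the prompt wording follows Prompt 3.2, with $x$ and $y_{\le t}$ passed in as plain strings.

```python
from typing import List, Sequence


def low_confidence_spans(tokens: Sequence[str], probs: Sequence[float],
                         beta: float) -> List[str]:
    """Group consecutive tokens with probability below beta into spans."""
    spans: List[str] = []
    current: List[str] = []
    for tok, p in zip(tokens, probs):
        if p < beta:
            current.append(tok)  # extend the current low-confidence run
        elif current:
            spans.append(" ".join(current))  # a confident token closes the run
            current = []
    if current:  # a run that reaches the end of the sentence
        spans.append(" ".join(current))
    return spans


def question_generation_prompt(x: str, y_so_far: str, span: str) -> str:
    """Zero-shot prompt asking for a question whose answer is `span`."""
    return (f"{x}\n{y_so_far}\n"
            "Given the above passage, ask a question to which the answer is "
            f'the term/entity/phrase "{span}".')
```

One prompt is built per span, and each generated question $q_{t,z}$ would then be issued as a separate retrieval query.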
  • 26. FLARE Forward-Looking Active REtrieval Augmented Generation Implementation Details Implementation Details The initial query: the first sentence is generated as $\hat{s}_1 = \mathrm{LM}([D_x, x])$, i.e., conditioned on the documents retrieved with the user input $x$ as the query. Sentence tokenization: at each step $t$, it generates 64 tokens, which is longer than most sentences, and uses the NLTK sentence tokenizer¹ to extract the first sentence and discard the rest. Document corpus and retrievers: Wikipedia dump (document corpus) [3]; BM25 (retriever) [9]. ¹ nltk.tokenize.PunktSentenceTokenizer 26 / 51
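BM25 scoring can be sketched in pure Python to show what the retriever computes. This is a textbook Okapi BM25 with a Lucene-style IDF and lowercase whitespace tokenization, not the retriever implementation used in the paper; `k1 = 0.9` and `b = 0.4` are assumed values that the slides do not state, and the documents in the usage note are made up.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 0.9,
                b: float = 0.4) -> List[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs  # average document length
    df = Counter()  # number of documents containing each term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            norm = f + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


def bm25_retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the top-k documents, a toy stand-in for D_q = ret(q)."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]
```

For a query such as "Biden presidential campaign", a document mentioning all three terms outranks one that only mentions "Biden", and a document sharing no terms scores exactly zero.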
  • 27. FLARE Forward-Looking Active REtrieval Augmented Generation Implementation Details Implementation Details Retrieved document formatting: Prompt 3.3: document formatting Search results: [1] Document 1 [2] Document 2 . . . The user input x Efficiency: on average, retrieval is triggered for 30% to 60% of sentences, depending on the downstream task. Compared to single-time retrieval, interleaving retrieval and generation with a naive implementation does increase overhead. 27 / 51
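The document formatting of Prompt 3.3 can be sketched as a small helper. The exact line breaks are assumptions; the slides only show the "Search results:" header, the numbered `[i] Document i` entries, and the user input $x$ at the end.

```python
from typing import List


def format_prompt(docs: List[str], x: str) -> str:
    """Number retrieved documents as [1], [2], ... and append the user input x."""
    lines = ["Search results:"]
    lines += [f"[{i}] {doc}" for i, doc in enumerate(docs, start=1)]
    lines.append(x)  # documents are prepended before the user input
    return "\n".join(lines)
```

The returned string plays the role of $[D_q, x]$ in $y = \mathrm{LM}([D_q, x])$.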
  • 28. FLARE Multi-time Retrieval Baselines Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 28 / 51
  • 29. FLARE Multi-time Retrieval Baselines Multi-time Retrieval Baselines They formally introduce three baseline categories based on when and what to retrieve. 1 Previous-window 2 Previous-sentence 3 Question decomposition 29 / 51
  • 30. FLARE Multi-time Retrieval Baselines Previous-window Previous-window approaches trigger retrieval every $l$ tokens, where $l$ is the window size. Tokens generated in the previous window are used as the query: $q_t = y_{t-1}$ ($t \ge 2$), where $y_t = [w_{(t-1)l+1}, \ldots, w_{tl}]$. Existing methods in this category include RETRO [1], IC-RALM [8], and KNN-LM [4]. 30 / 51
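The previous-window schedule can be illustrated offline by pairing each window $y_t$ with the query $q_t$ used to generate it. Using the user input $x$ as $q_1$ is an assumption (the slides only define $q_t$ for $t \ge 2$), and the function name is hypothetical.

```python
from typing import List, Tuple


def previous_window_schedule(x: str, tokens: List[str],
                             l: int) -> List[Tuple[str, str]]:
    """Return (q_t, y_t) pairs for non-overlapping windows of l tokens."""
    if l < 1:
        raise ValueError("window size l must be >= 1")
    # y_t = [w_{(t-1)l+1}, ..., w_{tl}]
    windows = [" ".join(tokens[i:i + l]) for i in range(0, len(tokens), l)]
    # q_1 = x (assumed); q_t = y_{t-1} for t >= 2.
    queries = [x] + windows[:-1]
    return list(zip(queries, windows))
```

Each query is built purely from text already generated, which is the backward-looking behavior that FLARE's forward-looking queries are designed to avoid.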
  • 31. FLARE Multi-time Retrieval Baselines Previous-sentence / Question decomposition Previous-sentence approaches trigger retrieval every sentence and use the previous sentence as the query: $q_t = y_{t-1}$ ($t \ge 2$), where $y_t = s_t$. Question decomposition approaches use manually annotated, task-specific exemplars to guide LMs to generate decomposed sub-questions while producing outputs. 31 / 51
  • 32. FLARE Multi-time Retrieval Baselines Prompt 4.1: multihop QA with self-ask Question: Who lived longer, Theodor Haecker or Harry Vaughan Watkins? Are follow up questions needed here: Yes. Follow up: How old was Theodor Haecker when he died? Intermediate answer: Theodor Haecker was 65 years old when he died. Follow up: How old was Harry Vaughan Watkins when he died? Intermediate answer: Harry Vaughan Watkins was 69 years old when he died. So the final answer is: Harry Vaughan Watkins. 32 / 51
  • 33. FLARE Multi-time Retrieval Baselines Notable drawbacks 1 Fixed-interval approaches use previously generated tokens as queries, which might not reflect what the LM intends to generate next 2 Retrieving information at a fixed interval can be inefficient because retrieval might occur at inappropriate points 3 Question decomposition approaches require task-specific prompt engineering, which restricts their generalizability to new tasks 33 / 51
  • 34. FLARE Experimental Setup / Results Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 34 / 51
  • 35. FLARE Experimental Setup / Results Experimental Setup They evaluate the effectiveness of FLARE on 4 diverse knowledge-intensive tasks using few-shot in-context learning, as summarized in Table 1. They compare FLARE with the baselines under the same setting and sub-sample at most 500 examples from each dataset due to the cost of running experiments. The hyperparameters of FLARE are selected on the development set and listed in Table 2. 35 / 51
  • 36. FLARE Experimental Setup / Results Experimental Setup Tasks (and datasets) used in the experiments: 1 Multihop QA (2WikiMultihopQA) 2 Commonsense Reasoning (StrategyQA) 3 Long-form QA (ASQA) 4 Open-domain Summarization (WikiAsp) 36 / 51
  • 37. FLARE Experimental Setup / Results Experimental Results Figure 4: Comparison between FLARE and baselines across all tasks/datasets. 37 / 51
  • 38. FLARE Experimental Setup / Results Ablation Study Importance of forward-looking retrieval. They first validate their hypothesis that forward-looking retrieval is indeed more powerful than past-context-based retrieval. Figure 5: A head-to-head comparison between using the previous sentence and the next sentence for retrieval. 38 / 51
  • 39. FLARE Experimental Setup / Results Ablation Study Importance of active retrieval. Next, they investigate the relationship between performance and the active retrieval threshold 𝜃. Figure 6: Performance (EM) of FLARE with respect to the percentage of steps/sentences with retrieval on 2WikiMultihopQA and StrategyQA. 39 / 51
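A minimal sketch of how θ controls retrieval, assuming (as in FLARE's direct variant) that retrieval is triggered whenever any token of the tentative next sentence has probability below θ. The probabilities are invented toy values, not results from the paper; the point is that raising θ can only increase the fraction of sentences with retrieval, the quantity that Figure 6 plots performance against.

```python
def needs_retrieval(token_probs: list[float], theta: float) -> bool:
    """Assumed trigger rule: retrieve if any token of the tentative sentence has probability < theta."""
    return any(p < theta for p in token_probs)


def retrieval_rate(sentence_token_probs: list[list[float]], theta: float) -> float:
    """Fraction of sentences (steps) for which retrieval would be triggered at threshold theta."""
    if not sentence_token_probs:
        return 0.0
    triggered = sum(needs_retrieval(probs, theta) for probs in sentence_token_probs)
    return triggered / len(sentence_token_probs)


if __name__ == "__main__":
    # Toy per-token probabilities for three tentative sentences (invented values).
    toy = [[0.9, 0.95], [0.5, 0.99], [0.85, 0.7]]
    for theta in (0.4, 0.6, 0.8):
        print(f"theta={theta}: retrieval for {retrieval_rate(toy, theta):.0%} of sentences")
```

On this toy input the rate rises from 0% to 33% to 67% as θ goes from 0.4 to 0.8, mirroring the θ values 0.4 and 0.8 used in Table 2.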
  • 40. FLARE Experimental Setup / Results Ablation Study Effectiveness of different query formulation methods. Last, they study implicit query formulation by masking and explicit query formulation through question generation. Figure 7: Performance of FLARE with respect to the masking threshold 𝛽 on 2WikiMultihopQA. 40 / 51
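For the implicit (masking) variant, here is a minimal sketch, assuming the query is simply the tentative sentence with every token whose probability falls below β removed. Tokens are treated as whitespace-separated words for readability, whereas a real LM works on subword tokens; the explicit, question-generation variant is not shown, and `masked_query` is a hypothetical name.

```python
def masked_query(tokens: list[str], probs: list[float], beta: float) -> str:
    """Implicit query (assumed form): keep only tokens whose probability is at least beta."""
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    return " ".join(tok for tok, p in zip(tokens, probs) if p >= beta)


if __name__ == "__main__":
    # Invented example: the low-confidence (possibly hallucinated) entity is masked out.
    tokens = ["The", "capital", "of", "Freedonia", "is", "Zorvia"]
    probs = [0.95, 0.9, 0.97, 0.85, 0.92, 0.2]
    print(masked_query(tokens, probs, beta=0.4))
```

A larger β masks more aggressively; masking keeps a possibly wrong guess from steering the retriever toward irrelevant documents.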
  • 41. FLARE Conclusion / Limitation Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 41 / 51
  • 42. FLARE Conclusion / Limitation Conclusion This paper implements a framework for forward-looking active retrieval that iteratively uses a prediction of the upcoming sentence to retrieve relevant information when that sentence contains low-confidence tokens, and then regenerates the sentence. 42 / 51
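Putting the pieces together, the loop described in the conclusion might look like the following Python sketch. It is a simplification under stated assumptions: `generate` and `retrieve` are hypothetical callables standing in for an LM and a search engine, the tentative sentence is used directly as the query (a masked query could be substituted), and there is no caching, so every call re-reads the full prompt, which is the overhead noted in the limitations.

```python
from typing import Callable

# Hypothetical interfaces, not the paper's code:
#   generate(prompt) -> (next sentence, per-token probabilities); "" means "done"
#   retrieve(query)  -> list of document strings
GenerateFn = Callable[[str], tuple[str, list[float]]]
RetrieveFn = Callable[[str], list[str]]


def flare_generate(x: str, generate: GenerateFn, retrieve: RetrieveFn,
                   theta: float, max_sentences: int = 10) -> str:
    """Sketch of forward-looking active retrieval, one sentence at a time."""
    sentences: list[str] = []
    for _ in range(max_sentences):
        context = " ".join([x] + sentences)
        # 1. Tentatively predict the upcoming sentence without retrieval.
        sentence, probs = generate(context)
        if not sentence:
            break
        # 2. Low-confidence tokens: use the tentative sentence as the query,
        #    then regenerate the sentence with the retrieved documents prepended.
        if any(p < theta for p in probs):
            docs = retrieve(sentence)
            results = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs, start=1))
            sentence, _ = generate(f"Search results:\n{results}\n{context}")
        sentences.append(sentence)
    return " ".join(sentences)
```

With a scripted `generate` stub that first returns a confident sentence and then a low-confidence one, only the second sentence triggers `retrieve`, and it is regenerated from the `Search results:` prompt.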
  • 43. FLARE Conclusion / Limitation Limitation FLARE did not provide significant gains on the Wizard of Wikipedia dataset. Since its outputs are relatively short, retrieving multiple disparate pieces of information might not be necessary. From an engineering perspective, the LM needs to be run multiple times (once for each retrieval), and a caching-free implementation also requires recomputing the previous activations after each retrieval. One possible remedy is an architecture that encodes the retrieved documents $\mathcal{D}_{q_t}$ and the input/generation ($x$/$y_t$) independently. 43 / 51
  • 44. FLARE Reflection Table of contents 1 Introduction 2 Retrieval-Augmented Generation 3 Forward-Looking Active REtrieval Augmented Generation 4 Multi-time Retrieval Baselines 5 Experimental Setup / Results 6 Conclusion / Limitation 7 Reflection 44 / 51
  • 45. FLARE Reflection Reflection Retrieval-augmented generation (RAG) can help specialize an LLM to a particular domain and use case, and this work provides a more trustworthy way to use retrieval. If we want the language model to generate more thoughtful (more emotionally aware) responses, it may be more interesting to fine-tune a model on a psychology chat dataset [2]. Finding a suitable dataset or prompt may therefore be a good focus, and parameter-efficient methods may help a lot for dialogue systems. 45 / 51
  • 46. FLARE Reflection References I [1] Sebastian Borgeaud et al. “Improving Language Models by Retrieving from Trillions of Tokens”. In: Proceedings of the 39th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri et al. Vol. 162. Proceedings of Machine Learning Research. PMLR, July 2022, pp. 2206–2240. url: https: // [2] Minhajul Hoque. Making Chat-bot more emotional. 2023. url: and-answers/418148#2309888. 46 / 51
  • 47. FLARE Reflection References II [3] Vladimir Karpukhin et al. “Dense Passage Retrieval for Open-Domain Question Answering”. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, Nov. 2020, pp. 6769–6781. doi: 10.18653/v1/2020.emnlp-main.550. url: [4] Urvashi Khandelwal et al. Generalization through Memorization: Nearest Neighbor Language Models. 2020. arXiv: 1911.00172 [cs.CL]. 47 / 51
  • 48. FLARE Reflection References III [5] Patrick Lewis et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. In: Advances in Neural Information Processing Systems. Ed. by H. Larochelle et al. Vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474. url: https: // file/6b493230205f780e1bc26945df7481e5-Paper.pdf. [6] Long Ouyang et al. Training language models to follow instructions with human feedback. 2022. arXiv: 2203.02155 [cs.CL]. [7] Ofir Press et al. Measuring and Narrowing the Compositionality Gap in Language Models. 2023. arXiv: 2210.03350 [cs.CL]. 48 / 51
  • 49. FLARE Reflection References IV [8] Ori Ram et al. In-Context Retrieval-Augmented Language Models. 2023. arXiv: 2302.00083 [cs.CL]. [9] Stephen Robertson and Hugo Zaragoza. “The Probabilistic Relevance Framework: BM25 and Beyond”. In: Foundations and Trends® in Information Retrieval 3.4 (2009), pp. 333–389. issn: 1554-0669. doi: 10.1561/1500000019. url: [10] Timo Schick et al. Toolformer: Language Models Can Teach Themselves to Use Tools. 2023. arXiv: 2302.04761 [cs.CL]. [11] Harsh Trivedi et al. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. 2022. arXiv: 2212.10509 [cs.CL]. 49 / 51
  • 50. FLARE Reflection A1: Experimental settings
| Settings | 2WikiMultihopQA | StrategyQA | ASQA | WikiAsp |
|---|---|---|---|---|
| Dataset statistics: Task | multihop QA | commonsense QA | long-form QA | open-domain summarization |
| Dataset statistics: #Examples | 500 | 229 | 500 | 500 |
| Evaluation settings: Metrics | EM, F1, Prec., Rec. | EM | EM, Disambig-F1, ROL | UniEval, entity-F1, ROUGE |
| Retrieval settings: Corpus | Wikipedia | Wikipedia | Wikipedia | open web |
| Retrieval settings: Retriever | BM25 | BM25 | BM25 | Bing |
| Retrieval settings: Top-k | 2 | 3 | 3 | 5 |
| Prompt format: #Exemplars | 8 | 6 | 8 | 4 |
| Prompt format: Ret. for exemplars | ✓ | ✗ | ✗ | ✗ |
Table 1: Statistics and experimental settings of different tasks/datasets 50 / 51
  • 51. FLARE Reflection A1: Experimental settings
| Dataset | 𝜃 | 𝛽 | Query formulation | Combine single- and multi-time retrieval |
|---|---|---|---|---|
| 2WikiMultihopQA | 0.8 | 0.4 | implicit | ✗ |
| StrategyQA | 0.4 | 0.4 | implicit | ✗ |
| ASQA & ASQA-hint | 0.8 | 0.4 | explicit | ✓ |
| WikiAsp | 0.8 | 0.4 | explicit | ✓ |
Table 2: Hyperparameters of FLARE on different tasks/datasets 51 / 51