2. Overview
Agenda
• Why RAG exists: the “parametric memory” problem
• Core architecture: ingest → retrieve → generate
• What makes retrieval work (chunking, indexing, reranking)
• Evaluation: retrieval + generation metrics
• Security & production concerns
• When to choose RAG vs fine-tuning
3. Motivation
The problem: LLMs don’t “know” your latest truth
• Freshness: models can’t “update” instantly
• Provenance: hard to cite sources
• Accuracy: hallucinations happen
RAG is a pattern that adds external context at runtime so the model can:
• Answer using your docs / databases (not just training data)
• Provide traceable citations or snippets
• Stay current as documents change
4. Concept
What is RAG?
Definition
Retrieval-Augmented Generation injects external context into the prompt at runtime to improve responses.
Mental model
Answer = LLM(question + retrieved context)
You’re giving the model “open book” access to a curated library.
Typical outcomes
• Higher factuality on knowledge-intensive questions
• Faster updates: change docs → change answers
• Better auditability with quoted snippets/citations
5. Architecture
Core architecture: retrieve → ground → generate
User question → Retriever (search) → Knowledge store → Retrieved context → LLM (generate)
Key idea: keep retrieval and generation as separate components you can optimize independently.
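A minimal sketch of that separation, with a toy keyword retriever and a stand-in for the LLM call (names and data are illustrative, not any particular framework’s API):

# Minimal RAG skeleton: retrieval and generation are separate steps you can swap or tune independently.
DOCS = [
    "Our refund window is 30 days from purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    # Toy retriever: rank docs by word overlap with the question.
    q_words = set(question.lower().split())
    return sorted(DOCS, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

def generate(question: str, context: list[str]) -> str:
    # Stand-in for the LLM call: in practice, send this prompt to your model.
    prompt = "Context:\n" + "\n".join(f"- {c}" for c in context) + f"\nQuestion: {question}"
    return prompt  # a real client would return the model's grounded answer

print(generate("When can I get a refund?", retrieve("When can I get a refund?")))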
7. Ingestion
Ingestion: turn documents into searchable chunks
Steps
• Load: PDFs, HTML, tickets, wikis, databases
• Clean: remove boilerplate, fix encoding, normalize tables
• Split: chunk text with overlap (see the sketch after these lists)
• Embed: create vectors + store metadata
• Index: build vector (and optionally keyword) index
Metadata is your superpower
• source URL/path
• document type
• owner/team
• timestamp/version
• ACLs / permissions
Common knobs
• chunk size & overlap
• top‑k retrieved
• hybrid search weights
• reranker on/off
• context window budget
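A minimal sketch of the “Split” step with character-based sizes and overlap, plus per-chunk metadata (field names are illustrative):

# Character-based chunking with overlap; chunk_size and overlap are the knobs listed above.
def chunk(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Keep metadata on every chunk so filters and citations work later.
def to_records(text: str, source: str, doc_type: str, version: str) -> list[dict]:
    return [
        {"chunk_id": i, "text": c, "source": source, "type": doc_type, "version": version}
        for i, c in enumerate(chunk(text))
    ]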
8. Retrieval
Retrieval strategies (what to try first)
Dense (vector) search
• Best for semantic matches
• Uses embeddings + similarity
• Good default for Q&A
Sparse (keyword) search
• Best for exact terms / IDs
• BM25/TF‑IDF style
• Great for “error codes”
Hybrid + reranking
• Combine dense + sparse
• Rerank top‑N with stronger model
• Often best quality
Rule of thumb: start dense → add metadata filters → add hybrid → add reranking only if needed.
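One simple way to combine dense and sparse results is reciprocal rank fusion (RRF); this sketch assumes you already have two ranked lists of chunk IDs:

# Reciprocal rank fusion: merge two rankings into one; k=60 is a common smoothing constant.
def rrf(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf(["c3", "c1", "c7"], ["c1", "c9", "c3"]))  # chunks found by both searches rank first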
9. Retrieval
Vector indexes: fast similarity search at scale
What a vector index does
• Stores embeddings (vectors) + metadata
• Runs approximate nearest-neighbor (ANN) search
• Returns top‑k chunks for a query embedding
• Often supports filters (type, owner, date, ACL)
Common index families
• HNSW (graph)
• IVF / PQ (quantization)
• Flat (exact, small corpora)
One popular OSS option
FAISS: efficient similarity search and clustering of dense vectors (CPU/GPU).
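A small FAISS sketch (dimension and vectors are placeholders; real embeddings come from your embedding model):

import numpy as np
import faiss  # pip install faiss-cpu

d = 384                                                # embedding dimension (model-dependent)
vectors = np.random.rand(10_000, d).astype("float32")  # placeholder chunk embeddings

index = faiss.IndexHNSWFlat(d, 32)                     # HNSW graph index; 32 neighbors per node
index.add(vectors)                                     # index the chunk embeddings

query = np.random.rand(1, d).astype("float32")         # placeholder query embedding
distances, ids = index.search(query, 5)                # approximate top-5 nearest chunks
print(ids[0])                                          # positions to look up chunk text + metadata

FAISS itself stores vectors and integer IDs; metadata filters typically live in the surrounding store or database layer.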
10. Generation
Context assembly: make it easy to be correct
Prompt patterns that work
• Put retrieved chunks in a consistent “Context:” section
• Ask for grounded answers + citations
• If context is insufficient, say so (don't guess)
• Use short chunk IDs for quoting
Example (structure)
SYSTEM: You answer using ONLY the provided context.
CONTEXT:
[1] …chunk text… (source, date)
[2] …chunk text… (source, date)
USER: <question>
ASSISTANT:
- Answer in 3–6 bullets
- Cite like [1][2]
- If missing info, say what's missing
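A minimal assembly function following the structure above (the chunk dict fields are illustrative):

# Build the prompt from retrieved chunks, keeping short numeric IDs for citations.
def build_prompt(question: str, chunks: list[dict]) -> str:
    context = "\n".join(
        f"[{i}] {c['text']} ({c['source']}, {c['date']})"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "SYSTEM: You answer using ONLY the provided context. "
        "If the context is insufficient, say what is missing.\n"
        f"CONTEXT:\n{context}\n"
        f"USER: {question}\n"
        "ASSISTANT: Answer in 3-6 bullets and cite like [1][2]."
    )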
11. Evaluation
Evaluation: measure retrieval AND generation
Retrieval quality
• Recall@k (did we retrieve the needed chunk?)
• MRR (how early is the first correct chunk?)
• nDCG (ranking quality with graded relevance)
Generation quality
• Faithfulness (supported by context)
• Answer relevance (addresses the question)
• Citation precision/recall (if you require citations)
Ragas (RAGAS) framework
A reference-free evaluation approach designed for RAG pipelines (retrieval + generation metrics).
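Recall@k and reciprocal rank are straightforward to compute yourself once you have labeled (question, relevant chunk IDs) pairs; a minimal sketch:

# retrieved: chunk IDs returned by the retriever, in rank order.
# relevant: the chunk IDs that actually contain the answer.
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    # 1/rank of the first relevant chunk; average this over an eval set to get MRR.
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0

print(recall_at_k(["c2", "c5", "c1"], {"c1", "c9"}, k=3))  # 0.5
print(reciprocal_rank(["c2", "c5", "c1"], {"c1", "c9"}))   # 0.333...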
12. Security
Security: RAG helps, but doesn’t “solve” prompt injection
Common RAG-specific risks
• Prompt injection inside retrieved docs
• Data exfiltration via overly-broad retrieval
• Cross-tenant leakage without strong ACL filtering
• Malicious documents poisoning the index
Mitigations to layer
• Treat retrieved text as untrusted input
• Strict system instructions + output constraints
• Document allowlists + signed ingestion
• Permission filters at query time (ACL-aware retrieval; see the sketch after this list)
• Red-team tests for injection patterns
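Permission filtering at query time can be as simple as attaching ACL metadata at ingestion and filtering candidates before they reach the prompt; the field names here are hypothetical, and in production the filter should be pushed into the vector store query itself:

# ACL-aware retrieval sketch: only chunks the caller may read survive.
def acl_filter(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    return [c for c in chunks if user_groups & set(c["allowed_groups"])]

candidates = [
    {"text": "Public FAQ entry", "allowed_groups": ["everyone"]},
    {"text": "Internal salary bands", "allowed_groups": ["hr"]},
]
print(acl_filter(candidates, user_groups={"everyone", "engineering"}))
# Only the public chunk remains; the HR-only chunk never reaches the prompt.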
13. Production
Production: quality vs latency is a knob you can tune
Latency drivers
• Vector search (ANN) + filters
• Reranking (can dominate latency)
• Bigger context → longer model inference
• Cold caches (embeddings, docs, index shards)
Illustrative example: [latency chart omitted; plotted values were synthetic]
Tip: measure end-to-end. Often the cheapest win is better chunking + reranking only top‑N.
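To see where end-to-end time goes, time each stage; the sleeps below are synthetic stand-ins for your own search, rerank, and LLM calls:

import time

def timed(label: str, fn, timings: dict, *args, **kwargs):
    # Run fn and record its wall-clock time in milliseconds under `label`.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[label] = round((time.perf_counter() - start) * 1000, 1)
    return result

def retrieve(q): time.sleep(0.02); return ["chunk"] * 20   # toy ANN search
def rerank(hits): time.sleep(0.15); return hits[:5]        # toy cross-encoder rerank
def generate(ctx): time.sleep(0.60); return "answer"       # toy LLM call

timings: dict[str, float] = {}
hits = timed("retrieve", retrieve, timings, "query")
top = timed("rerank", rerank, timings, hits)
answer = timed("generate", generate, timings, top)
print(timings)  # reranking and generation usually dominate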
14. Decision
RAG vs fine-tuning: choose based on what you’re changing
When RAG is a great fit
• You need fresh / frequently-updated knowledge
• You want citations + traceability
• You can tolerate some retrieval latency
When fine-tuning helps more
• You're changing style/format behaviors
• You want lower latency (no retrieval)
• Your task is stable and well-labeled
Illustrative comparison (higher is better): RAG vs fine-tune scored 0–5 on Freshness, Citations, Latency, and Cost to update [chart omitted; values are synthetic]
15. Next steps
RAG implementation checklist
Build
• Pick a corpus + owners
• Chunk + embed + index
• Add metadata + ACLs
• Add hybrid + rerank only if needed
Operate
• Offline eval set + dashboards
• Monitor retrieval drift
• Cache hot queries + embeddings
• Security reviews + red-team
Good RAG is an engineering discipline: data quality + retrieval quality + evaluation loops.
#9
[Sources]
- https://github.com/facebookresearch/faiss (Faiss project description)
- https://docs.langchain.com/oss/python/langchain/retrieval (vector store concept in a minimal RAG workflow)
[/Sources]
#10
[Sources]
- https://help.openai.com/en/articles/8868588-retrieval-augmented-generation-rag-and-semantic-search-for-gpts (RAG described as injecting context into prompts)
[/Sources]
#12
[Sources]
- https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-v2025.pdf (notes that RAG/fine-tuning do not fully mitigate prompt injection)
[/Sources]
#13
[Sources]
- (Chart values are illustrative / synthetic.)
[/Sources]
#14
[Sources]
- https://help.openai.com/en/articles/8868588-retrieval-augmented-generation-rag-and-semantic-search-for-gpts (RAG as runtime context injection; motivates “freshness”)
- (Chart values are illustrative / synthetic.)
[/Sources]