SlideShare a Scribd company logo
Gen AI meetup
Technology
You said Large Language Model ?
• Generative deep learning models for
understanding and generating text, images
and other types
• A special kind : Transformers
• “Attention is All you Need”, Vaswani et al.
2017 (https://arxiv.org/abs/1706.03762)
• Transformers analyse chunks of data, called
“tokens” and learn to predict the next token
in a sequence
• Prediction is a probability
• Model that can generalize : one single model
to address several use cases
Focus on Language Models
Build the model - Training
What it’s like ?
• Foundational models
• Datasets
LLM are trained using techniques that requires huge text-based datasets, e.g.
“The Pile” : +880 Gb (Wikipedia, Youtube st, Github, …)
“RedPajama”: +5Tb (wikipedia, StackExchange, ArXiv, …)
Choosing and curating datasets for training is the secret sauce !
• Computing Power
Transformer-based model have limitations: quadratic-complexity of attention mechanism
Computationally intensive for long sequences
Common patterns
• Context
The size of input data given to the model :
size is limited !
• Prompt
The question / the task, enriched with ‘pre-
prompt’
• Zero-shot / Few-shot, …
To give or not samples of answers expected
• Temperature
How much the model is imaginative
Use the model - Inference
Which Model ?
Criteria to take in account for a use case
• Open Source vs Commercial
• Best of breed
• Versioning & lifecycle
• Cost e
ffi
ciency vs Overkill -> Size
• Accuracy
At the heart of the machine
• On Premises
• Compute: GPUs choice / VRAM size / Model
quantization
• NVIDIA T4 = 16Gb / 1100$
• NVIDIA A100 = 80Gb / 8000$
• Scalability : concurrent users, context size
• Online vs batch
• On Cloud
• Which one ? Cost, diversity and availability
• Pricing model: 1M token comes very fast ! 1 word ~ 4
tokens
• Sovereignty, data privacy
Infrastructure
Real-world usage
Aka your search engine 2.0
Very common use case =
“Retrival Augmented Generation”
RAG - 101
Search & Summarize In 4 Steps
Step 1 - Document loading
• Documents are loaded from data
connectors
• They are split into chunks
RAG
Step 2 - Embeddings
• Chunks are 'transformed' into
vectors (numbers)
✓It's the process of word
embedding, using a pre-trained
model
✓hundreds (even thousands !) of
dimensions are required to
represent the space of all words
• Vectors are stored in a dedicated
database (a vector database)
RAG
Step 3 - Retrieval
• Previous steps were preparatory
work, now comes the live part
• Question is vectorized as well,
used as an input for similarity
search
• Most relevant chunks are
retrieved, i.e. vectors coordinates
are close together
RAG
Step 4 - Generation
•Retrieved chunks are used to feed
the LLM prompt context
•Question is added to the prompt
•LLM reads the prompt and
generates a natural language
answer
•During this inference time,
the model requires a lot of GPU
power !
RAG
RAG engineering
Lots of moving part to reach performance !
Flow / Batch
Data Policy
Deduplication
Data cleanage
Attachments (images, pdf)
PII / Anonymization
Data policy / criticity
Chunking strategy
Embedding Model
Size
Language
Tokenizer
Vector DB Choice
Cloud / Local
Vectors dimensions
& reduction
Retrieval con
fi
g
(top_k, similarity)
Re-ranking
MMR score
RAG techniques
(Corrective, Self-re
fl
ective
Rag-Fusion, HyDE)
Chat memory
Model con
fi
g
(temperature, top_k, top_p)
Model Evaluation / derivation
(BLUE/RED, precision,
recall, F1 score, Ragas, truelens,
Human Feedback)
Prompt eng.
Guard rails
(Hallucinations, NSFW, …)
model compare / VertexSxS
Performance (TTFT, TPS, …)
PII / Anon (again)
UI-Integration
LLMOPS / MLOPS
Cost Ef
fi
ciency
Fine Tuning ?
OpenAI’s strategy
Demo time !

More Related Content

Similar to AI presentation and introduction - Retrieval Augmented Generation RAG 101

openai.pptx
openai.pptxopenai.pptx
openai.pptx
Dori Waldman
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
Minha Hwang
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Lucidworks
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Cork AI Meetup Number 3
Cork AI Meetup Number 3Cork AI Meetup Number 3
Cork AI Meetup Number 3
Nick Grattan
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
Zubair Nabi
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
Zachary S. Brown
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
semanticsconference
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
Simon Hughes
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Lucidworks
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
Andrzej Zydroń MBCS
 
The tipping point
The tipping pointThe tipping point
The tipping point
Andrzej Zydroń MBCS
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
Domingo Suarez Torres
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
Dios Kurniawan
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
mestato
 
Decoder Ring
Decoder RingDecoder Ring
Decoder Ring
Jeff Beeman
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
ssuserf583ac
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
RohanBorgalli
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
SreeVani74
 

Similar to AI presentation and introduction - Retrieval Augmented Generation RAG 101 (20)

openai.pptx
openai.pptxopenai.pptx
openai.pptx
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Cork AI Meetup Number 3
Cork AI Meetup Number 3Cork AI Meetup Number 3
Cork AI Meetup Number 3
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
 
The tipping point
The tipping pointThe tipping point
The tipping point
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Decoder Ring
Decoder RingDecoder Ring
Decoder Ring
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 

Recently uploaded

Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
Mydbops
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 

Recently uploaded (20)

Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMySQL InnoDB Storage Engine: Deep Dive - Mydbops
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 

AI presentation and introduction - Retrieval Augmented Generation RAG 101

  • 3. You said Large Language Model ? • Generative deep learning models for understanding and generating text, images and other types • A special kind : Transformers • “Attention is All you Need”, Vaswani et al. 2017 (https://arxiv.org/abs/1706.03762) • Transformers analyse chunks of data, called “tokens” and learn to predict the next token in a sequence • Prediction is a probability • Model that can generalize : one single model to address several use cases Focus on Language Models
  • 4.
  • 5. Build the model - Training What it’s like ? • Foundational models • Datasets LLM are trained using techniques that requires huge text-based datasets, e.g. “The Pile” : +880 Gb (Wikipedia, Youtube st, Github, …) “RedPajama”: +5Tb (wikipedia, StackExchange, ArXiv, …) Choosing and curating datasets for training is the secret sauce ! • Computing Power Transformer-based model have limitations: quadratic-complexity of attention mechanism Computationally intensive for long sequences
  • 6. Common patterns • Context The size of input data given to the model : size is limited ! • Prompt The question / the task, enriched with ‘pre- prompt’ • Zero-shot / Few-shot, … To give or not samples of answers expected • Temperature How much the model is imaginative Use the model - Inference
  • 7. Which Model ? Criteria to take in account for a use case • Open Source vs Commercial • Best of breed • Versioning & lifecycle • Cost e ffi ciency vs Overkill -> Size • Accuracy
  • 8. At the heart of the machine • On Premises • Compute: GPUs choice / VRAM size / Model quantization • NVIDIA T4 = 16Gb / 1100$ • NVIDIA A100 = 80Gb / 8000$ • Scalability : concurrent users, context size • Online vs batch • On Cloud • Which one ? Cost, diversity and availability • Pricing model: 1M token comes very fast ! 1 word ~ 4 tokens • Sovereignty, data privacy Infrastructure
  • 10. Aka your search engine 2.0 Very common use case = “Retrival Augmented Generation”
  • 11. RAG - 101 Search & Summarize In 4 Steps
  • 12. Step 1 - Document loading • Documents are loaded from data connectors • They are split into chunks RAG
  • 13. Step 2 - Embeddings • Chunks are 'transformed' into vectors (numbers) ✓It's the process of word embedding, using a pre-trained model ✓hundreds (even thousands !) of dimensions are required to represent the space of all words • Vectors are stored in a dedicated database (a vector database) RAG
  • 14. Step 3 - Retrieval • Previous steps were preparatory work, now comes the live part • Question is vectorized as well, used as an input for similarity search • Most relevant chunks are retrieved, i.e. vectors coordinates are close together RAG
  • 15. Step 4 - Generation •Retrieved chunks are used to feed the LLM prompt context •Question is added to the prompt •LLM reads the prompt and generates a natural language answer •During this inference time, the model requires a lot of GPU power ! RAG
  • 16. RAG engineering Lots of moving part to reach performance ! Flow / Batch Data Policy Deduplication Data cleanage Attachments (images, pdf) PII / Anonymization Data policy / criticity Chunking strategy Embedding Model Size Language Tokenizer Vector DB Choice Cloud / Local Vectors dimensions & reduction Retrieval con fi g (top_k, similarity) Re-ranking MMR score RAG techniques (Corrective, Self-re fl ective Rag-Fusion, HyDE) Chat memory Model con fi g (temperature, top_k, top_p) Model Evaluation / derivation (BLUE/RED, precision, recall, F1 score, Ragas, truelens, Human Feedback) Prompt eng. Guard rails (Hallucinations, NSFW, …) model compare / VertexSxS Performance (TTFT, TPS, …) PII / Anon (again) UI-Integration LLMOPS / MLOPS Cost Ef fi ciency