SlideShare a Scribd company logo
Gen AI meetup
Technology
You said Large Language Model ?
• Generative deep learning models for
understanding and generating text, images
and other types
• A special kind : Transformers
• “Attention is All you Need”, Vaswani et al.
2017 (https://arxiv.org/abs/1706.03762)
• Transformers analyse chunks of data, called
“tokens” and learn to predict the next token
in a sequence
• Prediction is a probability
• Model that can generalize : one single model
to address several use cases
Focus on Language Models
Build the model - Training
What it’s like ?
• Foundational models
• Datasets
LLM are trained using techniques that requires huge text-based datasets, e.g.
“The Pile” : +880 Gb (Wikipedia, Youtube st, Github, …)
“RedPajama”: +5Tb (wikipedia, StackExchange, ArXiv, …)
Choosing and curating datasets for training is the secret sauce !
• Computing Power
Transformer-based model have limitations: quadratic-complexity of attention mechanism
Computationally intensive for long sequences
Common patterns
• Context
The size of input data given to the model :
size is limited !
• Prompt
The question / the task, enriched with ‘pre-
prompt’
• Zero-shot / Few-shot, …
To give or not samples of answers expected
• Temperature
How much the model is imaginative
Use the model - Inference
Which Model ?
Criteria to take in account for a use case
• Open Source vs Commercial
• Best of breed
• Versioning & lifecycle
• Cost e
ffi
ciency vs Overkill -> Size
• Accuracy
At the heart of the machine
• On Premises
• Compute: GPUs choice / VRAM size / Model
quantization
• NVIDIA T4 = 16Gb / 1100$
• NVIDIA A100 = 80Gb / 8000$
• Scalability : concurrent users, context size
• Online vs batch
• On Cloud
• Which one ? Cost, diversity and availability
• Pricing model: 1M token comes very fast ! 1 word ~ 4
tokens
• Sovereignty, data privacy
Infrastructure
Real-world usage
Aka your search engine 2.0
Very common use case =
“Retrival Augmented Generation”
RAG - 101
Search & Summarize In 4 Steps
Step 1 - Document loading
• Documents are loaded from data
connectors
• They are split into chunks
RAG
Step 2 - Embeddings
• Chunks are 'transformed' into
vectors (numbers)
✓It's the process of word
embedding, using a pre-trained
model
✓hundreds (even thousands !) of
dimensions are required to
represent the space of all words
• Vectors are stored in a dedicated
database (a vector database)
RAG
Step 3 - Retrieval
• Previous steps were preparatory
work, now comes the live part
• Question is vectorized as well,
used as an input for similarity
search
• Most relevant chunks are
retrieved, i.e. vectors coordinates
are close together
RAG
Step 4 - Generation
•Retrieved chunks are used to feed
the LLM prompt context
•Question is added to the prompt
•LLM reads the prompt and
generates a natural language
answer
•During this inference time,
the model requires a lot of GPU
power !
RAG
RAG engineering
Lots of moving part to reach performance !
Flow / Batch
Data Policy
Deduplication
Data cleanage
Attachments (images, pdf)
PII / Anonymization
Data policy / criticity
Chunking strategy
Embedding Model
Size
Language
Tokenizer
Vector DB Choice
Cloud / Local
Vectors dimensions
& reduction
Retrieval con
fi
g
(top_k, similarity)
Re-ranking
MMR score
RAG techniques
(Corrective, Self-re
fl
ective
Rag-Fusion, HyDE)
Chat memory
Model con
fi
g
(temperature, top_k, top_p)
Model Evaluation / derivation
(BLUE/RED, precision,
recall, F1 score, Ragas, truelens,
Human Feedback)
Prompt eng.
Guard rails
(Hallucinations, NSFW, …)
model compare / VertexSxS
Performance (TTFT, TPS, …)
PII / Anon (again)
UI-Integration
LLMOPS / MLOPS
Cost Ef
fi
ciency
Fine Tuning ?
OpenAI’s strategy
Demo time !

More Related Content

Similar to AI presentation and introduction - Retrieval Augmented Generation RAG 101

openai.pptx
openai.pptxopenai.pptx
openai.pptx
Dori Waldman
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
Minha Hwang
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Lucidworks
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Cork AI Meetup Number 3
Cork AI Meetup Number 3Cork AI Meetup Number 3
Cork AI Meetup Number 3
Nick Grattan
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
Zubair Nabi
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
Zachary S. Brown
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
semanticsconference
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
Simon Hughes
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Lucidworks
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
Andrzej Zydroń MBCS
 
The tipping point
The tipping pointThe tipping point
The tipping point
Andrzej Zydroń MBCS
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
Domingo Suarez Torres
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
Dios Kurniawan
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
mestato
 
Decoder Ring
Decoder RingDecoder Ring
Decoder Ring
Jeff Beeman
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
ssuserf583ac
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
RohanBorgalli
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
SreeVani74
 

Similar to AI presentation and introduction - Retrieval Augmented Generation RAG 101 (20)

openai.pptx
openai.pptxopenai.pptx
openai.pptx
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Cork AI Meetup Number 3
Cork AI Meetup Number 3Cork AI Meetup Number 3
Cork AI Meetup Number 3
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
 
The tipping point
The tipping pointThe tipping point
The tipping point
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Decoder Ring
Decoder RingDecoder Ring
Decoder Ring
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 

Recently uploaded

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 

Recently uploaded (20)

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 

AI presentation and introduction - Retrieval Augmented Generation RAG 101

  • 3. You said Large Language Model ? • Generative deep learning models for understanding and generating text, images and other types • A special kind : Transformers • “Attention is All you Need”, Vaswani et al. 2017 (https://arxiv.org/abs/1706.03762) • Transformers analyse chunks of data, called “tokens” and learn to predict the next token in a sequence • Prediction is a probability • Model that can generalize : one single model to address several use cases Focus on Language Models
  • 4.
  • 5. Build the model - Training What it’s like ? • Foundational models • Datasets LLM are trained using techniques that requires huge text-based datasets, e.g. “The Pile” : +880 Gb (Wikipedia, Youtube st, Github, …) “RedPajama”: +5Tb (wikipedia, StackExchange, ArXiv, …) Choosing and curating datasets for training is the secret sauce ! • Computing Power Transformer-based model have limitations: quadratic-complexity of attention mechanism Computationally intensive for long sequences
  • 6. Common patterns • Context The size of input data given to the model : size is limited ! • Prompt The question / the task, enriched with ‘pre- prompt’ • Zero-shot / Few-shot, … To give or not samples of answers expected • Temperature How much the model is imaginative Use the model - Inference
  • 7. Which Model ? Criteria to take in account for a use case • Open Source vs Commercial • Best of breed • Versioning & lifecycle • Cost e ffi ciency vs Overkill -> Size • Accuracy
  • 8. At the heart of the machine • On Premises • Compute: GPUs choice / VRAM size / Model quantization • NVIDIA T4 = 16Gb / 1100$ • NVIDIA A100 = 80Gb / 8000$ • Scalability : concurrent users, context size • Online vs batch • On Cloud • Which one ? Cost, diversity and availability • Pricing model: 1M token comes very fast ! 1 word ~ 4 tokens • Sovereignty, data privacy Infrastructure
  • 10. Aka your search engine 2.0 Very common use case = “Retrival Augmented Generation”
  • 11. RAG - 101 Search & Summarize In 4 Steps
  • 12. Step 1 - Document loading • Documents are loaded from data connectors • They are split into chunks RAG
  • 13. Step 2 - Embeddings • Chunks are 'transformed' into vectors (numbers) ✓It's the process of word embedding, using a pre-trained model ✓hundreds (even thousands !) of dimensions are required to represent the space of all words • Vectors are stored in a dedicated database (a vector database) RAG
  • 14. Step 3 - Retrieval • Previous steps were preparatory work, now comes the live part • Question is vectorized as well, used as an input for similarity search • Most relevant chunks are retrieved, i.e. vectors coordinates are close together RAG
  • 15. Step 4 - Generation •Retrieved chunks are used to feed the LLM prompt context •Question is added to the prompt •LLM reads the prompt and generates a natural language answer •During this inference time, the model requires a lot of GPU power ! RAG
  • 16. RAG engineering Lots of moving part to reach performance ! Flow / Batch Data Policy Deduplication Data cleanage Attachments (images, pdf) PII / Anonymization Data policy / criticity Chunking strategy Embedding Model Size Language Tokenizer Vector DB Choice Cloud / Local Vectors dimensions & reduction Retrieval con fi g (top_k, similarity) Re-ranking MMR score RAG techniques (Corrective, Self-re fl ective Rag-Fusion, HyDE) Chat memory Model con fi g (temperature, top_k, top_p) Model Evaluation / derivation (BLUE/RED, precision, recall, F1 score, Ragas, truelens, Human Feedback) Prompt eng. Guard rails (Hallucinations, NSFW, …) model compare / VertexSxS Performance (TTFT, TPS, …) PII / Anon (again) UI-Integration LLMOPS / MLOPS Cost Ef fi ciency