SlideShare a Scribd company logo
Optimizing GenAI apps
Evaluating
Retrieval in RAGs
Nour El Mawass, Maria Knorps
7.Feb.2024
www.tweag.io/group/genai
2
What are RAGs?
And why do we need them anyway?
3
4
Retrieval Augmented Generation
<your question here>
● LLMs have a learning cutoff
● Fine-tuning is costly
● Adding relevant context to the
prompt is cheap and easy
● Find relevant context with
semantic search
Semantic search
● Vectorizing a documents base:
○ Chunking
○ Embedding/Vectorizing
○ Indexing
● Finding documents similar to a query:
○ Vectorize query
○ Find closest vectors
5
You probably saw RAGs before
6
We've built a RAG too!
7
8
<your question here>
The GenAI team at Tweag has been working on
applying the Retrieval-Augmented Generation (RAG)
paradigm together with commercial and open source
LLMs to perform intelligent search and suggestion
over a collection of Confluence and Bazel documents.
The LLM processing can be carried out within a virtual
private cloud domain (AWS in this case) so that no
information is shared with third parties.
RAGs
Experimenting vs "eyeballing"
10
- No benchmark: No guarantee that
the introduced change did not
degrade performance on other
questions.
- No experiments tracking: Likely
none of the intermediate states was
committed or properly tracked.
- No evaluation metrics: We cannot
numerically compare the current RAG
state to any other possible state.
- No solution space: What alternatives
are we exploring?
Evaluating retrieval: why?
11
Evaluation's golden quartet
12
Experiments tracking
Evaluation metrics Parameters space
Benchmark
Benchmark
● Benchmark over the documents database:
○ Questions
○ Pairs of (question, answers)
○ Pairs of (question, relevant_documents)
● Not easy: need representative and varied queries
13
Human-generated LLM-generated
● Can be automated with LLMs
○ generate questions over documents
○ reformulate questions
Parameters space
14
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
Evaluation metrics
● Information Retrieval metrics (traditional ML)
○ Labeled dataset
○ Evaluate recall and precision at k
● LLM-based evaluation
○ Context relevance
■ ratio of relevant to total sentences in
the retrieved documents
○ Context recall
15
LLM-based RAG
metrics
Information
retrieval metrics
Experiments tracking
16
Data
(benchmark + vectors)
+
Experiment's
parameters
+
Version-controlled
code
Experiment
tracking
+ +
Parameters
- k
- embedding
- chunking
- …
Tweag’s evaluation framework
17
Experiments
tracking
Evaluation
metrics
Parameters
space
Benchmark
Key strategies: Chunking and Embedding
18
● Embedding models:
○ all-MiniLM-L12-v2
○ Multi-qa-mpnet-nase-dot-v1
○ All-mpnet-base-v2
○ SpacyEmbeddings
● Chunking models
○ RecursiveCharacterTextSplitter
○ SentenceTransformersTokenTextSplitter
● Benchmarks:
○ User questions
○ User questions reformulated with ChatGPT3.5
Results on the Confluence chatbot
19
Takeaways
● You need to evaluate your system, no eyeballing!
● Many frameworks and tools: check our blog posts for an
introduction.
20
https://www.tweag.io/group/genai/

More Related Content

What's hot

Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale
Maxim Salnikov
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
Jesus Rodriguez
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
taozen
 
H2O.ai's Driverless AI
H2O.ai's Driverless AIH2O.ai's Driverless AI
H2O.ai's Driverless AI
Sri Ambati
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
PremNaraindas1
 
Understanding generative AI models A comprehensive overview.pdf
Understanding generative AI models A comprehensive overview.pdfUnderstanding generative AI models A comprehensive overview.pdf
Understanding generative AI models A comprehensive overview.pdf
StephenAmell4
 
Generative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGenerative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second Session
Gene Leybzon
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
DianaGray10
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
Loic Merckel
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
Ivo Andreev
 
AzureOpenAI.pptx
AzureOpenAI.pptxAzureOpenAI.pptx
AzureOpenAI.pptx
Udaiappa Ramachandran
 
Generative AI
Generative AIGenerative AI
Generative AI
Carlos J. Costa
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
Andre Muscat
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
ssuser4edc93
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
Fiza987241
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
Keymate.AI
 
Microsoft AI Platform Overview
Microsoft AI Platform OverviewMicrosoft AI Platform Overview
Microsoft AI Platform Overview
David Chou
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AI
Benjaminlapid1
 

What's hot (20)

Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
 
H2O.ai's Driverless AI
H2O.ai's Driverless AIH2O.ai's Driverless AI
H2O.ai's Driverless AI
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
Understanding generative AI models A comprehensive overview.pdf
Understanding generative AI models A comprehensive overview.pdfUnderstanding generative AI models A comprehensive overview.pdf
Understanding generative AI models A comprehensive overview.pdf
 
Generative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGenerative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second Session
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
 
AzureOpenAI.pptx
AzureOpenAI.pptxAzureOpenAI.pptx
AzureOpenAI.pptx
 
Generative AI
Generative AIGenerative AI
Generative AI
 
ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...
ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...
ChatGPT, Generative AI and Microsoft Copilot: Step Into the Future - Geoff Ab...
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
 
Microsoft AI Platform Overview
Microsoft AI Platform OverviewMicrosoft AI Platform Overview
Microsoft AI Platform Overview
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AI
 

Similar to Optimizing GenAI apps, by N. El Mawass and Maria Knorps

OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
NETWAYS
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
Lisa Roth, PMP
 
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB
 
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 MinutesMongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
George Stathis
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
Ido Green
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
MongoDB
 
The need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementationsThe need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementations
Ben DeMott
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revisedMongoDB
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
MongoDB
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
MongoDB
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
DataScienceConferenc1
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
QAware GmbH
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
Jimmy Lai
 
When GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jWhen GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4j
Jean-Francois James
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
NETWAYS
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
Anant Corporation
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
Marko Mitranić
 

Similar to Optimizing GenAI apps, by N. El Mawass and Maria Knorps (20)

OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDB
 
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 MinutesMongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
The need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementationsThe need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementations
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
When GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jWhen GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4j
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 

More from Paris Women in Machine Learning and Data Science

Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
Paris Women in Machine Learning and Data Science
 
How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
Paris Women in Machine Learning and Data Science
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
Paris Women in Machine Learning and Data Science
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
Paris Women in Machine Learning and Data Science
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
Paris Women in Machine Learning and Data Science
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Paris Women in Machine Learning and Data Science
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
Paris Women in Machine Learning and Data Science
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Paris Women in Machine Learning and Data Science
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
Paris Women in Machine Learning and Data Science
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
Paris Women in Machine Learning and Data Science
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Paris Women in Machine Learning and Data Science
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
Paris Women in Machine Learning and Data Science
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Paris Women in Machine Learning and Data Science
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
Paris Women in Machine Learning and Data Science
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Paris Women in Machine Learning and Data Science
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
Paris Women in Machine Learning and Data Science
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Paris Women in Machine Learning and Data Science
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Paris Women in Machine Learning and Data Science
 
Iana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdfIana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdf
Paris Women in Machine Learning and Data Science
 

More from Paris Women in Machine Learning and Data Science (20)

Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Perspectives, by M. Pannegeon
 
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
 
Iana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdfIana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdf
 

Recently uploaded

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 

Recently uploaded (20)

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 

Optimizing GenAI apps, by N. El Mawass and Maria Knorps

  • 1. Optimizing GenAI apps Evaluating Retrieval in RAGs Nour El Mawass, Maria Knorps 7.Feb.2024
  • 3. What are RAGs? And why do we need them anyway? 3
  • 4. 4 Retrieval Augmented Generation <your question here> ● LLMs have a learning cutoff ● Fine-tuning is costly ● Adding relevant context to the prompt is cheap and easy ● Find relevant context with semantic search
  • 5. Semantic search ● Vectorizing a documents base: ○ Chunking ○ Embedding/Vectorizing ○ Indexing ● Finding documents similar to a query: ○ Vectorize query ○ Find closest vectors 5
  • 6. You probably saw RAGs before 6
  • 7. We've built a RAG too! 7
  • 8. 8 <your question here> The GenAI team at Tweag has been working on applying the Retrieval-Augmented Generation (RAG) paradigm together with commercial and open source LLMs to perform intelligent search and suggestion over a collection of Confluence and Bazel documents. The LLM processing can be carried out within a virtual private cloud domain (AWS in this case) so that no information is shared with third parties.
  • 10. Experimenting vs "eyeballing" 10 - No benchmark: No guarantee that the introduced change did not degrade performance on other questions. - No experiments tracking: Likely none of the intermediate states was committed or properly tracked. - No evaluation metrics: We cannot numerically compare the current RAG state to any other possible state. - No solution space: What alternatives are we exploring?
  • 12. Evaluation's golden quartet 12 Experiments tracking Evaluation metrics Parameters space Benchmark
  • 13. Benchmark ● Benchmark over the documents database: ○ Questions ○ Pairs of (question, answers) ○ Pairs of (question, relevant_documents) ● Not easy: need representative and varied queries 13 Human-generated LLM-generated ● Can be automated with LLMs ○ generate questions over documents ○ reformulate questions
  • 14. Parameters space 14 "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, }
  • 15. Evaluation metrics ● Information Retrieval metrics (traditional ML) ○ Labeled dataset ○ Evaluate recall and precision at k ● LLM-based evaluation ○ Context relevance ■ ratio of relevant to total sentences in the retrieved documents ○ Context recall 15 LLM-based RAG metrics Information retrieval metrics
  • 16. Experiments tracking 16 Data (benchmark + vectors) + Experiment's parameters + Version-controlled code Experiment tracking + + Parameters - k - embedding - chunking - …
  • 18. Key strategies: Chunking and Embedding 18 ● Embedding models: ○ all-MiniLM-L12-v2 ○ Multi-qa-mpnet-nase-dot-v1 ○ All-mpnet-base-v2 ○ SpacyEmbeddings ● Chunking models ○ RecursiveCharacterTextSplitter ○ SentenceTransformersTokenTextSplitter ● Benchmarks: ○ User questions ○ User questions reformulated with ChatGPT3.5
  • 19. Results on the Confluence chatbot 19
  • 20. Takeaways ● You need to evaluate your system, no eyeballing! ● Many frameworks and tools: check our blog posts for an introduction. 20 https://www.tweag.io/group/genai/