SlideShare a Scribd company logo
1 of 20
Download to read offline
Optimizing GenAI apps
Evaluating
Retrieval in RAGs
Nour El Mawass, Maria Knorps
7.Feb.2024
www.tweag.io/group/genai
2
What are RAGs?
And why do we need them anyway?
3
4
Retrieval Augmented Generation
<your question here>
● LLMs have a learning cutoff
● Fine-tuning is costly
● Adding relevant context to the
prompt is cheap and easy
● Find relevant context with
semantic search
Semantic search
● Vectorizing a documents base:
○ Chunking
○ Embedding/Vectorizing
○ Indexing
● Finding documents similar to a query:
○ Vectorize query
○ Find closest vectors
5
You probably saw RAGs before
6
We've built a RAG too!
7
8
<your question here>
The GenAI team at Tweag has been working on
applying the Retrieval-Augmented Generation (RAG)
paradigm together with commercial and open source
LLMs to perform intelligent search and suggestion
over a collection of Confluence and Bazel documents.
The LLM processing can be carried out within a virtual
private cloud domain (AWS in this case) so that no
information is shared with third parties.
RAGs
Experimenting vs "eyeballing"
10
- No benchmark: No guarantee that
the introduced change did not
degrade performance on other
questions.
- No experiments tracking: Likely
none of the intermediate states was
committed or properly tracked.
- No evaluation metrics: We cannot
numerically compare the current RAG
state to any other possible state.
- No solution space: What alternatives
are we exploring?
Evaluating retrieval: why?
11
Evaluation's golden quartet
12
Experiments tracking
Evaluation metrics Parameters space
Benchmark
Benchmark
● Benchmark over the documents database:
○ Questions
○ Pairs of (question, answers)
○ Pairs of (question, relevant_documents)
● Not easy: need representative and varied queries
13
Human-generated LLM-generated
● Can be automated with LLMs
○ generate questions over documents
○ reformulate questions
Parameters space
14
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
"retrieval": {
"collection_name": "default",
"embedding_model": {
"name": "langchain.embeddings.SentenceTransformerEmbeddings",
"parameters": { "model_name": "all-mpnet-base-v2" }
},
"chunking_model": {
"name": "langchain.text_splitter.RecursiveCharacterTextSplitter",
"parameters": { "chunk_size": 500, "chunk_overlap": 5 }
},
"top_k": 10,
"preprocessing_model": {
"name": "user_input_to_search_query"
},
}
Evaluation metrics
● Information Retrieval metrics (traditional ML)
○ Labeled dataset
○ Evaluate recall and precision at k
● LLM-based evaluation
○ Context relevance
■ ratio of relevant to total sentences in
the retrieved documents
○ Context recall
15
LLM-based RAG
metrics
Information
retrieval metrics
Experiments tracking
16
Data
(benchmark + vectors)
+
Experiment's
parameters
+
Version-controlled
code
Experiment
tracking
+ +
Parameters
- k
- embedding
- chunking
- …
Tweag’s evaluation framework
17
Experiments
tracking
Evaluation
metrics
Parameters
space
Benchmark
Key strategies: Chunking and Embedding
18
● Embedding models:
○ all-MiniLM-L12-v2
○ Multi-qa-mpnet-nase-dot-v1
○ All-mpnet-base-v2
○ SpacyEmbeddings
● Chunking models
○ RecursiveCharacterTextSplitter
○ SentenceTransformersTokenTextSplitter
● Benchmarks:
○ User questions
○ User questions reformulated with ChatGPT3.5
Results on the Confluence chatbot
19
Takeaways
● You need to evaluate your system, no eyeballing!
● Many frameworks and tools: check our blog posts for an
introduction.
20
https://www.tweag.io/group/genai/

More Related Content

What's hot

Cloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxCloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxSahithiGurlinka
 
LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)Shota Shinogi
 
Generative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGenerative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGene Leybzon
 
Best Practice on using Azure OpenAI Service
Best Practice on using Azure OpenAI ServiceBest Practice on using Azure OpenAI Service
Best Practice on using Azure OpenAI ServiceKumton Suttiraksiri
 
ProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AI
ProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AIProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AI
ProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AIAmanda Lam
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdfcaa28steve
 
Generative AI con Amazon Bedrock.pdf
Generative AI con Amazon Bedrock.pdfGenerative AI con Amazon Bedrock.pdf
Generative AI con Amazon Bedrock.pdfGuido Maria Nebiolo
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsCristina Vidu
 
FendBend.AI Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacks
FendBend.AI  Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacksFendBend.AI  Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacks
FendBend.AI Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacksVenkat Chandra ("VC")
 
Using Generative AI
Using Generative AIUsing Generative AI
Using Generative AIMark DeLoura
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
generative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language modelsgenerative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language modelsAdventureWorld5
 
What Leaders Need To KNOW & DO About Generative AI.pdf
What Leaders Need To KNOW & DO About Generative AI.pdfWhat Leaders Need To KNOW & DO About Generative AI.pdf
What Leaders Need To KNOW & DO About Generative AI.pdfSwitch On | Thrive Your Future
 
APIs as a new Banking Channel
APIs as a new Banking ChannelAPIs as a new Banking Channel
APIs as a new Banking ChannelPaymentComponents
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfPremNaraindas1
 
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...patiladiti752
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN FrameworkKeymate.AI
 
Understanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohNUS-ISS
 
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and CostLLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and CostAggregage
 

What's hot (20)

Cloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxCloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptx
 
LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)
 
Generative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second SessionGenerative AI Use cases for Enterprise - Second Session
Generative AI Use cases for Enterprise - Second Session
 
Best Practice on using Azure OpenAI Service
Best Practice on using Azure OpenAI ServiceBest Practice on using Azure OpenAI Service
Best Practice on using Azure OpenAI Service
 
ProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AI
ProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AIProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AI
ProductTank HK #31 - Maximizing Product Ops Efficiency with Generative AI
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
 
Generative AI con Amazon Bedrock.pdf
Generative AI con Amazon Bedrock.pdfGenerative AI con Amazon Bedrock.pdf
Generative AI con Amazon Bedrock.pdf
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot Integrations
 
FendBend.AI Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacks
FendBend.AI  Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacksFendBend.AI  Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacks
FendBend.AI Digital Claims Admin Fintech (Insurtech) Business Pitch @DVHacks
 
Using Generative AI
Using Generative AIUsing Generative AI
Using Generative AI
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
generative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language modelsgenerative-ai-fundamentals and Large language models
generative-ai-fundamentals and Large language models
 
What Leaders Need To KNOW & DO About Generative AI.pdf
What Leaders Need To KNOW & DO About Generative AI.pdfWhat Leaders Need To KNOW & DO About Generative AI.pdf
What Leaders Need To KNOW & DO About Generative AI.pdf
 
APIs as a new Banking Channel
APIs as a new Banking ChannelAPIs as a new Banking Channel
APIs as a new Banking Channel
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language...
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
 
Understanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix Goh
 
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and CostLLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
 

Similar to Optimizing GenAI apps, by N. El Mawass and Maria Knorps

OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...NETWAYS
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBLisa Roth, PMP
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB
 
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB
 
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 MinutesMongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 MinutesMongoDB
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"George Stathis
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Ido Green
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDBMongoDB
 
The need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementationsThe need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementationsBen DeMott
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revisedMongoDB
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessMongoDB
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBMongoDB
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...DataScienceConferenc1
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science PlatformQAware GmbH
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Jimmy Lai
 
When GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jWhen GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jJean-Francois James
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with WeaviateNETWAYS
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...Anant Corporation
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with BlackfireMarko Mitranić
 

Similar to Optimizing GenAI apps, by N. El Mawass and Maria Knorps (20)

OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDBMongoDB World 2019: Fast Machine Learning Development with MongoDB
MongoDB World 2019: Fast Machine Learning Development with MongoDB
 
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 MinutesMongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
MongoDB World 2018: Tutorial - MongoDB & NodeJS: Zero to Hero in 80 Minutes
 
Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"Elasticsearch & "PeopleSearch"
Elasticsearch & "PeopleSearch"
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
The need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementationsThe need for sophistication in modern search engine implementations
The need for sophistication in modern search engine implementations
 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
When GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4jWhen GenAI meets with Java with Quarkus and langchain4j
When GenAI meets with Java with Quarkus and langchain4j
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 

More from Paris Women in Machine Learning and Data Science

More from Paris Women in Machine Learning and Data Science (20)

Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Perspectives, by M. Pannegeon
 
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
 
Iana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdfIana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdf
 

Recently uploaded

Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...mikehavy0
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证zifhagzkk
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Voces Mineras
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSSnehalVinod
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...LuisMiguelPaz5
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIf6x4zqzk86
 

Recently uploaded (20)

Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 

Optimizing GenAI apps, by N. El Mawass and Maria Knorps

  • 1. Optimizing GenAI apps Evaluating Retrieval in RAGs Nour El Mawass, Maria Knorps 7.Feb.2024
  • 3. What are RAGs? And why do we need them anyway? 3
  • 4. 4 Retrieval Augmented Generation <your question here> ● LLMs have a learning cutoff ● Fine-tuning is costly ● Adding relevant context to the prompt is cheap and easy ● Find relevant context with semantic search
  • 5. Semantic search ● Vectorizing a documents base: ○ Chunking ○ Embedding/Vectorizing ○ Indexing ● Finding documents similar to a query: ○ Vectorize query ○ Find closest vectors 5
  • 6. You probably saw RAGs before 6
  • 7. We've built a RAG too! 7
  • 8. 8 <your question here> The GenAI team at Tweag has been working on applying the Retrieval-Augmented Generation (RAG) paradigm together with commercial and open source LLMs to perform intelligent search and suggestion over a collection of Confluence and Bazel documents. The LLM processing can be carried out within a virtual private cloud domain (AWS in this case) so that no information is shared with third parties.
  • 10. Experimenting vs "eyeballing" 10 - No benchmark: No guarantee that the introduced change did not degrade performance on other questions. - No experiments tracking: Likely none of the intermediate states was committed or properly tracked. - No evaluation metrics: We cannot numerically compare the current RAG state to any other possible state. - No solution space: What alternatives are we exploring?
  • 12. Evaluation's golden quartet 12 Experiments tracking Evaluation metrics Parameters space Benchmark
  • 13. Benchmark ● Benchmark over the documents database: ○ Questions ○ Pairs of (question, answers) ○ Pairs of (question, relevant_documents) ● Not easy: need representative and varied queries 13 Human-generated LLM-generated ● Can be automated with LLMs ○ generate questions over documents ○ reformulate questions
  • 14. Parameters space 14 "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, } "retrieval": { "collection_name": "default", "embedding_model": { "name": "langchain.embeddings.SentenceTransformerEmbeddings", "parameters": { "model_name": "all-mpnet-base-v2" } }, "chunking_model": { "name": "langchain.text_splitter.RecursiveCharacterTextSplitter", "parameters": { "chunk_size": 500, "chunk_overlap": 5 } }, "top_k": 10, "preprocessing_model": { "name": "user_input_to_search_query" }, }
  • 15. Evaluation metrics ● Information Retrieval metrics (traditional ML) ○ Labeled dataset ○ Evaluate recall and precision at k ● LLM-based evaluation ○ Context relevance ■ ratio of relevant to total sentences in the retrieved documents ○ Context recall 15 LLM-based RAG metrics Information retrieval metrics
  • 16. Experiments tracking 16 Data (benchmark + vectors) + Experiment's parameters + Version-controlled code Experiment tracking + + Parameters - k - embedding - chunking - …
  • 18. Key strategies: Chunking and Embedding 18 ● Embedding models: ○ all-MiniLM-L12-v2 ○ Multi-qa-mpnet-nase-dot-v1 ○ All-mpnet-base-v2 ○ SpacyEmbeddings ● Chunking models ○ RecursiveCharacterTextSplitter ○ SentenceTransformersTokenTextSplitter ● Benchmarks: ○ User questions ○ User questions reformulated with ChatGPT3.5
  • 19. Results on the Confluence chatbot 19
  • 20. Takeaways ● You need to evaluate your system, no eyeballing! ● Many frameworks and tools: check our blog posts for an introduction. 20 https://www.tweag.io/group/genai/