Nour and Maria present the work they did at Tweag, Modus Create's innovation arm, where the GenAI team developed an evaluation framework for Retrieval-Augmented Generation (RAG) systems. RAG systems provide an easy, low-cost way to extend the knowledge of Large Language Models (LLMs), but measuring their performance is not an easy task.
The presentation will review existing evaluation frameworks, ranging from those based on the traditional ML approach of using ground-truth datasets, including Tweag's, to those that use LLMs to compute evaluation metrics.
It will also delve into the practical implementation of Tweag's chatbot over two distinct document datasets and provide insights on chunking, embedding, and how open-source and commercial LLMs compare.
4. Retrieval Augmented Generation
● LLMs have a knowledge cutoff
● Fine-tuning is costly
● Adding relevant context to the prompt is cheap and easy
● Find relevant context with semantic search
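A minimal sketch of that context injection, assuming nothing about Tweag's actual prompt (the function name and wording here are ours):

```python
# Hypothetical sketch: prepend retrieved chunks to the user question so the
# LLM answers from fresh context rather than from its training data alone.
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```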
5. Semantic search
● Vectorizing a document base:
○ Chunking
○ Embedding/Vectorizing
○ Indexing
● Finding documents similar to a query:
○ Vectorize the query
○ Find the closest vectors
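A hedged sketch of this pipeline, using sentence-transformers for embeddings and brute-force cosine similarity in place of a real vector index; the model name and chunk size are illustrative assumptions, not the settings used in the talk:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, not the talk's

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real systems often split on sentences or headings.
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
    # Chunk every document, then embed each chunk into a vector.
    chunks = [c for doc in documents for c in chunk(doc)]
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def search(query: str, chunks: list[str], vectors: np.ndarray, k: int = 3) -> list[str]:
    # Vectorize the query and return the k chunks with the closest vectors.
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity (embeddings are normalized)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```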
8. The GenAI team at Tweag has been applying the Retrieval-Augmented Generation (RAG) paradigm, together with commercial and open-source LLMs, to perform intelligent search and suggestion over a collection of Confluence and Bazel documents. The LLM processing can be carried out within a virtual private cloud (AWS in this case), so that no information is shared with third parties.
10. Experimenting vs "eyeballing"
- No benchmark: no guarantee that an introduced change did not degrade performance on other questions.
- No experiment tracking: likely none of the intermediate states was committed or properly tracked.
- No evaluation metrics: we cannot numerically compare the current RAG state to any other possible state.
- No solution space: what alternatives are we exploring?
13. Benchmark
● Benchmark over the document database:
○ Questions
○ Pairs of (question, answer)
○ Pairs of (question, relevant_documents)
● Not easy: queries must be representative and varied
(Diagram: human-generated vs. LLM-generated benchmarks)
● Can be automated with LLMs:
○ Generate questions over documents
○ Reformulate questions
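An illustrative sketch of that automation; the API client, model name, and prompt wording are our assumptions, not the setup used in the talk. Each document chunk is turned into a (question, relevant_document) pair:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_question(chunk: str, model: str = "gpt-4o-mini") -> str:
    # Ask the LLM for a question that the given chunk answers.
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": "Write one factual question that the following text answers. "
                       f"Return only the question.\n\n{chunk}",
        }],
    )
    return response.choices[0].message.content.strip()

# benchmark = [(generate_question(c), c) for c in chunks]  # chunks: your document chunks
```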
15. Evaluation metrics
● Information Retrieval metrics (traditional ML)
○ Labeled dataset
○ Evaluate recall and precision at k
● LLM-based evaluation
○ Context relevance
■ ratio of relevant to total sentences in the retrieved documents
○ Context recall
■ fraction of the ground-truth answer supported by the retrieved documents
(Diagram: information retrieval metrics vs. LLM-based RAG metrics)
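A minimal sketch of the information retrieval metrics over a labeled benchmark of (question, relevant_documents) pairs; the function name is ours:

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    # precision@k: fraction of the top-k retrieved documents that are relevant.
    # recall@k: fraction of all relevant documents found in the top k.
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Average both metrics over the whole benchmark, e.g.:
# scores = [precision_recall_at_k(search(q, chunks, vectors), rel, k=3)
#           for q, rel in benchmark]
```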
20. Takeaways
● You need to evaluate your system, no eyeballing!
● Many frameworks and tools: check our blog posts for an introduction.
https://www.tweag.io/group/genai/