SlideShare a Scribd company logo
1 of 30
H2O.ai Confidential
JERRY LIU
CEO, LlamaIndex
H2O.ai Confidential
GenAI - Enterprise Use-cases
Document Processing
Tagging & Extraction
Knowledge Search & QA
Conversational Agent Workflow Automation
Agent: …
Human: …
Agent: …
Document
Topic:
Summary:
Author:
Knowledge Base
Answer:
Sources: …
Workflow:
● Read latest messages from user A
● Send email suggesting next-steps
Inbox
read
Email
write
H2O.ai Confidential
GenAI - Enterprise Use-cases
Document Processing
Tagging & Extraction
Knowledge Search & QA
Conversational Agent Workflow Automation
Agent: …
Human: …
Agent: …
Document
Topic:
Summary:
Author:
Knowledge
Base
Answer:
Sources: …
Workflow:
● Read latest messages
from user A
● Send email suggesting
next-steps
Inbox
read
Email
write
H2O.ai Confidential
RAG Stack
H2O.ai Confidential
Current RAG Stack for building a QA System
Vector Database
Doc
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
LLM
Data Ingestion Data Querying (Retrieval + Synthesis)
5 Lines of Code in LlamaIndex!
H2O.ai Confidential
Challenges with “Naive” RAG
H2O.ai Confidential
Challenges with Naive RAG (Response Quality)
● Bad Retrieval
○ Low Precision: Not all chunks in retrieved set are relevant
■ Hallucination + Lost in the Middle Problems
○ Low Recall: Now all relevant chunks are retrieved.
■ Lacks enough context for LLM to synthesize an answer
○ Outdated information: The data is redundant or out of date.
H2O.ai Confidential
Challenges with Naive RAG (Response Quality)
● Bad Retrieval
○ Low Precision: Not all chunks in retrieved set are relevant
■ Hallucination + Lost in the Middle Problems
○ Low Recall: Now all relevant chunks are retrieved.
■ Lacks enough context for LLM to synthesize an answer
○ Outdated information: The data is redundant or out of date.
● Bad Response Generation
○ Hallucination: Model makes up an answer that isn’t in the context.
○ Irrelevance: Model makes up an answer that doesn’t answer the question.
○ Toxicity/Bias: Model makes up an answer that’s harmful/offensive.
H2O.ai Confidential
What do we do?
• Data: Can we store additional information beyond raw text chunks?
• Embeddings: Can we optimize our embedding representations?
• Retrieval: Can we do better than top-k embedding lookup?
• Synthesis: Can we use LLMs for more than generation?
Vector
Database
Doc
Chunk
Chunk
Chunk
Chunk
Chunk LLM
Data Embeddings Retrieval Synthesis
H2O.ai Confidential
What do we do?
• Data: Can we store additional information beyond raw text chunks?
• Embeddings: Can we optimize our embedding representations?
• Retrieval: Can we do better than top-k embedding lookup?
• Synthesis: Can we use LLMs for more than generation?
But before all this…
We need evals
H2O.ai Confidential
Evaluation
H2O.ai Confidential
Evaluation
● How do we properly evaluate a RAG system?
○ Evaluate in isolation (retrieval, synthesis)
○ Evaluate e2e
Vector
Database
Chunk
Chunk
Chunk
LLM
Retrieval Synthesis
H2O.ai Confidential
Evaluation in Isolation (Retrieval)
● Evaluate quality of retrieved
chunks given user query
● Create dataset
○ Input: query
○ Output: the “ground-truth”
documents relevant to the
query
● Run retriever over dataset
● Measure ranking metrics
○ Success rate / hit-rate
○ MRR
○ Hit-rate
H2O.ai Confidential
Evaluation E2E
● Evaluation of final generated
response given input
● Create Dataset
○ Input: query
○ [Optional] Output: the
“ground-truth” answer
● Run through full RAG pipeline
● Collect evaluation metrics:
○ If no labels: label-free evals
○ If labels: with-label evals
H2O.ai Confidential
Optimizing RAG Systems
H2O.ai Confidential
From Simple to Advanced
Less Expressive
Easier to Implement
Lower Latency/Cost
More Expressive
Harder to Implement
Higher Latency/Cost
Table Stakes
Better Parsers
Chunk Sizes
Prompt Engineering
Customizing Models
🛠️
Advanced Retrieval
Metadata Filtering
Recursive Retrieval
Embedded Tables
Small-to-big Retrieval
🔎
Agentic Behavior
Routing
Query Planning
Multi-document Agents
️
Fine-tuning
Embedding fine-tuning
LLM fine-tuning
⚙️
H2O.ai Confidential
Table Stakes: Chunk Sizes
Tuning your chunk size can have outsized impacts on performance
Not obvious that more retrieved tokens == higher performance!
Note: Reranking (shuffling context order) isn’t always beneficial.
H2O.ai Confidential
Table Stakes: Prompt Engineering
RAG uses core Question-Answering (QA) prompt templates
Ways you can customize:
• Adding few-shot examples
• Modifying template text
• Adding emotions
H2O.ai Confidential
Table Stakes:
Customizing LLMs
Task performance on easy-
to-hard tasks (RAG, agents)
varies wildly among LLMs
H2O.ai Confidential
Table Stakes: Customizing Embeddings
Your embedding model + reranker affects retrieval quality
Source: https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83
H2O.ai Confidential
Advanced Retrieval: Small-to-Big
Intuition: Embedding a
big text chunk feels
suboptimal.
Solution: Embed text
at the sentence-level -
then expand that
window during LLM
synthesis
H2O.ai Confidential
Advanced Retrieval: Small-to-Big
Sentence Window Retrieval (k=2)
Naive Retrieval (k=5)
Only one out of the 5 chunks is relevant
- “lost in the middle” problem
Leads to more precise
retrieval.
Avoids “lost in the
middle” problems.
H2O.ai Confidential
Advanced Retrieval:
Small-to-Big
Intuition: Embedding a big text chunk feels suboptimal.
Solution: Embed a smaller reference to the parent
chunk. Use parent chunk for synthesis
Examples: Smaller chunks, summaries, metadata
H2O.ai Confidential
Data Agents - LLM-powered knowledge workers
Email
Read latest
emails
Knowledge
Base
Retrieve
context
Analysis
Agent
Analyze
file
Slack
Send
update
Data
Agent
H2O.ai Confidential
Data Agents - Core Components
Agent Reasoning Loop
● ReAct Agent (any LLM)
● OpenAI Agent (only OAI)
Tools via LlamaHub
● Code interpreter
● Slack
● Notion
● Zapier
● … (15+ tools, ~100 loaders)
H2O.ai Confidential
Agentic Behavior: Multi-Document Agents
Intuition: There’s certain
questions that “top-k” RAG can’t
answer.
Solution: Multi-Document
Agents
● Fact-based QA and
Summarization over any
subsets of documents
● Chain-of-thought and
query planning.
H2O.ai Confidential
Fine-Tuning: Embeddings
Credits: Jo Bergum, vespa.ai
Intuition: Embedding Representations are not optimized over your dataset
Solution: Generate a synthetic query dataset from raw text chunks using LLMs
Use this synthetic dataset to finetune an embedding model.
H2O.ai Confidential
Fine-Tuning: LLMs
Intuition: Weaker LLMs are
not bad at response
synthesis, reasoning,
structured outputs, etc.
Solution: Generate a
synthetic dataset from raw
chunks (e.g. using GPT-4).
Help fix all of the above!
H2O.ai Confidential
Resources
Production RAG
https://docs.llamaindex.ai/en/stabl
e/end_to_end_tutorials/dev_practi
ces/production_rag.html
Fine-tuning
https://docs.llamaindex.ai/en/stabl
e/end_to_end_tutorials/finetuning.
html
H2O.ai Confidential
Thanks!

More Related Content

What's hot

Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Mihai Criveti
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 
Mother of Language`s Langchain
Mother of Language`s LangchainMother of Language`s Langchain
Mother of Language`s LangchainJun-hang Lee
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdfcaa28steve
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AIBenjaminlapid1
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMsLoic Merckel
 
Generative Models and ChatGPT
Generative Models and ChatGPTGenerative Models and ChatGPT
Generative Models and ChatGPTLoic Merckel
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Generative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's PerspectiveGenerative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's PerspectiveHuahai Yang
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
Prompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaPrompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaMichal Jaskolski
 
LLMs_talk_March23.pdf
LLMs_talk_March23.pdfLLMs_talk_March23.pdf
LLMs_talk_March23.pdfChaoYang81
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN FrameworkKeymate.AI
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfPremNaraindas1
 
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented GenerationDataScienceConferenc1
 
Let's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersLet's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersSteven Van Vaerenbergh
 
Benchmark comparison of Large Language Models
Benchmark comparison of Large Language ModelsBenchmark comparison of Large Language Models
Benchmark comparison of Large Language ModelsMatej Varga
 
Using Generative AI
Using Generative AIUsing Generative AI
Using Generative AIMark DeLoura
 

What's hot (20)

Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
 
Mother of Language`s Langchain
Mother of Language`s LangchainMother of Language`s Langchain
Mother of Language`s Langchain
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AI
 
Introduction to LLMs
Introduction to LLMsIntroduction to LLMs
Introduction to LLMs
 
LLMs Bootcamp
LLMs BootcampLLMs Bootcamp
LLMs Bootcamp
 
Generative Models and ChatGPT
Generative Models and ChatGPTGenerative Models and ChatGPT
Generative Models and ChatGPT
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Generative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's PerspectiveGenerative AI: Past, Present, and Future – A Practitioner's Perspective
Generative AI: Past, Present, and Future – A Practitioner's Perspective
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
Prompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaPrompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowania
 
LLMs_talk_March23.pdf
LLMs_talk_March23.pdfLLMs_talk_March23.pdf
LLMs_talk_March23.pdf
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
 
Webinar on ChatGPT.pptx
Webinar on ChatGPT.pptxWebinar on ChatGPT.pptx
Webinar on ChatGPT.pptx
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
 
Let's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersLet's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchers
 
Benchmark comparison of Large Language Models
Benchmark comparison of Large Language ModelsBenchmark comparison of Large Language Models
Benchmark comparison of Large Language Models
 
Using Generative AI
Using Generative AIUsing Generative AI
Using Generative AI
 

Similar to Building, Evaluating, and Optimizing your RAG App for Production

Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...Lucidworks
 
Evolving s3 story
Evolving s3 storyEvolving s3 story
Evolving s3 storyAvi Perez
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caringsporst
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfM Waleed Kadous
 
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...Amazon Web Services
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 
JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 Omnilogy
 
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfOSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfAltinity Ltd
 
MongoDB What's new in 3.2 version
MongoDB What's new in 3.2 versionMongoDB What's new in 3.2 version
MongoDB What's new in 3.2 versionHéliot PERROQUIN
 
Navigating the Mess of a Shared drive Migration to SharePoint
Navigating the Mess of a Shared drive Migration to SharePointNavigating the Mess of a Shared drive Migration to SharePoint
Navigating the Mess of a Shared drive Migration to SharePointJoanne Klein
 
Build Data Driven Apps with Real-time and Offline Capabilities
Build Data Driven Apps with Real-time and Offline CapabilitiesBuild Data Driven Apps with Real-time and Offline Capabilities
Build Data Driven Apps with Real-time and Offline CapabilitiesAmazon Web Services
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with BlackfireMarko Mitranić
 
Code for Startup MVP (Ruby on Rails) Session 1
Code for Startup MVP (Ruby on Rails) Session 1Code for Startup MVP (Ruby on Rails) Session 1
Code for Startup MVP (Ruby on Rails) Session 1Henry S
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At CraigslistJeremy Zawodny
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyAndreas Grabner
 
Framing the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLFraming the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLInside Analysis
 
Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013
Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013
Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013Amazon Web Services
 

Similar to Building, Evaluating, and Optimizing your RAG App for Production (20)

Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
 
Evolving s3 story
Evolving s3 storyEvolving s3 story
Evolving s3 story
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdf
 
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:...
 
[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue[AWS Builders] Effective AWS Glue
[AWS Builders] Effective AWS Glue
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31
 
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfOSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
 
MongoDB What's new in 3.2 version
MongoDB What's new in 3.2 versionMongoDB What's new in 3.2 version
MongoDB What's new in 3.2 version
 
Navigating the Mess of a Shared drive Migration to SharePoint
Navigating the Mess of a Shared drive Migration to SharePointNavigating the Mess of a Shared drive Migration to SharePoint
Navigating the Mess of a Shared drive Migration to SharePoint
 
Build Data Driven Apps with Real-time and Offline Capabilities
Build Data Driven Apps with Real-time and Offline CapabilitiesBuild Data Driven Apps with Real-time and Offline Capabilities
Build Data Driven Apps with Real-time and Offline Capabilities
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 
Code for Startup MVP (Ruby on Rails) Session 1
Code for Startup MVP (Ruby on Rails) Session 1Code for Startup MVP (Ruby on Rails) Session 1
Code for Startup MVP (Ruby on Rails) Session 1
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
 
Framing the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLFraming the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQL
 
Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013
Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013
Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFSri Ambati
 
Scaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsScaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsSri Ambati
 
Automatic Model Documentation with H2O
Automatic Model Documentation with H2OAutomatic Model Documentation with H2O
Automatic Model Documentation with H2OSri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
 
Scaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsScaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOps
 
Automatic Model Documentation with H2O
Automatic Model Documentation with H2OAutomatic Model Documentation with H2O
Automatic Model Documentation with H2O
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 

Building, Evaluating, and Optimizing your RAG App for Production

  • 2. H2O.ai Confidential GenAI - Enterprise Use-cases Document Processing Tagging & Extraction Knowledge Search & QA Conversational Agent Workflow Automation Agent: … Human: … Agent: … Document Topic: Summary: Author: Knowledge Base Answer: Sources: … Workflow: ● Read latest messages from user A ● Send email suggesting next-steps Inbox read Email write
  • 3. H2O.ai Confidential GenAI - Enterprise Use-cases Document Processing Tagging & Extraction Knowledge Search & QA Conversational Agent Workflow Automation Agent: … Human: … Agent: … Document Topic: Summary: Author: Knowledge Base Answer: Sources: … Workflow: ● Read latest messages from user A ● Send email suggesting next-steps Inbox read Email write
  • 5. H2O.ai Confidential Current RAG Stack for building a QA System Vector Database Doc Chunk Chunk Chunk Chunk Chunk Chunk Chunk LLM Data Ingestion Data Querying (Retrieval + Synthesis) 5 Lines of Code in LlamaIndex!
  • 7. H2O.ai Confidential Challenges with Naive RAG (Response Quality) ● Bad Retrieval ○ Low Precision: Not all chunks in retrieved set are relevant ■ Hallucination + Lost in the Middle Problems ○ Low Recall: Now all relevant chunks are retrieved. ■ Lacks enough context for LLM to synthesize an answer ○ Outdated information: The data is redundant or out of date.
  • 8. H2O.ai Confidential Challenges with Naive RAG (Response Quality) ● Bad Retrieval ○ Low Precision: Not all chunks in retrieved set are relevant ■ Hallucination + Lost in the Middle Problems ○ Low Recall: Now all relevant chunks are retrieved. ■ Lacks enough context for LLM to synthesize an answer ○ Outdated information: The data is redundant or out of date. ● Bad Response Generation ○ Hallucination: Model makes up an answer that isn’t in the context. ○ Irrelevance: Model makes up an answer that doesn’t answer the question. ○ Toxicity/Bias: Model makes up an answer that’s harmful/offensive.
  • 9. H2O.ai Confidential What do we do? • Data: Can we store additional information beyond raw text chunks? • Embeddings: Can we optimize our embedding representations? • Retrieval: Can we do better than top-k embedding lookup? • Synthesis: Can we use LLMs for more than generation? Vector Database Doc Chunk Chunk Chunk Chunk Chunk LLM Data Embeddings Retrieval Synthesis
  • 10. H2O.ai Confidential What do we do? • Data: Can we store additional information beyond raw text chunks? • Embeddings: Can we optimize our embedding representations? • Retrieval: Can we do better than top-k embedding lookup? • Synthesis: Can we use LLMs for more than generation? But before all this… We need evals
  • 12. H2O.ai Confidential Evaluation ● How do we properly evaluate a RAG system? ○ Evaluate in isolation (retrieval, synthesis) ○ Evaluate e2e Vector Database Chunk Chunk Chunk LLM Retrieval Synthesis
  • 13. H2O.ai Confidential Evaluation in Isolation (Retrieval) ● Evaluate quality of retrieved chunks given user query ● Create dataset ○ Input: query ○ Output: the “ground-truth” documents relevant to the query ● Run retriever over dataset ● Measure ranking metrics ○ Success rate / hit-rate ○ MRR ○ Hit-rate
  • 14. H2O.ai Confidential Evaluation E2E ● Evaluation of final generated response given input ● Create Dataset ○ Input: query ○ [Optional] Output: the “ground-truth” answer ● Run through full RAG pipeline ● Collect evaluation metrics: ○ If no labels: label-free evals ○ If labels: with-label evals
  • 16. H2O.ai Confidential From Simple to Advanced Less Expressive Easier to Implement Lower Latency/Cost More Expressive Harder to Implement Higher Latency/Cost Table Stakes Better Parsers Chunk Sizes Prompt Engineering Customizing Models 🛠️ Advanced Retrieval Metadata Filtering Recursive Retrieval Embedded Tables Small-to-big Retrieval 🔎 Agentic Behavior Routing Query Planning Multi-document Agents ️ Fine-tuning Embedding fine-tuning LLM fine-tuning ⚙️
  • 17. H2O.ai Confidential Table Stakes: Chunk Sizes Tuning your chunk size can have outsized impacts on performance Not obvious that more retrieved tokens == higher performance! Note: Reranking (shuffling context order) isn’t always beneficial.
  • 18. H2O.ai Confidential Table Stakes: Prompt Engineering RAG uses core Question-Answering (QA) prompt templates Ways you can customize: • Adding few-shot examples • Modifying template text • Adding emotions
  • 19. H2O.ai Confidential Table Stakes: Customizing LLMs Task performance on easy- to-hard tasks (RAG, agents) varies wildly among LLMs
  • 20. H2O.ai Confidential Table Stakes: Customizing Embeddings Your embedding model + reranker affects retrieval quality Source: https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83
  • 21. H2O.ai Confidential Advanced Retrieval: Small-to-Big Intuition: Embedding a big text chunk feels suboptimal. Solution: Embed text at the sentence-level - then expand that window during LLM synthesis
  • 22. H2O.ai Confidential Advanced Retrieval: Small-to-Big Sentence Window Retrieval (k=2) Naive Retrieval (k=5) Only one out of the 5 chunks is relevant - “lost in the middle” problem Leads to more precise retrieval. Avoids “lost in the middle” problems.
  • 23. H2O.ai Confidential Advanced Retrieval: Small-to-Big Intuition: Embedding a big text chunk feels suboptimal. Solution: Embed a smaller reference to the parent chunk. Use parent chunk for synthesis Examples: Smaller chunks, summaries, metadata
  • 24. H2O.ai Confidential Data Agents - LLM-powered knowledge workers Email Read latest emails Knowledge Base Retrieve context Analysis Agent Analyze file Slack Send update Data Agent
  • 25. H2O.ai Confidential Data Agents - Core Components Agent Reasoning Loop ● ReAct Agent (any LLM) ● OpenAI Agent (only OAI) Tools via LlamaHub ● Code interpreter ● Slack ● Notion ● Zapier ● … (15+ tools, ~100 loaders)
  • 26. H2O.ai Confidential Agentic Behavior: Multi-Document Agents Intuition: There’s certain questions that “top-k” RAG can’t answer. Solution: Multi-Document Agents ● Fact-based QA and Summarization over any subsets of documents ● Chain-of-thought and query planning.
  • 27. H2O.ai Confidential Fine-Tuning: Embeddings Credits: Jo Bergum, vespa.ai Intuition: Embedding Representations are not optimized over your dataset Solution: Generate a synthetic query dataset from raw text chunks using LLMs Use this synthetic dataset to finetune an embedding model.
  • 28. H2O.ai Confidential Fine-Tuning: LLMs Intuition: Weaker LLMs are not bad at response synthesis, reasoning, structured outputs, etc. Solution: Generate a synthetic dataset from raw chunks (e.g. using GPT-4). Help fix all of the above!