1 | © Copyright 11/17/23 Zilliz
Speaker
Christy Bergman
Developer Advocate, Zilliz
christy.bergman@zilliz.com
https://www.linkedin.com/in/christybergman/
https://github.com/milvus-io/milvus
discord: https://discord.gg/FjCMmaJng6
Image source: https://thedataquarry.com/posts/vector-db-1/
27K+ GitHub Stars
25M+ Downloads
250+ Contributors
2,600+ Forks
Milvus is an open-source vector database for GenAI projects. Pip-install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup: Pip-install to start coding in a notebook within seconds.
Reusable Code: Write once, and deploy with one line of code into the production environment.
Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich: Dense & sparse embeddings, filtering, reranking, and beyond.
Zilliz Cloud is a fully managed vector
database built atop OSS Milvus
Open Source
Flexible & Secure Deployment
Enterprise features
for production-ready
Cardinal Search Engine &
Use Case Optimized Compute
Milvus completely
re-engineered to
be optimized
Pipelines Connectors Model Library
A streamlined
unstructured data
platform
Stable Milvus
versions are
continuously
deployed to Zilliz
Cloud
Milvus
Open Source Self-Managed
Milvus Discord
Join our community
github.com/milvus-io/milvus
Getting Started with Vector Databases
milvus.io/discord
AGENDA
01 AI Hallucinations and RAG
02 4 Challenges
03 Demo RAG
04 RAG Evaluation Methods
05 Demo Eval
01
AI Hallucinations
and RAG
Example AI Hallucination
gemini
wikipedia
hallucinated
answer
Why do models hallucinate?
• The reason LLMs
hallucinate is because
…
• They are trained on
sequences of words
(tokens)
Sample Data
The hamster cabinet …
!!@#%# …
Monkey eats shark …
trees in the moons…
Vector
Database
Where do Vectors Come From?
Unstructured Data
Embeddings here
Pre-trained Deep
Learning Models
Vectors
Where do Vectors Come From?
Unstructured Data Vectors
Embedding
model
Generator
Model
or LLM
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
  Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
        = [0.0, 0.5]
+ Man   = [0.5, 0.2]
  King  = [0.5, 0.7]
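The arithmetic on this slide can be checked in a few lines. This is a minimal sketch using the toy 2-D values above; the helper names `add`, `sub`, and `nearest` are made up for illustration, and real embeddings have hundreds or thousands of dimensions.

```python
# Toy 2-D embeddings copied from the slide (illustrative values, not real model output).
vectors = {
    "woman": [0.3, 0.4],
    "queen": [0.3, 0.9],
    "king": [0.5, 0.7],
    "man": [0.5, 0.2],
}

def add(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Element-wise vector subtraction."""
    return [x - y for x, y in zip(a, b)]

def nearest(target, candidates):
    """Return the word whose vector has the smallest squared L2 distance to target."""
    def dist2(v):
        return sum((x - y) ** 2 for x, y in zip(v, target))
    return min(candidates, key=lambda w: dist2(candidates[w]))

# Queen - Woman + Man
result = add(sub(vectors["queen"], vectors["woman"]), vectors["man"])
closest = nearest(result, vectors)
print(closest)
```

With these toy values the result lands on the King vector, which is the "semantic similarity" idea: directions in embedding space carry meaning.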
Retrieval Augmented Generation (RAG)
Your Data
Embedding Model
Vector Database
Question
Question + Context
Search
Gen AI Model
Reliable Answers
What is the default
AUTOINDEX distance
metric in Milvus
Client?
The default
AUTOINDEX distance
metric in Milvus
Client is L2.
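The flow in this diagram (embed your data, search, then send question + context to the model) can be sketched without any external service. Everything here is a stand-in chosen for illustration: `embed` is a word-count toy rather than a real embedding model, the `index` list plays the role of the vector database, the chunks are hypothetical, and the final call to a Gen AI model is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase word counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Your Data": hypothetical documentation chunks.
chunks = [
    "The default AUTOINDEX distance metric in Milvus Client is L2.",
    "HNSW organizes vectors in a hierarchical, multi-layered graph.",
    "DiskANN is an on-disk index for data too large to fit in memory.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in for the vector database

def retrieve(question, top_k=1):
    """Search: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(question, top_k=1):
    """Question + Context: the text that would be sent to the Gen AI model."""
    context = "\n".join(retrieve(question, top_k))
    return f"Answer using only the context.\nContext:\n{context}\nQuestion: {question}"

question = "What is the default AUTOINDEX distance metric in Milvus Client?"
prompt = build_prompt(question)
print(prompt)
```

The point of the sketch is the shape of the pipeline: the model only sees the retrieved context, so answer quality depends on what the search step returns.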
02
4 Challenges and
Lessons Learned
Pain Point #1: Choosing an Embedding Model
https://huggingface.co/spaces/mteb/leaderboard
Pain Point #1: Choosing an Embedding Model
Creator  Model                   Embedding Dim  Context Length  Use Case Tasks                        Open Source  MTEB Score
OpenAI   text-embedding-3-small  512-1536       8K              Real-time multilingual text chatbots  No           62 (1536), 62 (512)
OpenAI   text-embedding-3-large  256-3072       8K              Real-time multilingual text chatbots  No           65 (3072), 62 (256)
Matryoshka Representation Learning:
https://arxiv.org/pdf/2205.13147v4.pdf
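The dimension ranges in the table come from Matryoshka-style training, where the leading dimensions of an embedding carry the most information. A minimal sketch of shortening such an embedding by hand, assuming the model was trained this way: keep the first `dim` values and re-normalize. The function name `shorten` and the 4-d vector are made up for illustration.

```python
import math

def shorten(embedding, dim):
    """Keep the first `dim` values and rescale to unit L2 length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]   # made-up 4-d unit vector
short = shorten(full, 2)      # 2-d version, again unit length
print(short)
```

Shorter vectors cost less to store and search, which is the trade-off against the MTEB scores shown above.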
Pain Point #2: Choosing an Index
https://milvus.io/docs/index.md
Pain Point #2: Choosing an Index
● In-memory
○ Floating point dense
■ FLAT - The FLAT index is an exhaustive, brute-force approach that compares the query vector against every vector in the dataset to find the nearest neighbors. Suitable for small datasets where perfect accuracy is required and search latency is not a concern.
■ IVF_FLAT - The IVF_FLAT (Inverted File FLAT) index is a quantization-based index that divides the vector space into clusters. During indexing, each vector is assigned to its nearest cluster centroid; during search, the query is compared only against the vectors in the clusters closest to it.
■ HNSW - HNSW organizes vectors in a hierarchical, multi-layered graph, so search complexity is roughly logarithmic. The basic idea is to separate nearest neighbours into layers, where the top layer is the sparsest and the lowest layer contains every vector. Search proceeds from top to bottom.
○ Floating point sparse - SPLADE, BGE-M3
○ Binary
● On-disk - DiskANN, when your data is too large to fit in memory
● Hardware-optimized: GPU CAGRA, ARM,
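The difference between FLAT and IVF_FLAT described above can be sketched in plain Python. This is an illustration of the idea only, not how Milvus implements it: the centroids are hand-picked (a real IVF index learns them with k-means), and the function names are made up. The `nprobe` argument mirrors the name of the Milvus search parameter for the number of clusters to scan.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query, vectors, top_k=1):
    """FLAT: compare the query against every vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return ranked[:top_k]

def build_ivf(vectors, centroids):
    """IVF_FLAT indexing: assign each vector to its nearest centroid."""
    clusters = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        clusters[nearest].append(i)
    return clusters

def ivf_search(query, vectors, centroids, clusters, nprobe=1, top_k=1):
    """IVF_FLAT search: scan only the vectors in the nprobe closest clusters."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in clusters[c]]
    ranked = sorted(candidates, key=lambda i: l2(query, vectors[i]))
    return ranked[:top_k]

vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.1, 0.05], [5.1, 5.0]]  # hand-picked for the example
clusters = build_ivf(vectors, centroids)
query = [5.0, 5.0]

print(flat_search(query, vectors))                    # scans 4 vectors
print(ivf_search(query, vectors, centroids, clusters))  # scans 2 vectors
```

Both searches return the same neighbor here, but IVF looked at half the data. With a small `nprobe` the true neighbor can sit in an unscanned cluster, which is the accuracy-for-speed trade that approximate indexes make.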
Pain Point #2: Choosing an Index
IVF-Flat
HNSW
https://arxiv.org/abs/1603.09320
Conversation
Data
Documentation
Data
Lecture or Q/A
Data
Pain Point #3: Chunking
Conversation
Data
Documentation
Data
Question Answer
Data
add
conversation
memory
use Q&A pair
formatting
Pain Point #3: Chunking
Pain Point #3: Chunks need more context
Tesla Roadster
2018
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tem
2023
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tem
Chunk #1
Chunk #2
Naive Chunks
Naive Chunks
Tesla Roadster
2018
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
2023
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Better Chunks
Tesla Roadster 2018
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Tesla Roadster 2023
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Title 2-levels above
Title 1-level above
LangChain: HTMLHeaderTextSplitter, ParentDocumentRetriever
LlamaIndex: HierarchicalNodeParser, AutoMergingRetriever
Pain Point #3: Chunks need more context
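The naive-versus-better contrast can be shown in plain Python. This is only a sketch of the idea of carrying parent titles into each chunk; it does not use the LangChain or LlamaIndex classes named on the slide, and `chunk_with_titles` is a made-up helper.

```python
def chunk_with_titles(title, sections):
    """Prefix each section's text with the document title and section heading."""
    return [f"{title} {heading}\n{body}" for heading, body in sections]

# Placeholder text from the slide: both sections have identical bodies.
sections = [
    ("2018", "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem"),
    ("2023", "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem"),
]

naive_chunks = [body for _, body in sections]                   # headings dropped
better_chunks = chunk_with_titles("Tesla Roadster", sections)   # headings kept

for chunk in better_chunks:
    print(chunk, end="\n\n")
```

The naive chunks are indistinguishable once embedded, while the better chunks tell the retriever which product and year each passage is about.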
Example
Example
Pain Point #4: Keyword or Semantic Search?
Keyword search
Good for:
● Exact product name
● Jargon words
Examples:
● Product name = “2022 RF GT 6MT”
Semantic search
Good for:
● Similar meaning but maybe not exact
Examples:
● Similar image search
● Related wiki articles
Pain Point #4: Keyword or Semantic Search?
Dense Vector
Sparse Vector
TF-IDF
BM25
SPLADE
Lucene WAND pruning
BGE-M3
Top10 Top5
Final top_k
Prompt & Question
Improved context
Best of both worlds!
● Reranked Keyword AND Semantic top_k
● Put reranked into the Prompt Context
Keyword
Search
Semantic
Search
Linear comb.
Cross-encoder
Neural reranker
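The "Linear comb." option on this slide can be sketched as a weighted sum of normalized keyword and semantic scores. This is one simple way to merge the two result lists, shown under assumptions: the scores are made up, `alpha` is a hypothetical weight, and min-max normalization is just one reasonable choice for putting BM25-style and cosine scores on the same scale.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1] so the two searches are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_rerank(keyword_scores, semantic_scores, alpha=0.5, top_k=3):
    """Linear combination: alpha * keyword + (1 - alpha) * semantic, highest first."""
    kw, sem = min_max(keyword_scores), min_max(semantic_scores)
    docs = set(kw) | set(sem)
    combined = {d: alpha * kw.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0) for d in docs}
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))[:top_k]

keyword_scores = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}     # e.g. BM25 scores (made up)
semantic_scores = {"doc_b": 0.92, "doc_d": 0.90, "doc_a": 0.40}  # e.g. cosine similarities (made up)

final_top_k = hybrid_rerank(keyword_scores, semantic_scores, alpha=0.5, top_k=2)
print(final_top_k)
```

A document that scores well in both searches (doc_b here) rises above one that tops only a single list; the final top_k is what goes into the prompt context.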
Rerankers - when are they computed?
- Straight-up cosine similarity is called "no interaction". This is dense-embedding “semantic search”.
- BERT was an Early Interaction model, meaning the relationship between question and docs is pre-computed as part of the embedding model, offline.
- Cross-encoders are ML-model Late Interaction, calculated at query time. They are too computation-heavy to run in real time except on a small top_k, which they reduce to a smaller set (e.g. top_2). Cross-encoder reranking adds a classifier over (Q, A) pairs.
- ColBERT v2 is neural-model Late Interaction with document embeddings calculated offline, before the user asks their question! ~2% increased accuracy, but requires storing extra embeddings.
- Cohere’s rerank-3 claims ~26% improvement over sparse only and 6% over dense.
- Jina.ai Reranker claims ~20% improvement over sparse only.
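The late-interaction scoring used by ColBERT-style models can be sketched as "MaxSim": for every query token, take its best-matching document token, then sum those maxima. The 2-d token embeddings below are made up; in practice the document token embeddings are the "extra embeddings" that get stored ahead of time.

```python
def dot(a, b):
    """Dot product of two token embeddings."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: for each query token, take its best-matching doc token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Made-up 2-d token embeddings, one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
doc_b = [[0.9, 0.1], [0.7, 0.2]]

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

doc_b matches the first query token well but has nothing close to the second, so it scores lower. A single pooled embedding per document would blur that token-level detail.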
BERT vs ColBERT
BERT: SPLADE, BGE-M3
Query Top_k candidates
Final
top_k
https://arxiv.org/pdf/2112.01488.pdf
ColBERT v2 Reranker
https://arxiv.org/pdf/2112.01488.pdf
Slide from Tengyu Ma, April 2024
talk at Unstructured Data
(+add Milvus metadata filtering)
Metadata
filtering (hash)
BGE M3-Embedding
● “Multi-vec” - Multi-vector retrieval uses fine-grained interactions between the query's and the passage's embeddings to compute the relevance score. For efficiency, it re-ranks only the top-200 dense candidates.
● “Dense+Sparse” - Retrieve the top-1000 candidates with the dense and sparse methods, then re-rank using the sum of the two scores.
● “All” - Re-rank based on the sum of all three scores.
…
Multi-lingual retrieval performance on the MIRACL dev set (measured by nDCG@10).
https://arxiv.org/pdf/2402.03216
https://chat.lmsys.org/?leaderboard
chart by @maximelabonne
Mixtral 8x22B-Instruct-v0.1 with Anyscale Endpoints
https://console.anyscale.com/v2/playground
Question: What do the parameters for HNSW mean?
Prompt
GPT-3.5-turbo
Anyscale endpoints
Mixtral-8x22B-Instruct-v0.1
2023 Lost-in-the-middle
https://arxiv.org/pdf/2307.03172
2024 Needle-in-a-haystack experiments
https://github.com/gkamradt/LLMTest_NeedleInAHaystack
Is RAG dead?
Is RAG dead?
Needle in haystack experiments
Slide from Lance Martin, Langchain
https://blog.langchain.dev/multi-needle-in-a-haystack/
03 Demo Custom RAG
04
RAG Evaluation
Methods
Where do Vectors Come From?
Unstructured Data Vectors
Embedding
model
Generator
Model
or LLM
Retrieval Augmented Generation (RAG)
Your Data
Embedding Model
Vector Database
Question
Question + Context
Search
Gen AI Model
Reliable Answers
What is the default
AUTOINDEX distance
metric in Milvus?
The default
AUTOINDEX distance
metric in Milvus is L2.
Model Evals vs Production System Evals
Your RAG system
Arena Elo score
RAG Evaluation Methods
https://arxiv.org/pdf/2306.05685.pdf
GPT-4 favors itself with a 10% higher
win rate; Claude-v1 favors itself with a
25% higher win rate
Open weight Prometheus-eval aligns
with human judgments up to 85% as
of May 2024.
Known Problems with LLM-as-Judge
https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG
GPT-4 is not a good
judge of
comprehensiveness
GPT-4
Matches
Human
judgements on
Correctness &
Readability
Known Problems with LLM-as-Judge
https://arxiv.org/pdf/2305.17926
AI scores
max/min higher
Humans
score
medians
higher
RAG Evaluation Methods
https://github.com/explodinggradients/ragas
faithfulness
context_precision
context_recall
Query
Context
answer_relevancy
Ground Truth
Answer
answer_correctness
answer_similarity
Response
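To make the retrieval-side metrics concrete, here is a simplified, label-based sketch of precision and recall over retrieved context chunks. It is a stand-in only and does not reproduce the Ragas formulas for context_precision or context_recall: it assumes you already have ground-truth labels for which chunks are relevant, and the chunk ids are hypothetical.

```python
def context_precision_simple(retrieved, relevant):
    """Share of retrieved chunks that are relevant (label-based stand-in)."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)

def context_recall_simple(retrieved, relevant):
    """Share of relevant chunks that were retrieved (label-based stand-in)."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

retrieved = ["c1", "c4", "c2"]   # hypothetical ids returned by the retriever
relevant = {"c1", "c2", "c3"}    # hypothetical ground-truth relevant ids

precision = context_precision_simple(retrieved, relevant)
recall = context_recall_simple(retrieved, relevant)
print(round(precision, 3), round(recall, 3))
```

Low precision means the prompt context is padded with noise; low recall means the answer may simply not be in the context. Both point at the retriever rather than the generator.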
05 Demo RAG Eval
T H A N K Y O U
⭐ We need your stars!
https://github.com/milvus-io/milvus
💬Join our discord: https://discord.gg/FjCMmaJng6
Open Source Zilliz Architecture

More Related Content

Similar to Introduction to Open Source RAG and RAG Evaluation

Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Yves Raimond
 
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
Neo4j
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Neo4j
 
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceMysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Bernd Ocklin
 
[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture
Nikolay Matvienko
 
Rails israel 2013
Rails israel 2013Rails israel 2013
Rails israel 2013
Reuven Lerner
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
Codemotion Tel Aviv
 
MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!
Frederic Descamps
 
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET Journal
 
Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...
Elixir Club
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
MLconf
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
J On The Beach
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
confluent
 
GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016
Joshua Bae
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithm
Masahiko Umeno
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
MongoDB
 
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
Zilliz
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
Rob Skillington
 

Similar to Introduction to Open Source RAG and RAG Evaluation (20)

Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
 
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
 
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceMysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
 
[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture
 
Rails israel 2013
Rails israel 2013Rails israel 2013
Rails israel 2013
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 
MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!
 
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
 
Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithm
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 

More from Zilliz

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
MemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented ChatMemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented Chat
Zilliz
 
Copilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it mattersCopilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it matters
Zilliz
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
Knowledge Graphs in Retrieval Augmented Generation with WhyHow.AI
Knowledge Graphs in Retrieval Augmented Generation with WhyHow.AIKnowledge Graphs in Retrieval Augmented Generation with WhyHow.AI
Knowledge Graphs in Retrieval Augmented Generation with WhyHow.AI
Zilliz
 
Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...
Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...
Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...
Zilliz
 
Advanced Retrieval Augmented Generation Techniques
Advanced Retrieval Augmented Generation TechniquesAdvanced Retrieval Augmented Generation Techniques
Advanced Retrieval Augmented Generation Techniques
Zilliz
 
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Zilliz
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
Zilliz
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
Zilliz
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
Zilliz
 
Zilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in MLZilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in ML
Zilliz
 
Integrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with FloomIntegrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with Floom
Zilliz
 

More from Zilliz (20)

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
MemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented ChatMemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented Chat
 
Copilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it mattersCopilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it matters
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Introduction to Open Source RAG and RAG Evaluation

  • 1. 1 | © Copyright 11/17/23 Zilliz Speaker: Christy Bergman, Developer Advocate, Zilliz. christy.bergman@zilliz.com https://www.linkedin.com/in/christybergman/ https://github.com/milvus-io/milvus discord: https://discord.gg/FjCMmaJng6
  • 2. Image source: https://thedataquarry.com/posts/vector-db-1/
  • 3. 27K+ GitHub Stars, 25M+ Downloads, 250+ Contributors, 2,600+ Forks. Milvus is an open-source vector database for GenAI projects. Pip-install on your laptop, plug into popular AI dev tools, and push to production with a single line of code. Easy Setup: Pip-install to start coding in a notebook within seconds. Reusable Code: Write once, and deploy with one line of code into the production environment. Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more. Feature-rich: Dense & sparse embeddings, filtering, reranking and beyond.
  • 4. Zilliz Cloud is a fully-managed vector database built atop OSS Milvus. Flexible & Secure Deployment: Enterprise features for production-ready. Cardinal Search Engine & Use Case Optimized Compute: Milvus completely re-engineered to be optimized. Pipelines, Connectors, Model Library: A streamlined unstructured data platform. Open Source: Stable Milvus versions are continuously deployed to Zilliz Cloud.
  • 5. Milvus: Open Source, Self-Managed. github.com/milvus-io/milvus Getting Started with Vector Databases. Milvus Discord: Join our community, milvus.io/discord
  • 6. AGENDA: 01 AI Hallucinations and RAG; 02 4 Challenges; 03 Demo RAG; 04 RAG Evaluation Methods; 05 Demo Eval
  • 7. 01 AI Hallucinations and RAG
  • 10. Why do models hallucinate? LLMs hallucinate because they are trained on sequences of words (tokens). Sample Data: "The hamster cabinet …", "!!@#%# …", "Monkey eats shark …", "trees in the moons…"
  • 11. Vector Database Where do Vectors Come From? Unstructured Data Embeddings here Pre-trained Deep Learning Models Vectors
  • 12. Where do Vectors Come From? Unstructured Data Vectors
  • 13. Where do Vectors Come From? Unstructured Data Vectors Embedding model Generator Model or LLM
  • 14. Semantic Similarity (image from Sutor et al.). Woman = [0.3, 0.4], Queen = [0.3, 0.9], King = [0.5, 0.7], Man = [0.5, 0.2]. Queen - Woman + Man = King: [0.3, 0.9] - [0.3, 0.4] = [0.0, 0.5], then [0.0, 0.5] + [0.5, 0.2] = [0.5, 0.7] = King.
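The arithmetic on this slide can be checked in a few lines. The following is a minimal sketch that uses only the toy 2-D values from the slide (they are illustrative, not real model output); the helper name `analogy` is hypothetical.

```python
# Toy 2-D embeddings copied from the slide (illustrative values only).
vectors = {
    "woman": [0.3, 0.4],
    "queen": [0.3, 0.9],
    "man": [0.5, 0.2],
    "king": [0.5, 0.7],
}

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def analogy(start, minus, plus):
    """Compute start - minus + plus and return the word nearest to the result."""
    target = [s - m + p for s, m, p in zip(vectors[start], vectors[minus], vectors[plus])]
    return min(vectors, key=lambda word: squared_distance(vectors[word], target))

# Queen - Woman + Man lands on (approximately) [0.5, 0.7], the King vector.
result = analogy("queen", "woman", "man")
print(result)
```

The nearest-word lookup searches all four words, including the three inputs, so the match with King comes from the arithmetic rather than from excluding candidates.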
  • 15. Retrieval Augmented Generation (RAG). Diagram labels: Your Data, Embedding Model, Vector Database, Question, Search, Question + Context, Gen AI Model, Reliable Answers. Example question: What is the default AUTOINDEX distance metric in Milvus Client? Example answer: The default AUTOINDEX distance metric in Milvus Client is L2.
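The RAG flow on this slide (embed, search, then pass question plus context to a generator) can be sketched without any external services. In the sketch below, `embed` is a hypothetical bag-of-words stand-in for a real embedding model, a Python list stands in for the vector database, and the LLM call is left out entirely; the corpus sentences are taken from the slides.

```python
import math
import re
from collections import Counter

# Tiny stand-in corpus; the sentences come from the slides.
DOCS = [
    "The default AUTOINDEX distance metric in Milvus Client is L2.",
    "The FLAT index is an exhaustive, brute-force approach.",
    "HNSW organizes vectors in a hierarchical, multi-layered graph.",
]

def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, docs, top_k=1):
    """Search step: rank documents by similarity to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

def build_prompt(question, context_chunks):
    """Question + Context step: assemble the text sent to the generator."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What is the default AUTOINDEX distance metric in Milvus Client?"
context = retrieve(question, DOCS)
prompt = build_prompt(question, context)
print(prompt)
```

In a real system the list would be replaced by a vector database collection and the prompt would be sent to a generator model.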
  • 16. 02 4 Challenges and Lessons Learned
  • 17. Pain Point #1: Choosing an Embedding Model https://huggingface.co/spaces/mteb/leaderboard
  • 18. Pain Point #1: Choosing an Embedding Model
      Columns: Creator | Model | Embedding Dim | Context Length | Use Case Tasks | Open Source | MTEB Score
      OpenAI | text-embedding-3-small | 512-1536 | 8K | Real-time multilingual text chatbots | No | 62 (1536), 62 (512)
      OpenAI | text-embedding-3-large | 256-3072 | 8K | Real-time multilingual text chatbots | No | 65 (3072), 62 (256)
      Matryoshka Representation Learning: https://arxiv.org/pdf/2205.13147v4.pdf
  • 19. Pain Point #2: Choosing an Index https://milvus.io/docs/index.md
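The table lists a range of embedding dimensions per model and points to Matryoshka Representation Learning. One common way to use such embeddings, shown here as an assumption since the slide does not spell out the procedure, is to keep the first `dim` components and re-normalize to unit length. The function name `shorten` and the sample vector are hypothetical.

```python
import math

def shorten(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Hypothetical 4-D embedding shortened to 2-D.
full = [3.0, 4.0, 12.0, 1.0]
short = shorten(full, 2)
print(short)
```

Shorter vectors cost less to store and search; the MTEB scores in the table are the slide's evidence for how much quality is retained at each size.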
  • 20. Pain Point #2: Choosing an Index
      ● In-memory
        ○ Floating point dense
          ■ FLAT - The FLAT index is an exhaustive, brute-force approach that compares the query vector against every single vector in the dataset to find the nearest neighbors. Suitable for small datasets where perfect accuracy is required and search latency is not a concern.
          ■ IVF_FLAT - The IVF_FLAT (Inverted File FLAT) index is a quantization-based index that divides the vector space into clusters. During indexing, vectors are assigned to the nearest cluster centroid, and during search, only the vectors within the clusters closest to the query vector are compared.
          ■ HNSW - HNSW organizes vectors in a hierarchical, multi-layered graph, so search complexity is logarithmic. The basic idea is to separate nearest neighbors into layers in the graph, where the top layer is the sparsest and the lowest layer contains every vector. Search is performed from top to bottom.
        ○ Floating point sparse - SPLADE, BGE-M3
        ○ Binary
      ● On-disk - DiskANN when your data is too large to fit in memory
      ● Hardware-optimized: GPU CAGRA, ARM
  • 21. Pain Point #2: Choosing an Index. IVF-Flat, HNSW https://arxiv.org/abs/1603.09320
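The difference between FLAT and IVF_FLAT described above can be illustrated with a toy pure-Python sketch. This is not Milvus code: the centroids are fixed by hand instead of being learned by clustering, and the function names are hypothetical. The `nprobe` argument mirrors the idea of searching only the closest clusters.

```python
def l2(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(query, vectors, top_k):
    """FLAT: compare the query against every vector (exact, brute force)."""
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:top_k]

def build_ivf(vectors, centroids):
    """IVF indexing: assign each vector to its nearest cluster centroid."""
    clusters = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        clusters[nearest].append(i)
    return clusters

def ivf_search(query, vectors, centroids, clusters, top_k, nprobe):
    """IVF search: scan only the vectors in the nprobe closest clusters."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in clusters[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:top_k]

vectors = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.1, 0.3]]
centroids = [[0.1, 0.1], [1.0, 1.0]]  # assumed, not learned by k-means
clusters = build_ivf(vectors, centroids)

query = [0.15, 0.2]
exact = flat_search(query, vectors, top_k=2)
approx = ivf_search(query, vectors, centroids, clusters, top_k=2, nprobe=1)
print(exact, approx)
```

Here the IVF search scans three of the five vectors and still returns the same neighbors as the exhaustive scan; on real data a small `nprobe` can miss neighbors that sit in a cluster that was not probed.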
  • 22. Pain Point #3: Chunking. Conversation Data, Documentation Data, Lecture or Q/A Data
  • 23. Pain Point #3: Chunking. Conversation Data: add conversation memory. Documentation Data. Question Answer Data: use Q&A pair formatting.
  • 24. Pain Point #3: Chunks need more context. Naive Chunks. Chunk #1: Tesla Roadster 2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem. Chunk #2: 2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem.
  • 25. Pain Point #3: Chunks need more context. Naive Chunks: "Tesla Roadster 2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem" and "2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem". Better Chunks: "Tesla Roadster 2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem" and "Tesla Roadster 2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem". Labels: Title 2-levels above, Title 1-level above. Tools: HTMLHeaderTextSplitter, ParentDocumentRetriever, HierarchicalNodeParser, AutoMergingRetriever.
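The "Better Chunks" idea, where every chunk carries the titles above it, can be sketched in plain Python. This sketch does not use the splitter or retriever classes named on the slide; it assumes a hypothetical input format with "#"-style headers, and the function name `chunk_with_headers` is made up for illustration.

```python
def chunk_with_headers(text):
    """Split on '#'-style headers and prefix each chunk with its header path."""
    path = []    # current header trail, e.g. ["Tesla Roadster", "2018"]
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            # Drop deeper headers, then record this one at its level.
            path = path[: level - 1] + [line.lstrip("#").strip()]
        else:
            chunks.append(" > ".join(path) + ": " + line)
    return chunks

doc = """
# Tesla Roadster
## 2018
Lorem ipsum dolor sit amet
## 2023
Lorem ipsum dolor sit amet
"""

chunks = chunk_with_headers(doc)
for chunk in chunks:
    print(chunk)
```

With the title attached, the 2023 chunk no longer loses the fact that it describes the Tesla Roadster, which is the problem the naive chunks have.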
  • 26. Example
  • 27. Example
  • 28. Pain Point #4: Keyword or Semantic Search? Keyword search is good for: ● Exact product name ● Jargon words. Examples: ● Product name = “2022 RF GT 6MT”. Semantic search is good for: ● Similar meaning but maybe not exact. Examples: ● Similar image search ● Related wiki articles
  • 29. Pain Point #4: Keyword or Semantic Search? Best of both worlds! ● Reranked Keyword AND Semantic top_k ● Put reranked into the Prompt Context. Diagram labels: Keyword Search with Sparse Vector (TF-IDF, BM25, SPLADE, Lucene WAND pruning, BGE-M3), Semantic Search with Dense Vector, Top10, Top5, reranking (Linear comb., Cross-encoder, Neural reranker), Final top_k, Prompt & Question, Improved context.
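The "Linear comb." option on this slide can be sketched as a weighted sum of normalized keyword and semantic scores. This is a minimal illustration under stated assumptions: min-max normalization and the weight `alpha` are choices made for the example, and the document ids and scores are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so keyword and semantic scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_top_k(keyword_scores, semantic_scores, alpha=0.5, top_k=2):
    """Linear combination: alpha * keyword + (1 - alpha) * semantic."""
    kw, sem = min_max(keyword_scores), min_max(semantic_scores)
    docs = set(kw) | set(sem)
    combined = {d: alpha * kw.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0) for d in docs}
    # Sort by score (descending), breaking ties by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))[:top_k]

# Hypothetical candidate lists from a keyword search and a semantic search.
keyword_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 0.0}
semantic_scores = {"doc_b": 0.92, "doc_c": 0.90, "doc_d": 0.40}

final = hybrid_top_k(keyword_scores, semantic_scores)
print(final)
```

A document that scores reasonably in both lists (doc_b) outranks one that is strong in only one list, which is the intended effect of combining the two searches before building the prompt context.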
  • 30. Rerankers - when are they computed?
      - Plain cosine similarity is called no interaction. This is dense-embedding “semantic search”.
      - BERT was an Early Interaction model, meaning the relationship between question and docs is pre-computed offline as part of the embedding model.
      - Cross-encoders are ML-model Late Interaction, calculated at query time. They are too computation-heavy to run in real time except on a small top_k, which they reduce to a smaller set such as top_2. Cross-encoder reranking adds a classifier over (Q, A) pairs.
      - ColBERT v2 is Neural-model Late Interaction, calculated offline before the user asks their question! ~2% increased accuracy, but requires storing extra embeddings.
      - Cohere’s rerank-3 claims ~26% improvement over sparse only and 6% over dense.
      - Jina.ai Reranker claims ~20% improvement over sparse only.
  • 31. BERT vs ColBERT. BERT: SPLADE, BGE-M3. Query, Top_k candidates, Final top_k. https://arxiv.org/pdf/2112.01488.pdf
  • 32. ColBERT v2 Reranker https://arxiv.org/pdf/2112.01488.pdf
  • 33. Slide from Tengyu Ma, April 2024 talk at Unstructured Data (+add Milvus metadata filtering). Metadata filtering (hash)
  • 34. BGE M3-Embedding
      ● “Multi-vec” - Multi-vector retrieval uses fine-grained interactions between the query's and the passage's embeddings to compute the relevance score. Re-rank the top-200 Dense candidates, for efficient processing.
      ● “Dense+Sparse” - Retrieve the top-1000 candidates with the dense and sparse methods, then re-rank using the sum of the two scores.
      ● “All” - Re-rank based on the sum of all three scores.
      … Multi-lingual retrieval performance on the MIRACL dev set (measured by nDCG@10). https://arxiv.org/pdf/2402.03216
  • 35. https://chat.lmsys.org/?leaderboard chart by @maximelabonne
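The linked ColBERT v2 paper scores a query against a document with a "MaxSim" sum: each query token embedding is matched to its most similar document token embedding, and the maxima are added up. The sketch below shows only that scoring step, with hypothetical 2-D token embeddings; a real system would store the document token embeddings ahead of time and get them from a trained model.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical token embeddings: one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5], [0.6, 0.0]]

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
best = max(scores, key=scores.get)
print(scores, best)
```

Because every token keeps its own vector, the index has to store many vectors per document, which matches the slide's note about storing extra embeddings.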
  • 37. Mixtral 8x22B-Instruct-v0.1 with Anyscale Endpoints https://console.anyscale.com/v2/playground
  • 38. Question: What do the parameters for HNSW mean? Prompt. GPT-3.5-turbo. Anyscale endpoints Mixtral-8x22B-Instruct-v0.1
  • 39. Is RAG dead? 2023 Lost-in-the-middle https://arxiv.org/pdf/2307.03172 2024 Needle-in-a-haystack experiments https://github.com/gkamradt/LLMTest_NeedleInAHaystack
  • 40. Is RAG dead? Needle in haystack experiments. Slide from Lance Martin, LangChain https://blog.langchain.dev/multi-needle-in-a-haystack/
  • 41. 03 Demo Custom RAG
  • 42. 04 RAG Evaluation Methods
  • 43. Where do Vectors Come From? Unstructured Data Vectors
  • 44. Where do Vectors Come From? Unstructured Data Vectors Embedding model Generator Model or LLM
  • 45. Retrieval Augmented Generation (RAG). Diagram labels: Your Data, Embedding Model, Vector Database, Question, Search, Question + Context, Gen AI Model, Reliable Answers. Example question: What is the default AUTOINDEX distance metric in Milvus? Example answer: The default AUTOINDEX distance metric in Milvus is L2.
  • 46. Model Evals vs Production System Evals. Your RAG system. Arena Elo score
  • 47. RAG Evaluation Methods https://arxiv.org/pdf/2306.05685.pdf GPT-4 favors itself with a 10% higher win rate; Claude-v1 favors itself with a 25% higher win rate. Open-weight Prometheus-eval aligns with human judgments up to 85% as of May 2024.
  • 48. Known Problems with LLM-as-Judge https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG GPT-4 is not a good judge of comprehensiveness. GPT-4 matches human judgments on Correctness & Readability.
  • 49. Known Problems with LLM-as-Judge https://arxiv.org/pdf/2305.17926 AI scores max/min higher. Humans score medians higher.
  • 50. RAG Evaluation Methods https://github.com/explodinggradients/ragas Diagram labels: Query, Context, Response, Ground Truth Answer. Metrics: faithfulness, context_precision, context_recall, answer_relevancy, answer_correctness, answer_similarity.
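To give a feel for what the retrieval-side metrics measure, here is a simplified sketch of classic set-based precision and recall over retrieved chunk ids. This is a swapped-in illustration, not how Ragas computes context_precision or context_recall; it assumes you already have hand-labeled relevant chunk ids, and all ids below are hypothetical.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Classic set-based retrieval precision and recall."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 chunks retrieved, 3 chunks labeled relevant.
retrieved = ["c1", "c4", "c7", "c9"]
relevant = ["c1", "c2", "c7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means the prompt context is padded with irrelevant chunks; low recall means the chunks needed to answer the question never reached the generator.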
  • 51. 05 Demo RAG Eval
  • 52. THANK YOU. We need your stars! https://github.com/milvus-io/milvus 💬 Join our discord: https://discord.gg/FjCMmaJng6
  • 53. Open Source Zilliz Architecture