1 | © Copyright 11/17/23 Zilliz
Speaker
Christy Bergman
Developer Advocate, Zilliz
christy.bergman@zilliz.com
https://www.linkedin.com/in/christybergman/
https://github.com/milvus-io/milvus
discord: https://discord.gg/FjCMmaJng6
Image source: https://thedataquarry.com/posts/vector-db-1/
27K+ GitHub Stars
25M+ Downloads
250+ Contributors
2,600+ Forks
Milvus is an open-source vector database for GenAI projects. Pip-install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup: Pip-install to start coding in a notebook within seconds.
Reusable Code: Write once, and deploy with one line of code into the production environment.
Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich: Dense & sparse embeddings, filtering, reranking, and beyond.
Zilliz Cloud is a fully managed vector
database built on top of OSS Milvus
Flexible & Secure Deployment: Enterprise features for production-ready
Cardinal Search Engine & Use Case Optimized Compute: Milvus completely re-engineered to be optimized
Pipelines, Connectors, Model Library: A streamlined unstructured data platform
Open Source: Stable Milvus versions are continuously deployed to Zilliz Cloud
Milvus
Open Source Self-Managed
Milvus Discord
Join our community
github.com/milvus-io/milvus
Getting Started with Vector Databases
milvus.io/discord
AGENDA
01 AI Hallucinations and RAG
02 4 Challenges
03 Demo RAG
04 RAG Evaluation Methods
05 Demo Eval
01
AI Hallucinations
and RAG
Example AI Hallucination
gemini
wikipedia
hallucinated
answer
Why do models hallucinate?
• The reason LLMs
hallucinate is because
…
• They are trained on
sequences of words
(tokens)
Sample Data
The hamster cabinet …
!!@#%# …
Monkey eats shark …
trees in the moons…
Vector
Database
Where do Vectors Come From?
Unstructured Data
Embeddings here
Pre-trained Deep
Learning Models
Vectors
Where do Vectors Come From?
Unstructured Data Vectors
Embedding
model
Generator
Model
or LLM
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
  Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
        = [0.0, 0.5]
+ Man   = [0.5, 0.2]
  King  = [0.5, 0.7]
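The arithmetic on this slide can be checked in a few lines of Python. This is a minimal sketch using the slide's 2-dimensional toy vectors; the helper names (`add`, `sub`, `nearest`) are illustrative and not part of any library, and real embeddings have hundreds or thousands of dimensions.

```python
import math

# Toy 2-D word vectors copied from the slide.
vectors = {
    "Woman": [0.3, 0.4],
    "Queen": [0.3, 0.9],
    "King": [0.5, 0.7],
    "Man": [0.5, 0.2],
}

def sub(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise a + b."""
    return [x + y for x, y in zip(a, b)]

def nearest(target, vocab):
    """Return the word whose vector is closest to target (Euclidean distance)."""
    return min(vocab, key=lambda word: math.dist(target, vocab[word]))

# Queen - Woman + Man
result = add(sub(vectors["Queen"], vectors["Woman"]), vectors["Man"])
closest_word = nearest(result, vectors)
print(closest_word)
```

The nearest stored vector to the result is the one for "King", which is the sense in which distances between embeddings encode meaning.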
Retrieval Augmented Generation (RAG)
Your Data
Embedding Model
Vector Database
Question
Question + Context
Search
Gen AI Model
Reliable Answers
What is the default
AUTOINDEX distance
metric in Milvus
Client?
The default
AUTOINDEX distance
metric in Milvus
Client is L2.
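The flow in this diagram (embed, search, then send question plus context to the model) can be sketched without any external service. Everything below is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, a Python list stands in for the vector database, and the final call to a generative model is left out. The sample chunk simply repeats the answer shown on the slide.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    cleaned = text.lower().replace("?", " ").replace(".", " ").replace(",", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, top_k=1):
    """Search step: rank stored chunks by similarity to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Question + Context: the text that would be sent to the Gen AI model."""
    context = "\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# "Your Data": hypothetical chunks; the first repeats the slide's example answer.
chunks = [
    "The default AUTOINDEX distance metric in Milvus Client is L2.",
    "HNSW organizes vectors in a hierarchical, multi-layered graph.",
    "Chunks need more context, such as the titles above them.",
]
question = "What is the default AUTOINDEX distance metric in Milvus Client?"
context = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The model only sees what retrieval returns, so answer quality depends on the embedding model, index, and chunking choices discussed in the following slides.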
02
4 Challenges and
Lessons Learned
Pain Point #1: Choosing an Embedding Model
https://huggingface.co/spaces/mteb/leaderboard
Pain Point #1: Choosing an Embedding Model
Creator | Model | Embedding Dim | Context Length | Use Case Tasks | Open Source | MTEB Score
OpenAI | text-embedding-3-small | 512-1536 | 8K | Real-time Multilingual text chatbots | No | 62 (1536), 62 (512)
OpenAI | text-embedding-3-large | 256-3072 | 8K | Real-time Multilingual text chatbots | No | 65 (3072), 62 (256)
Matryoshka Representation Learning:
https://arxiv.org/pdf/2205.13147v4.pdf
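The range of embedding dimensions in the table reflects Matryoshka-style training, where the leading dimensions carry most of the information. A minimal sketch of shortening an embedding follows: keep the first `dim` values and re-normalize to unit length so cosine and inner-product scores stay comparable. The function name and the sample vector are hypothetical, and truncation is only expected to work well for models trained this way.

```python
import math

def shorten_embedding(vec, dim):
    """Keep the first `dim` values of an embedding and L2-normalize the result."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]

# Hypothetical 4-D embedding standing in for a 1536-D or 3072-D vector.
full = [0.6, 0.0, 0.8, 0.0]
short = shorten_embedding(full, 2)
print(short)
```

Shorter vectors cut storage and search cost; the MTEB column above indicates how little accuracy the listed models lose at their smallest size.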
Pain Point #2: Choosing an Index
https://milvus.io/docs/index.md
Pain Point #2: Choosing an Index
● In-memory
○ Floating point dense
■ FLAT - The FLAT index is an exhaustive, brute-force approach that compares the query vector
against every single vector in the dataset to find the nearest neighbors. Suitable for small
datasets where perfect accuracy is required and search latency is not a concern.
■ IVF_FLAT - The IVF_FLAT (Inverted File FLAT) index is a quantization-based index that
divides the vector space into clusters. During indexing, each vector is assigned to its nearest
cluster centroid; during search, the query is compared only against the vectors in the
clusters closest to it.
■ HNSW - HNSW organizes vectors in a hierarchical, multi-layered graph, so search
complexity is approximately logarithmic. The basic idea is to spread nearest-neighbor links
across layers, where the top layer is the sparsest and the bottom layer contains every vector.
Search is performed from top to bottom.
○ Floating point sparse - SPLADE, BGE-M3
○ Binary
● On-disk - DiskANN, when your data is too large to fit in memory
● Hardware-optimized - GPU CAGRA, ARM
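The difference between FLAT and IVF_FLAT can be sketched in pure Python. This is a toy illustration, not how Milvus implements its indexes: the centroids are hard-coded rather than learned with k-means, and `nprobe` is named after the search parameter of the same name but is only a plain function argument here.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query, vectors, top_k):
    """FLAT: compare the query against every vector (exact, brute force)."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return ranked[:top_k]

def build_ivf(vectors, centroids):
    """IVF indexing step: assign each vector to its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ivf_search(query, vectors, centroids, buckets, top_k, nprobe=1):
    """IVF search step: scan only the nprobe clusters closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:top_k]

# Hypothetical data: two obvious clusters in 2-D.
vectors = [[0.1, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.0, 0.2]]
centroids = [[0.1, 0.1], [0.95, 0.9]]  # assumed; normally learned with k-means
buckets = build_ivf(vectors, centroids)

query = [0.12, 0.08]
exact = flat_search(query, vectors, top_k=2)
approx = ivf_search(query, vectors, centroids, buckets, top_k=2, nprobe=1)
print(exact, approx)
```

With `nprobe=1` the IVF search scans three of the five vectors instead of all of them. On real data that saving comes at the cost of occasionally missing a true neighbor that sits in a cluster that was not probed.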
Pain Point #2: Choosing an Index
IVF-Flat
HNSW
https://arxiv.org/abs/1603.09320
Conversation
Data
Documentation
Data
Lecture or Q/A
Data
Pain Point #3: Chunking
Conversation
Data
Documentation
Data
Question Answer
Data
add
conversation
memory
use Q&A pair
formatting
Pain Point #3: Chunking
Pain Point #3: Chunks need more context
Tesla Roadster
2018
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tem
2023
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tem
Chunk #1
Chunk #2
Naive Chunks
Pain Point #3: Chunks need more context
Naive Chunks
Tesla Roadster
2018
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
2023
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Better Chunks (each chunk carries the title 2 levels above and the title 1 level above)
Tesla Roadster 2018
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
Tesla Roadster 2023
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem
LangChain: HTMLHeaderTextSplitter, ParentDocumentRetriever
LlamaIndex: HierarchicalNodeParser, AutoMergingRetriever
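The "better chunks" idea can be sketched as a small header-aware splitter that prefixes every chunk with the titles above it. This is a minimal illustration, not the implementation of any of the tools named on the slide; it assumes headings are marked with leading `#` characters, which is a convention chosen for the example only.

```python
def chunk_with_headers(lines):
    """Split text on headings and prefix each chunk with the headings above it."""
    chunks, path, body = [], [], []

    def flush():
        # Emit the text collected so far under the current heading path.
        if body:
            chunks.append(" ".join(path) + "\n" + " ".join(body))
            body.clear()

    for line in lines:
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]          # drop headings at this level or deeper
            path.append(line.lstrip("#").strip())
        elif line.strip():
            body.append(line.strip())
    flush()
    return chunks

# Sample document mirroring the slide (heading markers are an assumption).
document = [
    "# Tesla Roadster",
    "## 2018",
    "Lorem ipsum dolor sit amet",
    "## 2023",
    "Lorem ipsum dolor sit amet",
]
better_chunks = chunk_with_headers(document)
for chunk in better_chunks:
    print(chunk, end="\n\n")
```

Without the prefix, the two chunks are identical text and a retriever cannot tell the 2018 model from the 2023 one. With it, each chunk's embedding reflects which product and year it describes.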
Example
Example
Pain Point #4: Keyword or Semantic Search?
Keyword search
Good for:
● Exact product name
● Jargon words
Examples:
● Product name = "2022 RF GT 6MT"
Semantic search
Good for:
● Similar meaning but maybe not exact
Examples:
● Similar image search
● Related wiki articles
Pain Point #4: Keyword or Semantic Search?
Dense Vector
Sparse Vector
TF-IDF
BM25
SPLADE
Lucene WAND pruning
BGE-M3
Top10 Top5
Final top_k
Prompt & Question
Improved context
Best of both worlds!
● Reranked Keyword AND Semantic top_k
● Put reranked into the Prompt Context
Keyword
Search
Semantic
Search
Linear comb.
Cross-encoder
Neural reranker
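Of the reranking options on this slide, the linear combination is the simplest to sketch. The example below normalizes keyword and semantic scores to a common 0-1 range and blends them with a weight `alpha`. The function names, the weight, and all scores are hypothetical, and a document missing from one result list is assumed to score 0 there.

```python
def min_max(scores):
    """Rescale a non-empty {doc: score} mapping to the 0-1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rerank(keyword_scores, semantic_scores, alpha=0.5, top_k=3):
    """Blend keyword and semantic scores: alpha * semantic + (1 - alpha) * keyword."""
    kw, sem = min_max(keyword_scores), min_max(semantic_scores)
    docs = set(kw) | set(sem)
    combined = {
        d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs
    }
    # Sort by score descending, then by doc id for a deterministic order.
    return sorted(combined, key=lambda d: (-combined[d], d))[:top_k]

# Hypothetical results from the two searches.
keyword_scores = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}      # e.g. BM25
semantic_scores = {"doc_b": 0.92, "doc_c": 0.90, "doc_d": 0.40}   # e.g. cosine
final_top_k = hybrid_rerank(keyword_scores, semantic_scores, alpha=0.5, top_k=3)
print(final_top_k)
```

Here `doc_b` ranks first because both searches rate it well, even though neither search alone had it as the clear winner over everything else. The normalization step matters because BM25 and cosine scores live on different scales.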
Rerankers - when are they computed?
- Plain cosine similarity is "no interaction": the query and each document are embedded
independently and only their vectors are compared. This is dense-embedding "semantic search".
- BERT-style embedding models (bi-encoders) compute document embeddings offline, as part of
the embedding step, before any question is asked.
- Cross-encoders read the question and a document together (full interaction), so their scores
must be calculated at query time. Too computation-heavy to run in real time except on a small
top_k, e.g. to reduce it to a smaller top_2. Cross-encoder reranking adds a classifier over
(Q, A) pairs.
- ColBERT v2 is a neural late-interaction model: per-token document embeddings are calculated
offline, before the user asks their question, and only a lightweight token-matching step runs
at query time. ~2% increased accuracy, but requires storing extra embeddings.
- Cohere's rerank-3 claims ~26% improvement over sparse only; 6% over dense.
- Jina.ai Reranker claims ~20% improvement over sparse only.
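ColBERT-style late interaction scores a document by matching each query token embedding against its best-matching document token embedding and summing those maxima (the MaxSim operator). A minimal sketch with hypothetical 2-D token embeddings; real token embeddings come from the model and are much larger, which is why the extra storage is needed.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical per-token embeddings (one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # covers both query tokens
doc_b = [[0.9, 0.1], [0.8, 0.2]]               # covers only the first

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

The document token embeddings (`doc_a`, `doc_b`) can be stored ahead of time; only the cheap max-and-sum step depends on the question.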
BERT vs ColBERT
BERT: SPLADE, BGE-M3
Query Top_k candidates
Final
top_k
https://arxiv.org/pdf/2112.01488.pdf
ColBERT v2 Reranker
https://arxiv.org/pdf/2112.01488.pdf
Slide from Tengyu Ma, April 2024
talk at Unstructured Data
(+add Milvus metadata filtering)
Metadata
filtering (hash)
BGE M3-Embedding
● “Multi-vec” - Multi-vector retrieval, uses
fine-grained interactions between query
and passage’s embeddings to compute
the relevance score. Re-rank the
top-200 Dense candidates, for efficient
processing.
● “Dense+Sparse” - Retrieve the top-1000
candidates with dense and sparse
method; then re-rank using the sum of
two scores.
● “All” - Re-rank based on the sum of all
three scores.
…
Multi-lingual retrieval performance on the MIRACL dev set (measured by nDCG@10).
https://arxiv.org/pdf/2402.03216
https://chat.lmsys.org/?leaderboard
chart by @maximelabonne
Mixtral 8x22B-Instruct-v0.1 with Anyscale Endpoints
https://console.anyscale.com/v2/playground
Question: What do the parameters for HNSW mean?
Prompt
GPT-3.5-turbo
Anyscale endpoints
Mixtral-8x22B-Instruct-v0.1
2023 Lost-in-the-middle
https://arxiv.org/pdf/2307.03172
2024 Needle-in-a-haystack experiments
https://github.com/gkamradt/LLMTest_NeedleInAHaystack
Is RAG dead?
Is RAG dead?
Needle in haystack experiments
Slide from Lance Martin, Langchain
https://blog.langchain.dev/multi-needle-in-a-haystack/
03 Demo Custom RAG
04
RAG Evaluation
Methods
Where do Vectors Come From?
Unstructured Data Vectors
Embedding
model
Generator
Model
or LLM
Retrieval Augmented Generation (RAG)
Your Data
Embedding Model
Vector Database
Question
Question + Context
Search
Gen AI Model
Reliable Answers
What is the default
AUTOINDEX distance
metric in Milvus?
The default
AUTOINDEX distance
metric in Milvus is L2.
Model Evals vs Production System Evals
Your RAG system
Arena Elo score
RAG Evaluation Methods
https://arxiv.org/pdf/2306.05685.pdf
GPT-4 favors itself with a 10% higher win rate; Claude-v1 favors itself with a 25% higher win rate.
Open-weight Prometheus-eval aligns with human judgments up to 85% as of May 2024.
Known Problems with LLM-as-Judge
https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG
GPT-4 is not a good
judge of
comprehensiveness
GPT-4
Matches
Human
judgements on
Correctness &
Readability
Known Problems with LLM-as-Judge
https://arxiv.org/pdf/2305.17926
AI scores
max/min higher
Humans
score
medians
higher
RAG Evaluation Methods
https://github.com/explodinggradients/ragas
faithfulness
context_precision
context_recall
Query
Context
answer_relevancy
Ground Truth
Answer
answer_correctness
answer_similarity
Response
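The retrieval-side metrics in this diagram ask two questions: how much of what was retrieved is relevant, and how much of what is relevant was retrieved. The sketch below is a deliberately simplified, non-LLM illustration of that intuition using sets of chunk IDs. It is not how Ragas computes `context_precision` or `context_recall`, which rely on an LLM judge, and the chunk IDs are hypothetical.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Set-based precision and recall of retrieved chunks against known-relevant ones."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical evaluation example for one question.
retrieved_ids = ["c1", "c4", "c7", "c9"]   # what the retriever returned
relevant_ids = ["c1", "c7", "c8"]          # chunks needed for the ground-truth answer
precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(precision, recall)
```

Low precision suggests the context is padded with noise; low recall suggests the answer cannot be fully supported by what was retrieved. The generation-side metrics (faithfulness, answer relevancy, answer correctness) need the response text and are not covered by this sketch.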
05 Demo RAG Eval
THANK YOU
We need your stars!
https://github.com/milvus-io/milvus
💬 Join our discord: https://discord.gg/FjCMmaJng6
Open Source Zilliz Architecture

More Related Content

Similar to Introduction to Open Source RAG and RAG Evaluation

Introduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB ClusterIntroduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB Cluster
Frederic Descamps
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Yves Raimond
 
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
Neo4j
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Neo4j
 
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceMysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Bernd Ocklin
 
[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture
Nikolay Matvienko
 
Rails israel 2013
Rails israel 2013Rails israel 2013
Rails israel 2013
Reuven Lerner
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
Codemotion Tel Aviv
 
MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!
Frederic Descamps
 
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET Journal
 
Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...
Elixir Club
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
MLconf
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
J On The Beach
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
confluent
 
GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016
Joshua Bae
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithm
Masahiko Umeno
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
MongoDB
 
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
Zilliz
 

Similar to Introduction to Open Source RAG and RAG Evaluation (20)

Introduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB ClusterIntroduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB Cluster
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
 
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
 
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
Scale Your Mission-Critical Applications With Neo4j Fabric and Clustering Arc...
 
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceMysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
 
[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture[Heap con19] designing data intensive applications in serverless architecture
[Heap con19] designing data intensive applications in serverless architecture
 
Rails israel 2013
Rails israel 2013Rails israel 2013
Rails israel 2013
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 
MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!MySQL Document Store - when SQL & NoSQL live together... in peace!
MySQL Document Store - when SQL & NoSQL live together... in peace!
 
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
IRJET- Efficient Geometric Range Search on RTREE Occupying Encrypted Spatial ...
 
Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...Designing scalable application: from umbrella project to distributed system -...
Designing scalable application: from umbrella project to distributed system -...
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
Managing your Black Friday Logs
Managing your Black Friday LogsManaging your Black Friday Logs
Managing your Black Friday Logs
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016GDB in SV_1st_meetup_09082016
GDB in SV_1st_meetup_09082016
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Master the RETE algorithm
Master the RETE algorithmMaster the RETE algorithm
Master the RETE algorithm
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
VectorDB Schema Design 101 - Considerations for Building a Scalable and Perfo...
 

More from Zilliz

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
MemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented ChatMemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented Chat
Zilliz
 
Copilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it mattersCopilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it matters
Zilliz
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
Knowledge Graphs in Retrieval Augmented Generation with WhyHow.AI
Knowledge Graphs in Retrieval Augmented Generation with WhyHow.AIKnowledge Graphs in Retrieval Augmented Generation with WhyHow.AI
Knowledge Graphs in Retrieval Augmented Generation with WhyHow.AI
Zilliz
 
Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...
Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...
Answer 'What's for Dinner?' with Vector Search and Natural Language using Hay...
Zilliz
 
Advanced Retrieval Augmented Generation Techniques
Advanced Retrieval Augmented Generation TechniquesAdvanced Retrieval Augmented Generation Techniques
Advanced Retrieval Augmented Generation Techniques
Zilliz
 
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Zilliz
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
Zilliz
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
Zilliz
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
Zilliz
 
Zilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in MLZilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in ML
Zilliz
 
Integrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with FloomIntegrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with Floom
Zilliz
 

More from Zilliz (20)

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
MemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented ChatMemGPT: Introduction to Memory Augmented Chat
MemGPT: Introduction to Memory Augmented Chat
 
Copilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it mattersCopilot Workspace: What it is, how it works, why it matters
Copilot Workspace: What it is, how it works, why it matters
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...



Introduction to Open Source RAG and RAG Evaluation

  • 1. © Copyright 11/17/23 Zilliz. Speaker: Christy Bergman, Developer Advocate, Zilliz. christy.bergman@zilliz.com https://www.linkedin.com/in/christybergman/ https://github.com/milvus-io/milvus Discord: https://discord.gg/FjCMmaJng6
  • 2. Image source: https://thedataquarry.com/posts/vector-db-1/
  • 3. 27K+ GitHub Stars; 25M+ Downloads; 250+ Contributors; 2,600+ Forks. Milvus is an open-source vector database for GenAI projects. Pip-install on your laptop, plug into popular AI dev tools, and push to production with a single line of code. Easy Setup: Pip-install to start coding in a notebook within seconds. Reusable Code: Write once, and deploy with one line of code into the production environment. Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more. Feature-rich: Dense & sparse embeddings, filtering, reranking, and beyond.
  • 4. Zilliz Cloud is a fully managed vector database built atop OSS Milvus. Open Source. Flexible & Secure Deployment: Enterprise features for production-ready. Cardinal Search Engine & Use Case Optimized Compute: Milvus completely re-engineered to be optimized. Pipelines, Connectors, Model Library: A streamlined unstructured data platform. Stable Milvus versions are continuously deployed to Zilliz Cloud.
  • 5. Milvus: Open Source, Self-Managed. github.com/milvus-io/milvus Getting Started with Vector Databases. Milvus Discord: Join our community. milvus.io/discord
  • 6. AGENDA: 01 AI Hallucinations and RAG; 02 4 Challenges; 03 Demo RAG; 04 RAG Evaluation Methods; 05 Demo Eval
  • 7. 01 AI Hallucinations and RAG
  • 10. Why do models hallucinate? LLMs hallucinate because they are trained on sequences of words (tokens). Sample data: The hamster cabinet … !!@#%# … Monkey eats shark … trees in the moons…
  • 11. Vector Database Where do Vectors Come From? Unstructured Data Embeddings here Pre-trained Deep Learning Models Vectors
  • 12. Where do Vectors Come From? Unstructured Data Vectors
  • 13. Where do Vectors Come From? Unstructured Data Vectors Embedding model Generator Model or LLM
  • 14. Semantic Similarity (image from Sutor et al.). Woman = [0.3, 0.4]; Queen = [0.3, 0.9]; King = [0.5, 0.7]; Man = [0.5, 0.2]. Queen - Woman + Man = King: [0.3, 0.9] - [0.3, 0.4] = [0.0, 0.5]; [0.0, 0.5] + [0.5, 0.2] = [0.5, 0.7] = King.
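
The arithmetic on slide 14 can be checked in a few lines. This is a minimal sketch that reuses the slide's toy 2-D values (they are illustrative, not real model output); the helper names are made up for this example.

```python
import math

# Toy 2-D word vectors copied from slide 14 (not real embeddings).
vectors = {
    "woman": [0.3, 0.4],
    "queen": [0.3, 0.9],
    "king": [0.5, 0.7],
    "man": [0.5, 0.2],
}

def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]

def nearest(target, candidates):
    """Return the word whose vector has the smallest Euclidean distance to target."""
    return min(candidates, key=lambda word: math.dist(candidates[word], target))

# Queen - Woman + Man
result = add(sub(vectors["queen"], vectors["woman"]), vectors["man"])
closest = nearest(result, vectors)
print(result, closest)
```

With real embeddings the result rarely lands exactly on a word vector, which is why the lookup uses a nearest-neighbor search rather than an equality check.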
  • 15. Retrieval Augmented Generation (RAG). Your Data → Embedding Model → Vector Database. Question → Search → Question + Context → Gen AI Model → Reliable Answers. Example question: What is the default AUTOINDEX distance metric in Milvus Client? Example answer: The default AUTOINDEX distance metric in Milvus Client is L2.
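
The flow on slide 15 (embed, search, then prompt with question plus context) can be sketched without any external service. Everything here is a stand-in chosen for illustration: `embed` is a bag-of-words counter rather than a real embedding model, a Python list plays the role of the vector database, and the prompt wording is an assumption. The final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, docs, top_k=1):
    """Search step: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Question + Context step: what would be sent to the generator model."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Chunks taken from statements on the slides.
docs = [
    "The default AUTOINDEX distance metric in Milvus Client is L2.",
    "HNSW organizes vectors in a hierarchical, multi-layered graph.",
    "IVF_FLAT divides the vector space into clusters.",
]
question = "What is the default AUTOINDEX distance metric in Milvus Client?"
top_chunks = retrieve(question, docs, top_k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```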
  • 16. 02 4 Challenges and Lessons Learned
  • 17. Pain Point #1: Choosing an Embedding Model https://huggingface.co/spaces/mteb/leaderboard
  • 18. Pain Point #1: Choosing an Embedding Model. Table columns: Creator; Model; Embedding Dim; Context Length; Use Case; Tasks; Open Source; MTEB Score.
      OpenAI text-embedding-3-small: embedding dim 512-1536; context length 8K; use case and tasks: real-time multilingual text chatbots; open source: No; MTEB score: 62 (1536), 62 (512).
      OpenAI text-embedding-3-large: embedding dim 256-3072; context length 8K; use case and tasks: real-time multilingual text chatbots; open source: No; MTEB score: 65 (3072), 62 (256).
      Matryoshka Representation Learning: https://arxiv.org/pdf/2205.13147v4.pdf
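
The dimension ranges on slide 18 (for example 256-3072) reflect Matryoshka-style embeddings, where a prefix of the vector is itself usable. A minimal sketch of the client-side operation, assuming the model was trained so that prefixes are meaningful: keep the first `dim` values and rescale to unit length. The function name and the 4-D example vector are made up.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length.

    Only meaningful for models trained with Matryoshka Representation Learning;
    truncating an ordinary embedding this way generally hurts quality.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [0.5, 0.5, 0.5, 0.5]          # made-up 4-D unit vector
short = truncate_embedding(full, 2)  # 2-D prefix, renormalized
print(short)
```

Renormalizing matters when the downstream metric is cosine or inner product, since the truncated prefix no longer has unit length.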
  • 19. Pain Point #2: Choosing an Index https://milvus.io/docs/index.md
  • 20. Pain Point #2: Choosing an Index
      ● In-memory
        ○ Floating point dense
          ■ FLAT - An exhaustive, brute-force index that compares the query vector against every vector in the dataset to find the nearest neighbors. Suitable for small datasets where perfect accuracy is required and search latency is not a concern.
          ■ IVF_FLAT - The IVF_FLAT (Inverted File FLAT) index is a quantization-based index that divides the vector space into clusters. During indexing, vectors are assigned to the nearest cluster centroid; during search, only the vectors in the clusters closest to the query vector are compared.
          ■ HNSW - Organizes vectors in a hierarchical, multi-layered graph, so search complexity is logarithmic. The basic idea is to separate nearest neighbours into layers, where the top layer is the sparsest and the lowest layer contains every vector. Search is performed from top to bottom.
        ○ Floating point sparse - SPLADE, BGE-M3
        ○ Binary
      ● On-disk - DiskANN, for when your data is too large to fit in memory
      ● Hardware-optimized - GPU CAGRA, ARM
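
The difference between FLAT and IVF_FLAT described on slide 20 can be sketched in plain Python. This is an illustration of the idea only, not how Milvus implements either index: the centroids are hand-picked (a real index would learn them, typically with k-means), and `nprobe` is used here simply as the number of clusters to scan.

```python
import math

def flat_search(query, vectors, top_k=1):
    """FLAT: compare the query with every vector (exact, linear in dataset size)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:top_k]

def build_ivf(vectors, centroids):
    """Indexing: put each vector id into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def ivf_search(query, vectors, centroids, buckets, nprobe=1, top_k=1):
    """IVF_FLAT: scan only the `nprobe` buckets whose centroids are closest."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:top_k]

vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.1, 0.05], [5.1, 5.0]]  # hand-picked for the example
buckets = build_ivf(vectors, centroids)

query = [5.0, 5.0]
exact = flat_search(query, vectors, top_k=2)
approx = ivf_search(query, vectors, centroids, buckets, nprobe=1, top_k=2)
print(buckets, exact, approx)
```

Here the IVF search compares the query against two vectors instead of four. The trade-off is that a true neighbor sitting in an unprobed cluster is missed, which is why IVF is approximate and FLAT is exact.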
  • 21. Pain Point #2: Choosing an Index. IVF-Flat; HNSW https://arxiv.org/abs/1603.09320
  • 22. Pain Point #3: Chunking. Conversation Data; Documentation Data; Lecture or Q/A Data
  • 23. Pain Point #3: Chunking. Conversation Data; Documentation Data; Question Answer Data. Add conversation memory; use Q&A pair formatting.
  • 24. Pain Point #3: Chunks need more context. Document: Tesla Roadster. Naive Chunks: Chunk #1 = "2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem"; Chunk #2 = "2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem".
  • 25. Pain Point #3: Chunks need more context. Document: Tesla Roadster. Naive Chunks: "2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem"; "2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem". Better Chunks (with the title 2 levels above and the title 1 level above): "Tesla Roadster 2018 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem"; "Tesla Roadster 2023 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tem". Tools: HTMLHeaderTextSplitter, ParentDocumentRetriever; HierarchicalNodeParser, AutoMergingRetriever.
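
The "better chunks" idea on slide 25 amounts to carrying the parent titles into each chunk before embedding. A hand-rolled sketch follows; it does not use the splitter or retriever classes named on the slide, and `chunk_with_titles` is a hypothetical helper.

```python
def chunk_with_titles(doc_title, sections):
    """Prefix each section body with the document title and its own heading."""
    return [f"{doc_title} {heading}\n{body}" for heading, body in sections]

# (heading, body) pairs mirroring the slide's Tesla Roadster example.
sections = [
    ("2018", "Lorem ipsum dolor sit amet, consectetur adipiscing elit"),
    ("2023", "Lorem ipsum dolor sit amet, consectetur adipiscing elit"),
]

# Naive chunks lose the document title, so "2018" has no subject.
naive_chunks = [f"{heading}\n{body}" for heading, body in sections]

# Better chunks carry "Tesla Roadster" into every chunk.
better_chunks = chunk_with_titles("Tesla Roadster", sections)
print(better_chunks[0])
```

With the title attached, a question about the 2023 Roadster can match the second chunk on both the product name and the year.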
  • 26. Example
  • 27. Example
  • 28. Pain Point #4: Keyword or Semantic Search? Keyword search is good for: exact product names; jargon words. Example: Product name = “2022 RF GT 6MT”. Semantic search is good for: similar meaning but maybe not exact. Examples: similar image search; related wiki articles.
  • 29. Pain Point #4: Keyword or Semantic Search? Semantic Search: Dense Vector. Keyword Search: Sparse Vector (TF-IDF, BM25, SPLADE, Lucene WAND pruning, BGE-M3). Top10, Top5 → Final top_k → Prompt & Question → Improved context. Reranking options: Linear comb.; Cross-encoder; Neural reranker. Best of both worlds! Rerank the Keyword AND Semantic top_k, then put the reranked results into the Prompt Context.
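
The "Linear comb." option on slide 29 can be sketched as a weighted sum of normalized keyword and semantic scores. This is one common way to fuse the two result lists, shown under assumptions: the scores are made up, min-max normalization and the weight `alpha` are choices for this example, and both score dictionaries are assumed to be non-empty.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_top_k(dense, sparse, alpha=0.5, top_k=2):
    """Fuse results: alpha * dense + (1 - alpha) * sparse, missing scores count as 0."""
    d, s = min_max(dense), min_max(sparse)
    docs = set(d) | set(s)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0) for doc in docs}
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))[:top_k]

dense = {"doc_a": 0.9, "doc_b": 0.7, "doc_c": 0.1}    # made-up semantic scores
sparse = {"doc_b": 12.0, "doc_c": 8.0, "doc_d": 2.0}  # made-up BM25-style scores
final = hybrid_top_k(dense, sparse, alpha=0.5, top_k=2)
print(final)
```

`doc_b` ranks first because it scores well in both lists, while `doc_a` is strong only semantically. The cross-encoder and neural reranker options would replace the weighted sum with a model call.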
  • 30. Rerankers - when are they computed?
      - Plain cosine similarity is called no interaction. This is dense-embedding “semantic search”.
      - BERT was an Early Interaction model, meaning the relationship between question and docs is pre-computed as part of the embedding model, offline.
      - Cross-encoders are ML-model Late Interaction, calculated at query time. They are too computation-heavy to run in real time, except to reduce a small top_k to a smaller top_2. Cross-encoder reranking adds a classifier over (Q, A) pairs.
      - ColBERT v2 is neural-model Late Interaction, calculated offline, before the user asks their question! ~2% increased accuracy, but requires storing extra embeddings.
      - Cohere’s rerank-3 claims ~26% improvement over sparse only and 6% over dense.
      - Jina.ai Reranker claims ~20% improvement over sparse only.
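
For the ColBERT entry on slide 30, the scoring step is the MaxSim operator from the ColBERT papers: each query token takes its best match among the document's token embeddings, and those maxima are summed. The per-token document embeddings are the "extra embeddings" that have to be stored. The sketch below uses made-up 2-D token vectors, not real model output.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# One embedding per token (toy values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a close match for each query token
doc_2 = [[0.5, 0.5], [0.6, 0.4]]              # only partial matches

scores = {"doc_1": maxsim_score(query, doc_1), "doc_2": maxsim_score(query, doc_2)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

A single-vector dense model would compress each document to one embedding before comparing; keeping one vector per token is what lets fine-grained matches count.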
  • 31. BERT vs ColBERT. BERT: SPLADE, BGE-M3. Query → Top_k candidates → Final top_k. https://arxiv.org/pdf/2112.01488.pdf
  • 32. ColBERT v2 Reranker https://arxiv.org/pdf/2112.01488.pdf
  • 33. Slide from Tengyu Ma, April 2024 talk at Unstructured Data (+add Milvus metadata filtering). Metadata filtering (hash)
  • 34. BGE M3-Embedding
      ● “Multi-vec” - Multi-vector retrieval uses fine-grained interactions between the query's and passage's embeddings to compute the relevance score. Re-rank the top-200 Dense candidates, for efficient processing.
      ● “Dense+Sparse” - Retrieve the top-1000 candidates with the dense and sparse methods, then re-rank using the sum of the two scores.
      ● “All” - Re-rank based on the sum of all three scores.
      … Multilingual retrieval performance on the MIRACL dev set (measured by nDCG@10). https://arxiv.org/pdf/2402.03216
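
Slide 34 reports retrieval quality as nDCG@10. A minimal sketch of that metric, under two stated assumptions: it uses the linear-gain form (relevance divided by log2 of rank + 1), and it takes the ideal ordering from the same list of judged results, so relevant documents that were never retrieved are not counted.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the same relevances sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of each returned result, in ranked order (made-up labels).
perfect = ndcg_at_k([3, 2, 1, 0], k=10)  # best results first
swapped = ndcg_at_k([0, 1, 2, 3], k=10)  # best results last
print(perfect, swapped)
```

The log discount is what makes the metric rank-aware: the same set of results scores lower when the most relevant ones appear further down.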
  • 35. https://chat.lmsys.org/?leaderboard chart by @maximelabonne
  • 37. Mixtral 8x22B-Instruct-v0.1 with Anyscale Endpoints https://console.anyscale.com/v2/playground
  • 38. Question: What do the parameters for HNSW mean? Prompt; GPT-3.5-turbo; Anyscale endpoints Mixtral-8x22B-Instruct-v0.1
  • 39. Is RAG dead? 2023: Lost-in-the-middle https://arxiv.org/pdf/2307.03172 2024: Needle-in-a-haystack experiments https://github.com/gkamradt/LLMTest_NeedleInAHaystack
  • 40. Is RAG dead? Needle-in-a-haystack experiments. Slide from Lance Martin, LangChain https://blog.langchain.dev/multi-needle-in-a-haystack/
  • 41. 03 Demo Custom RAG
  • 42. 04 RAG Evaluation Methods
  • 43. Where do Vectors Come From? Unstructured Data Vectors
  • 44. Where do Vectors Come From? Unstructured Data Vectors Embedding model Generator Model or LLM
  • 45. Retrieval Augmented Generation (RAG). Your Data → Embedding Model → Vector Database. Question → Search → Question + Context → Gen AI Model → Reliable Answers. Example question: What is the default AUTOINDEX distance metric in Milvus? Example answer: The default AUTOINDEX distance metric in Milvus is L2.
  • 46. Model Evals vs Production System Evals. Your RAG system; Arena Elo score
  • 47. RAG Evaluation Methods https://arxiv.org/pdf/2306.05685.pdf GPT-4 favors itself with a 10% higher win rate; Claude-v1 favors itself with a 25% higher win rate. Open-weight Prometheus-eval aligns with human judgments up to 85% as of May 2024.
  • 48. Known Problems with LLM-as-Judge https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG GPT-4 is not a good judge of comprehensiveness. GPT-4 matches human judgements on Correctness & Readability.
  • 49. Known Problems with LLM-as-Judge https://arxiv.org/pdf/2305.17926 AI scores max/min higher. Humans score medians higher.
  • 50. RAG Evaluation Methods https://github.com/explodinggradients/ragas Inputs: Query; Context; Response; Ground Truth Answer. Metrics: faithfulness; context_precision; context_recall; answer_relevancy; answer_correctness; answer_similarity.
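
To make the retrieval-side metric names on slide 50 concrete, here is a sketch of the classic set-based precision and recall over retrieved context chunks. This is explicitly not the Ragas implementation: Ragas derives its scores with LLM judgments and its definitions differ, while this version assumes hand-labeled relevant chunk ids, which are made up here.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are relevant (set-based simplification)."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk in retrieved if chunk in relevant) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved (set-based simplification)."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

retrieved = ["c1", "c2", "c3", "c4"]  # chunk ids returned by the retriever
relevant = {"c1", "c3", "c9"}         # hand-labeled ground-truth chunk ids

precision = context_precision(retrieved, relevant)
recall = context_recall(retrieved, relevant)
print(precision, recall)
```

Low precision points at noisy context being stuffed into the prompt; low recall points at the retriever missing chunks the answer needs.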
  • 51. 05 Demo RAG Eval
  • 52. THANK YOU. We need your stars! https://github.com/milvus-io/milvus 💬 Join our Discord: https://discord.gg/FjCMmaJng6
  • 53. Open Source Zilliz Architecture