1 | © Copyright 8/16/23 Zilliz
Stephen Batifol | Zilliz
A Beginners Guide to Building
a RAG App Using Milvus
Speaker
Stephen Batifol
Developer Advocate, Zilliz
stephen.batifol@zilliz.com
https://www.linkedin.com/in/stephen-batifol/
https://twitter.com/stephenbtl
RAG
(Retrieval Augmented Generation)
Basic Idea
Use RAG to force the LLM to work with your data
by injecting it via a vector database like Milvus
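As a rough illustration of what "injecting" data means, the sketch below pastes retrieved chunks into the prompt that is sent to the LLM. The function name `build_prompt` and the prompt wording are assumptions made for this example, not something taken from the talk.

```python
def build_prompt(question, retrieved_chunks):
    """Assemble a prompt that asks the LLM to answer from the given chunks."""
    # Number each chunk so the model (and the reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical chunks, standing in for results returned by a vector database.
chunks = [
    "Milvus is an open-source vector database.",
    "Ollama runs LLMs locally.",
]
prompt = build_prompt("What is Milvus?", chunks)
print(prompt)
```

In a real app the chunks would come from a similarity search rather than a hard-coded list.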
Vector DB for RAG
Vector Databases provide the ability to inject your data via
semantic similarity
Considerations include: scale, performance, and flexibility
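Semantic similarity is usually measured as a distance between embedding vectors. The sketch below ranks a few documents by cosine similarity in plain Python. The three-dimensional vectors are hand-written for illustration only; a vector database such as Milvus does this at scale with approximate indexes instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings; real models produce hundreds of dimensions.
docs = [
    ("Milvus is a vector database", [0.9, 0.1, 0.0]),
    ("Ollama runs LLMs locally", [0.1, 0.9, 0.1]),
    ("Bread recipe", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
results = top_k(query, docs, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```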
LLMs are Stochastic
LLMs predict future tokens (à la RNNs)
• “Milvus is the world’s most popular vector ___”
• {“database”: 0.86, “search”: 0.11, “embedding”: 0.01, …}
Downside: outdated input data can cause hallucinations
• Plausible-sounding but factually incorrect responses
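To make "stochastic" concrete, the sketch below treats the probabilities from the slide as a next-token distribution and compares greedy decoding with random sampling. The helper names `greedy` and `sample` are assumptions for this example, and the elided "…" tail of the distribution is left out, so the listed weights do not sum to 1.

```python
import random

# Probabilities from the slide; the elided tail is omitted on purpose.
next_token_probs = {"database": 0.86, "search": 0.11, "embedding": 0.01}


def greedy(probs):
    """Always pick the most likely token."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Draw one token at random, weighted by probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # random.choices treats weights as relative, so they need not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs match
draws = [sample(next_token_probs, rng) for _ in range(1000)]
print("greedy:", greedy(next_token_probs))
print("sampled counts:", {t: draws.count(t) for t in next_token_probs})
```

Sampling mostly returns "database" but occasionally returns a less likely token, which is why the same prompt can yield different completions.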
Basic RAG Architecture
01 Tech Stack
Tech Stack
LangChain
• Framework for building LLM Applications
• Focus on retrieving data and integrating with LLMs
• Loading the Data
• Chunk & Chunk Overlap
• Integrations with most popular tools
Ollama
• Run quantized LLMs Locally
• Embeddings Models
Milvus
1. Cloud Native, Distributed System Architecture
2. True Separation of Concerns
3. Scalable Index Creation Strategy with 512 MB Segments
Embeddings Models
02 Embeddings
Examining Embeddings
Picking a model
What to embed
Metadata
Embeddings Strategies
Level 1: Embedding Chunks Directly
Level 2: Embedding Sub and Super Chunks
Level 3: Incorporating Chunking and Non-Chunking Metadata
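One possible reading of Level 2 is "search on small chunks, return the larger chunk they belong to". The sketch below follows that reading: each sentence (sub chunk) remembers its parent paragraph (super chunk). The function names are hypothetical, and `word_overlap` is a deliberately crude stand-in for embedding similarity so the example runs without a model.

```python
def build_sub_chunk_index(super_chunks):
    """Split each super chunk into sentences and record each one's parent."""
    index = []
    for parent_id, super_chunk in enumerate(super_chunks):
        for sentence in super_chunk.split("."):
            sentence = sentence.strip()
            if sentence:
                index.append({"sub_chunk": sentence, "parent_id": parent_id})
    return index


def word_overlap(a, b):
    """Stand-in for embedding similarity: count of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve_super_chunk(query, index, super_chunks):
    """Match the query against sub chunks, then return the parent chunk."""
    best = max(index, key=lambda entry: word_overlap(query, entry["sub_chunk"]))
    return super_chunks[best["parent_id"]]


super_chunks = [
    "Milvus stores vectors. It scales to billions of entries.",
    "Ollama runs quantized models. It works on a laptop.",
]
index = build_sub_chunk_index(super_chunks)
hit = retrieve_super_chunk("which tool runs quantized models", index, super_chunks)
print(hit)
```

Matching on the small chunk keeps the search precise, while returning the parent gives the LLM more surrounding context.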
Metadata Examples
Chunking
- Paragraph position
- Section header
- Larger paragraph
- Sentence Number
- …
Non-Chunking
- Author
- Publisher
- Organization
- Role Based Access Control
- …
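Metadata like this is typically stored next to each vector and used to narrow the candidate set around the similarity search. The sketch below shows the idea with plain dictionaries. The field names (`section_header`, `paragraph_position`, `allowed_roles`) and the sample records are assumptions for illustration, not a Milvus schema.

```python
# Hypothetical records: chunk text plus chunking and non-chunking metadata.
records = [
    {
        "text": "Q3 revenue grew.",
        "section_header": "Financials",   # chunking metadata
        "paragraph_position": 0,          # chunking metadata
        "author": "CFO office",           # non-chunking metadata
        "allowed_roles": {"finance", "exec"},
    },
    {
        "text": "Install with pip.",
        "section_header": "Setup",
        "paragraph_position": 3,
        "author": "Docs team",
        "allowed_roles": {"finance", "exec", "engineering"},
    },
]


def filter_records(records, role, section_header=None):
    """Keep records the role may see, optionally limited to one section."""
    visible = [r for r in records if role in r["allowed_roles"]]
    if section_header is not None:
        visible = [r for r in visible if r["section_header"] == section_header]
    return visible


for_engineering = filter_records(records, "engineering")
finance_financials = filter_records(records, "finance", "Financials")
print([r["text"] for r in for_engineering])
print([r["text"] for r in finance_financials])
```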
What your data looks like
Text:
“preferences of customers and prospective customers with respect to remote or hybrid
working, as a result of the COVID-19 pandemic, leading to a parallel delay, or potentially
permanent change, in receiving the corresponding revenue; •our projected financial
information, anticipated growth rate, and market opportunity; •our ability to maintain the
listing of our Class A Common Stock and Warrants on the NYSE; •our public securities’
potential liquidity and trading;”
Vector:
[-0.09975282847881317, -0.02853492833673954, -0.047886092215776443, 0.012315821833908558,
-0.004004416521638632, 0.08756010979413986, 0.013248161412775517, 0.010704956017434597,
-0.06194952502846718, 0.021150749176740646, 0.02453230880200863, 0.03979797288775444,
-0.032914288341999054, -0.011855324730277061, ...]
Takeaway:
Your embeddings strategy depends on your accuracy,
cost, and use case needs
03 Chunking
Chunking Considerations
Chunk Size
Chunk Overlap
Character Splitters
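Chunk size and chunk overlap are easiest to see on a fixed-width character splitter. The sketch below is a minimal version written for this example. It is not LangChain's implementation, which also tries to split on separators such as newlines and spaces.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Split text into chunks of chunk_size characters that share
    chunk_overlap characters with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, so no trailing chunk is
        # made up only of characters already covered by the overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


no_overlap = split_text("abcdefghij", chunk_size=5, chunk_overlap=0)
with_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With overlap, the end of each chunk is repeated at the start of the next one, which helps when a sentence would otherwise be cut in half at a chunk boundary.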
Examples
Chunk Size=50, Overlap=0
Examples
Chunk Size=128, Overlap=20
Examples
Chunk Size=256, Overlap=50
Examples
SemanticChunker
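The general idea behind semantic chunking is to start a new chunk where the meaning shifts, rather than at a fixed character count. The sketch below illustrates that idea only: it breaks wherever the cosine similarity between adjacent sentence embeddings drops below a threshold. The two-dimensional vectors and the threshold value are made up for the example, and this is not the algorithm of any particular library's `SemanticChunker`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_chunks(sentences, vectors, threshold):
    """Group consecutive sentences; start a new chunk when similarity
    to the previous sentence falls below threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            groups.append([sentence])      # topic shift: open a new chunk
        else:
            groups[-1].append(sentence)    # same topic: extend current chunk
    return [" ".join(group) for group in groups]


sentences = [
    "Milvus stores vectors.",
    "It supports similarity search.",
    "Ollama runs models locally.",
    "It supports quantized LLMs.",
]
# Hypothetical sentence embeddings; a real app would get these from a model.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = semantic_chunks(sentences, vectors, threshold=0.5)
print(chunks)
```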
How Does Your Data Look?
Conversation Data
Documentation Data
Lecture or Q/A Data
Takeaway:
Your chunking strategy depends on what your data looks
like and what you need from it.
Demo!
Questions?
Give Milvus a Star! Chat with me on Discord!
Milvus Architecture
[Diagram: SDK, Load Balancer, Access Layer (Proxy), Coordinator Service (Root, Query, Data, Index), Worker Nodes (Query Node, Data Node, Index Node), Meta Storage (etcd), Message Storage (Log Broker), Object Storage (Minio / S3 / AzureBlob: Log Snapshot, Delta File, Index File). Flows labeled DDL/DCL, DML, NOTIFICATION, CONTROL SIGNAL.]

A Beginners Guide to Building a RAG App Using Open Source Milvus
