1 | © Copyright 8/16/23 Zilliz
1 | © Copyright 8/16/23 Zilliz
Stephen Batifol | Zilliz
Tirana Tech Meetup, July 4th
Build a RAG system - From
Beginners to Advanced
2 | © Copyright 8/16/23 Zilliz
2 | © Copyright 8/16/23 Zilliz
Stephen Batifol
Developer Advocate, Zilliz/ Milvus
stephen.batifol@zilliz.com
linkedin.com/in/stephen-batifol/
@stephenbtl
Speaker
3 | © Copyright 8/16/23 Zilliz
3 | © Copyright 8/16/23 Zilliz
27K+
GitHub
Stars
25M+
Downloads
250+
Contributors
2,600
+
Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup
pip install
pymilvus to start
coding in a notebook
within seconds.
Reusable Code
Write once, and
deploy with one line
of code into the
production
environment
Integration
Plug into OpenAI,
Langchain,
LlamaIndex, and
many more
Feature-rich
Dense & sparse
embeddings,
filtering, reranking
and beyond
4 | © Copyright 8/16/23 Zilliz
4 | © Copyright 8/16/23 Zilliz
Seamless integration with all popular AI toolkits
5 | © Copyright 8/16/23 Zilliz
5 | © Copyright 8/16/23 Zilliz
Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database
6 | © Copyright 8/16/23 Zilliz
6 | © Copyright 8/16/23 Zilliz
7 | © Copyright 8/16/23 Zilliz
7 | © Copyright 8/16/23 Zilliz
7 | © Copyright 8/16/23 Zilliz
7 | © Copyright 8/16/23 Zilliz
01
Introduction to Vector DB
and Vector Search
8 | © Copyright 8/16/23 Zilliz
8 | © Copyright 8/16/23 Zilliz
Traditional database was built upon exact search
9 | © Copyright 8/16/23 Zilliz
9 | © Copyright 8/16/23 Zilliz
…which misses context, semantic meaning, and user intent
VS.
Apple
VS.
Rising dough
VS.
Change car tire
Rising Dough
Proofing Bread
✔
❌
10 | © Copyright 8/16/23 Zilliz
10 | © Copyright 8/16/23 Zilliz
…and cannot process increasingly growing unstructured data
*Data Source: The Digitization of the World by IDC
20%
Other
newly generated data in 2025
will be unstructured data
80%
11 | © Copyright 8/16/23 Zilliz
11 | © Copyright 8/16/23 Zilliz
Retrieval Augmented
Generation (RAG)
Expand LLMs' knowledge by
incorporating external data sources
into LLMs and your AI applications.
Match user behavior or content
features with other similar ones to
make effective recommendations.
Recommender System
Search for semantically similar
texts across vast amounts of
natural language documents.
Text/ Semantic Search
Image Similarity Search
Identify and search for visually
similar images or objects from a
vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes,
or objects from extensive
collections of video libraries.
Audio Similarity Search
Find similar audios in large datasets
for tasks like genre classification or
speech recognition
Molecular Similarity Search
Search for similar substructures,
superstructures, and other
structures for a specific molecule.
Anomaly Detection
Detect data points, events, and
observations that deviate
significantly from the usual pattern
Multimodal Similarity Search
Search over multiple types of data
simultaneously, e.g. text and
images
Common AI Use Cases
12 | © Copyright 8/16/23 Zilliz
12 | © Copyright 8/16/23 Zilliz
Vector
Databases
Where do Vectors Come From?
13 | © Copyright 8/16/23 Zilliz
13 | © Copyright 8/16/23 Zilliz
Embeddings Models
14 | © Copyright 8/16/23 Zilliz
14 | © Copyright 8/16/23 Zilliz
Vector Embedding
15 | © Copyright 8/16/23 Zilliz
15 | © Copyright 8/16/23 Zilliz
Vector Space
16 | © Copyright 8/16/23 Zilliz
16 | © Copyright 8/16/23 Zilliz
16 | © Copyright 8/16/23 Zilliz
16 | © Copyright 8/16/23 Zilliz
02
How do Vector Databases
Work?
17 | © Copyright 8/16/23 Zilliz
17 | © Copyright 8/16/23 Zilliz
How Similarity Search Works
Vn, 1
…
…
…
1
2
3
4
5
Transform into
Vectors
Unstructured Data
Images
User Generated
Content
Video
Documents
Audio
Vector Embeddings
Perform Approximate
Nearest Neighbor
Similarity Search
Perform Query
Get Results
Store in Vector Database
18 | © Copyright 8/16/23 Zilliz
18 | © Copyright 8/16/23 Zilliz
Example Entry
19 | © Copyright 8/16/23 Zilliz
19 | © Copyright 8/16/23 Zilliz
19 | © Copyright 8/16/23 Zilliz
19 | © Copyright 8/16/23 Zilliz
03 Indexes
20 | © Copyright 8/16/23 Zilliz
20 | © Copyright 8/16/23 Zilliz
Indexing strategies
• Tree based
• Graph based
• Hash based
• Cluster based
21 | © Copyright 8/16/23 Zilliz
21 | © Copyright 8/16/23 Zilliz
Category Index Accuracy Latency Throughput Index Time Cost
Graph-based Cagra (GPU) High Low Very High Fast Very High
HNSW High Low High Slow High
DiskANN High High Mid Very Slow Low
Quantization-base
d or cluster-based
ScaNN Mid Mid High Mid Mid
IVF_FLAT Mid Mid Low Fast Mid
IVF +
Quantization
Low Mid Mid Mid Low
22 | © Copyright 8/16/23 Zilliz
22 | © Copyright 8/16/23 Zilliz
23 | © Copyright 8/16/23 Zilliz
23 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
23
FLAT
24 | © Copyright 8/16/23 Zilliz
24 | © Copyright 8/16/23 Zilliz
FLAT Index
25 | © Copyright 8/16/23 Zilliz
25 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
25
Inverted File FLAT
(IVF-FLAT)
26 | © Copyright 8/16/23 Zilliz
26 | © Copyright 8/16/23 Zilliz
IVF-FLAT Index
27 | © Copyright 8/16/23 Zilliz
27 | © Copyright 8/16/23 Zilliz
IVF-FLAT Index
28 | © Copyright 8/16/23 Zilliz
28 | © Copyright 8/16/23 Zilliz
IVF-FLAT Index
29 | © Copyright 8/16/23 Zilliz
29 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
29
Hierarchical Navigable
Small World (HNSW)
30 | © Copyright 8/16/23 Zilliz
30 | © Copyright 8/16/23 Zilliz
HNSW - Skip List
31 | © Copyright 8/16/23 Zilliz
31 | © Copyright 8/16/23 Zilliz
• Built by randomly shuffling data points and inserting them one by
one, with each point connected to a predefined number of edges
(M).
⇒ Creates a graph structure that exhibits the "small world".
⇒ Any two points are connected through a relatively short path.
HNSW - NSW Graph
32 | © Copyright 8/16/23 Zilliz
32 | © Copyright 8/16/23 Zilliz
HNSW
33 | © Copyright 8/16/23 Zilliz
33 | © Copyright 8/16/23 Zilliz
Category Index Accuracy Latency Throughput Index Time Cost
Graph-based Cagra (GPU) High Low Very High Fast Very High
HNSW High Low High Slow High
DiskANN High High Mid Very Slow Low
Quantization-base
d or cluster-based
ScaNN Mid Mid High Mid Mid
IVF_FLAT Mid Mid Low Fast Mid
IVF +
Quantization
Low Mid Mid Mid Low
34 | © Copyright 8/16/23 Zilliz
34 | © Copyright 8/16/23 Zilliz
Picking an Index
● 100% Recall – Use FLAT search if you need 100% accuracy
● 10MB < index_size < 2GB – Standard IVF
● 2GB < index_size < 20GB – Consider PQ and HNSW
● 20GB < index_size < 200GB – Composite Index, IVF_PQ or
HNSW_SQ
● Disk-based indexes
35 | © Copyright 8/16/23 Zilliz
35 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
35
The GenAI Boom
36 | © Copyright 8/16/23 Zilliz
36 | © Copyright 8/16/23 Zilliz
37 | © Copyright 8/16/23 Zilliz
37 | © Copyright 8/16/23 Zilliz
Three Pillars of GenAI & the opportunities they bring
Models Computation Data
Vector Database
● Data Encryption
● Data ETL
● Data Security
● Data Pipeline
● Data Observability
● Data Compliance
38 | © Copyright 8/16/23 Zilliz
38 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
38
RAG
(Retrieval Augmented Generation)
39 | © Copyright 8/16/23 Zilliz
39 | © Copyright 8/16/23 Zilliz
Basic Idea
Use RAG to force the LLM to work with your data
by injecting it via a vector database like Milvus
40 | © Copyright 8/16/23 Zilliz
40 | © Copyright 8/16/23 Zilliz
Why RAG?
RAG vs. LLM
- Knowledge of LLM is out-of-date
- LLM can not get your private knowledge
- Help reduce Hallucinations
- Transparency and interpretability
RAG vs. Fine-tune
- Fine-tune is expensive
- Fine-tune spent much time
- RAG is pluggable
41 | © Copyright 8/16/23 Zilliz
41 | © Copyright 8/16/23 Zilliz
Basic RAG Architecture
42 | © Copyright 8/16/23 Zilliz
42 | © Copyright 8/16/23 Zilliz
5 lines starter
43 | © Copyright 8/16/23 Zilliz
43 | © Copyright 8/16/23 Zilliz
43 | © Copyright 8/16/23 Zilliz
43 | © Copyright 8/16/23 Zilliz
04 Basic RAG with Milvus
44 | © Copyright 8/16/23 Zilliz
44 | © Copyright 8/16/23 Zilliz
• Framework for building LLM Applications
• Focus on retrieving data and integrating with LLMs
• Loading the Data
• Chunk & Chunk Overlap
• Integrations with most AI popular tools
Llama-Index
45 | © Copyright 8/16/23 Zilliz
45 | © Copyright 8/16/23 Zilliz
Ollama
• Run LLMs Locally
• Embeddings Models
46 | © Copyright 8/16/23 Zilliz
46 | © Copyright 8/16/23 Zilliz
Milvus Lite
pip install pymilvus
47 | © Copyright 8/16/23 Zilliz
47 | © Copyright 8/16/23 Zilliz
47 | © Copyright 8/16/23 Zilliz
47 | © Copyright 8/16/23 Zilliz
05 Embeddings
48 | © Copyright 8/16/23 Zilliz
48 | © Copyright 8/16/23 Zilliz
Embeddings Models
49 | © Copyright 8/16/23 Zilliz
49 | © Copyright 8/16/23 Zilliz
Examining Embeddings
Picking a model
What to embed
Metadata
50 | © Copyright 8/16/23 Zilliz
50 | © Copyright 8/16/23 Zilliz
Embeddings Strategies
Level 1: Embedding Chunks Directly
Level 2: Embedding Sub and Super Chunks
Level 3: Incorporating Chunking and Non-Chunking Metadata
51 | © Copyright 8/16/23 Zilliz
51 | © Copyright 8/16/23 Zilliz
Metadata Examples
Chunking
- Paragraph position
- Section header
- Larger paragraph
- Sentence Number
- …
Non-Chunking
- Author
- Publisher
- Organization
- Role Based Access Control
- …
52 | © Copyright 8/16/23 Zilliz
52 | © Copyright 8/16/23 Zilliz
Your embeddings strategy depends on your accuracy,
cost, and use case needs
Takeaway:
53 | © Copyright 8/16/23 Zilliz
53 | © Copyright 8/16/23 Zilliz
53 | © Copyright 8/16/23 Zilliz
53 | © Copyright 8/16/23 Zilliz
06 Chunking
54 | © Copyright 8/16/23 Zilliz
54 | © Copyright 8/16/23 Zilliz
Chunking Considerations
Chunk Size
Chunk Overlap
Character Splitters
55 | © Copyright 8/16/23 Zilliz
55 | © Copyright 8/16/23 Zilliz
Chunk Size=50, Overlap=0
56 | © Copyright 8/16/23 Zilliz
56 | © Copyright 8/16/23 Zilliz
Chunk Size=128, Overlap=20
57 | © Copyright 8/16/23 Zilliz
57 | © Copyright 8/16/23 Zilliz
Chunk Size=256, Overlap=50
58 | © Copyright 8/16/23 Zilliz
58 | © Copyright 8/16/23 Zilliz
SemanticChunker
59 | © Copyright 8/16/23 Zilliz
59 | © Copyright 8/16/23 Zilliz
How Does Your Data Look?
Conversation
Data
Documentation
Data
Lecture or Q/A
Data
60 | © Copyright 8/16/23 Zilliz
60 | © Copyright 8/16/23 Zilliz
Your chunking strategy depends on what your data looks
like and what you need from it.
Takeaway:
61 | © Copyright 8/16/23 Zilliz
61 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
61
Demo!
62 | © Copyright 8/16/23 Zilliz
62 | © Copyright 8/16/23 Zilliz
Naive RAG is limited
63 | © Copyright 8/16/23 Zilliz
63 | © Copyright 8/16/23 Zilliz
Naive RAG failure mode
Summarization
64 | © Copyright 8/16/23 Zilliz
64 | © Copyright 8/16/23 Zilliz
Naive RAG failure mode
Implicit data
65 | © Copyright 8/16/23 Zilliz
65 | © Copyright 8/16/23 Zilliz
Naive RAG failure mode
Multi-part questions
66 | © Copyright 8/16/23 Zilliz
66 | © Copyright 8/16/23 Zilliz
66 | © Copyright 8/16/23 Zilliz
RAG is necessary but
not sufficient
67 | © Copyright 8/16/23 Zilliz
67 | © Copyright 8/16/23 Zilliz
Good dishes come from good ingredients
• Data collection
• Data cleaning
• Parsing & Chunking
68 | © Copyright 8/16/23 Zilliz
68 | © Copyright 8/16/23 Zilliz
68 | © Copyright 9/25/23 Zilliz
68 | © Copyright 9/25/23 Zilliz
Simplify and streamline
the conversion of
unstructured data into
state-of-the-art vector
embeddings, using
intuitive UI and Restful
APIs.
Pipelines
Easy. High-quality. Scalable.
Simplify the workflow
for developers, from
converting
unstructured data into
searchable vectors to
retrieving them from
vector databases
Deliver excellence in
every phase of vector
search pipeline
development and
deployment,
regardless of their
expertise
Ensure scalability for
managing large
datasets and
high-throughput
queries, maintaining
high performance with
min. customization or
infra changes
Zilliz Cloud Pipelines
69 | © Copyright 8/16/23 Zilliz
69 | © Copyright 8/16/23 Zilliz
Naive RAG Pipeline
⚠ Single-shot
⚠ No query understanding/planning
⚠ No tool use
⚠ No reflection, error correction
⚠ No memory (stateless)
70 | © Copyright 8/16/23 Zilliz
70 | © Copyright 8/16/23 Zilliz
First thing first
Measure it before you attempts to improve it!
71 | © Copyright 8/16/23 Zilliz
71 | © Copyright 8/16/23 Zilliz
71 | © Copyright 8/16/23 Zilliz
71 | © Copyright 8/16/23 Zilliz
07 Advanced RAG techniques
72 | © Copyright 8/16/23 Zilliz
72 | © Copyright 8/16/23 Zilliz
Types of RAG Enhancement Techniques
● Divide & Conquer
○ Query Enhancement: better express or process the query intent.
○ Indexing Enhancement: data cleanup, better parser and chunking
○ Retriever Enhancement: more retrievers and hybrid search strategy
● Thinking outside the box
○ Agents? Other tools than retriever?
73 | © Copyright 8/16/23 Zilliz
73 | © Copyright 8/16/23 Zilliz
73 | © Copyright 8/16/23 Zilliz
73 | © Copyright 8/16/23 Zilliz
Query Enhancement
a)
74 | © Copyright 8/16/23 Zilliz
74 | © Copyright 8/16/23 Zilliz
75 | © Copyright 8/16/23 Zilliz
75 | © Copyright 8/16/23 Zilliz
75 | © Copyright 8/16/23 Zilliz
75 | © Copyright 8/16/23 Zilliz
Indexing Enhancement
b)
76 | © Copyright 8/16/23 Zilliz
76 | © Copyright 8/16/23 Zilliz
77 | © Copyright 8/16/23 Zilliz
77 | © Copyright 8/16/23 Zilliz
77 | © Copyright 8/16/23 Zilliz
77 | © Copyright 8/16/23 Zilliz
Retriever Enhancement
c)
78 | © Copyright 8/16/23 Zilliz
78 | © Copyright 8/16/23 Zilliz
79 | © Copyright 8/16/23 Zilliz
79 | © Copyright 8/16/23 Zilliz
80 | © Copyright 8/16/23 Zilliz
80 | © Copyright 8/16/23 Zilliz
80 | © Copyright 8/16/23 Zilliz
80 | © Copyright 8/16/23 Zilliz
08 Agentic RAG
81 | © Copyright 8/16/23 Zilliz
81 | © Copyright 8/16/23 Zilliz
Agentic RAG
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization
82 | © Copyright 8/16/23 Zilliz
82 | © Copyright 8/16/23 Zilliz
83 | © Copyright 8/16/23 Zilliz
83 | © Copyright 8/16/23 Zilliz
84 | © Copyright 8/16/23 Zilliz
84 | © Copyright 8/16/23 Zilliz
85 | © Copyright 8/16/23 Zilliz
85 | © Copyright 8/16/23 Zilliz
Conversation Memory
86 | © Copyright 8/16/23 Zilliz
86 | © Copyright 8/16/23 Zilliz
Tool Use
87 | © Copyright 8/16/23 Zilliz
87 | © Copyright 8/16/23 Zilliz
87 | © Copyright 8/16/23 Zilliz
87 | © Copyright 8/16/23 Zilliz
09
RAG in action with Milvus
Lite
88 | © Copyright 8/16/23 Zilliz
88 | © Copyright 8/16/23 Zilliz
General Ideas
89 | © Copyright 8/16/23 Zilliz
89 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
89
Demo!
90 | © Copyright 8/16/23 Zilliz
90 | © Copyright 8/16/23 Zilliz
milvus.io
github.com/milvus-io/
@milvusio
@stephenbtl
/in/stephen-batifol
Thank you
91 | © Copyright 8/16/23 Zilliz
91 | © Copyright 8/16/23 Zilliz
Meta Storage
Root Query Data Index
Coordinator Service
Proxy
Proxy
etcd
Log Broker
SDK
Load Balancer
DDL/DCL
DML
NOTIFICATION
CONTROL SIGNAL
Object Storage
Minio / S3 / AzureBlob
Log Snapshot Delta File Index File
Worker Node QUERY DATA DATA
Message Storage
VECTOR
DATABASE
Access Layer
Query Node Data Node Index Node
Milvus Architecture
92 | © Copyright 8/16/23 Zilliz
92 | © Copyright 8/16/23 Zilliz
Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database
93 | © Copyright 8/16/23 Zilliz
93 | © Copyright 8/16/23 Zilliz
Driving AI Innovation with leading business
94 | © Copyright 8/16/23 Zilliz
94 | © Copyright 8/16/23 Zilliz

Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama

  • 1.
    1 | ©Copyright 8/16/23 Zilliz 1 | © Copyright 8/16/23 Zilliz Stephen Batifol | Zilliz Tirana Tech Meetup, July 4th Build a RAG system - From Beginners to Advanced
  • 2.
    2 | ©Copyright 8/16/23 Zilliz 2 | © Copyright 8/16/23 Zilliz Stephen Batifol Developer Advocate, Zilliz/ Milvus stephen.batifol@zilliz.com linkedin.com/in/stephen-batifol/ @stephenbtl Speaker
  • 3.
    3 | ©Copyright 8/16/23 Zilliz 3 | © Copyright 8/16/23 Zilliz 27K+ GitHub Stars 25M+ Downloads 250+ Contributors 2,600 + Forks Milvus is an open-source vector database for GenAI projects. pip install on your laptop, plug into popular AI dev tools, and push to production with a single line of code. Easy Setup pip install pymilvus to start coding in a notebook within seconds. Reusable Code Write once, and deploy with one line of code into the production environment Integration Plug into OpenAI, Langchain, LlamaIndex, and many more Feature-rich Dense & sparse embeddings, filtering, reranking and beyond
  • 4.
    4 | ©Copyright 8/16/23 Zilliz 4 | © Copyright 8/16/23 Zilliz Seamless integration with all popular AI toolkits
  • 5.
    5 | ©Copyright 8/16/23 Zilliz 5 | © Copyright 8/16/23 Zilliz Well-connected in LLM infrastructure to enable RAG use cases Framework Hardware Infrastructure Embedding Models LLMs Software Infrastructure Vector Database
  • 6.
    6 | ©Copyright 8/16/23 Zilliz 6 | © Copyright 8/16/23 Zilliz
  • 7.
    7 | ©Copyright 8/16/23 Zilliz 7 | © Copyright 8/16/23 Zilliz 7 | © Copyright 8/16/23 Zilliz 7 | © Copyright 8/16/23 Zilliz 01 Introduction to Vector DB and Vector Search
  • 8.
    8 | ©Copyright 8/16/23 Zilliz 8 | © Copyright 8/16/23 Zilliz Traditional database was built upon exact search
  • 9.
    9 | ©Copyright 8/16/23 Zilliz 9 | © Copyright 8/16/23 Zilliz …which misses context, semantic meaning, and user intent VS. Apple VS. Rising dough VS. Change car tire Rising Dough Proofing Bread ✔ ❌
  • 10.
    10 | ©Copyright 8/16/23 Zilliz 10 | © Copyright 8/16/23 Zilliz …and cannot process increasingly growing unstructured data *Data Source: The Digitization of the World by IDC 20% Other newly generated data in 2025 will be unstructured data 80%
  • 11.
    11 | ©Copyright 8/16/23 Zilliz 11 | © Copyright 8/16/23 Zilliz Retrieval Augmented Generation (RAG) Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications. Match user behavior or content features with other similar ones to make effective recommendations. Recommender System Search for semantically similar texts across vast amounts of natural language documents. Text/ Semantic Search Image Similarity Search Identify and search for visually similar images or objects from a vast collection of image libraries. Video Similarity Search Search for similar videos, scenes, or objects from extensive collections of video libraries. Audio Similarity Search Find similar audios in large datasets for tasks like genre classification or speech recognition Molecular Similarity Search Search for similar substructures, superstructures, and other structures for a specific molecule. Anomaly Detection Detect data points, events, and observations that deviate significantly from the usual pattern Multimodal Similarity Search Search over multiple types of data simultaneously, e.g. text and images Common AI Use Cases
  • 12.
    12 | ©Copyright 8/16/23 Zilliz 12 | © Copyright 8/16/23 Zilliz Vector Databases Where do Vectors Come From?
  • 13.
    13 | ©Copyright 8/16/23 Zilliz 13 | © Copyright 8/16/23 Zilliz Embeddings Models
  • 14.
    14 | ©Copyright 8/16/23 Zilliz 14 | © Copyright 8/16/23 Zilliz Vector Embedding
  • 15.
    15 | ©Copyright 8/16/23 Zilliz 15 | © Copyright 8/16/23 Zilliz Vector Space
  • 16.
    16 | ©Copyright 8/16/23 Zilliz 16 | © Copyright 8/16/23 Zilliz 16 | © Copyright 8/16/23 Zilliz 16 | © Copyright 8/16/23 Zilliz 02 How do Vector Databases Work?
  • 17.
    17 | ©Copyright 8/16/23 Zilliz 17 | © Copyright 8/16/23 Zilliz How Similarity Search Works Vn, 1 … … … 1 2 3 4 5 Transform into Vectors Unstructured Data Images User Generated Content Video Documents Audio Vector Embeddings Perform Approximate Nearest Neighbor Similarity Search Perform Query Get Results Store in Vector Database
  • 18.
    18 | ©Copyright 8/16/23 Zilliz 18 | © Copyright 8/16/23 Zilliz Example Entry
  • 19.
    19 | ©Copyright 8/16/23 Zilliz 19 | © Copyright 8/16/23 Zilliz 19 | © Copyright 8/16/23 Zilliz 19 | © Copyright 8/16/23 Zilliz 03 Indexes
  • 20.
    20 | ©Copyright 8/16/23 Zilliz 20 | © Copyright 8/16/23 Zilliz Indexing strategies • Tree based • Graph based • Hash based • Cluster based
  • 21.
    21 | ©Copyright 8/16/23 Zilliz 21 | © Copyright 8/16/23 Zilliz Category Index Accuracy Latency Throughput Index Time Cost Graph-based Cagra (GPU) High Low Very High Fast Very High HNSW High Low High Slow High DiskANN High High Mid Very Slow Low Quantization-base d or cluster-based ScaNN Mid Mid High Mid Mid IVF_FLAT Mid Mid Low Fast Mid IVF + Quantization Low Mid Mid Mid Low
  • 22.
    22 | ©Copyright 8/16/23 Zilliz 22 | © Copyright 8/16/23 Zilliz
  • 23.
    23 | ©Copyright 8/16/23 Zilliz 23 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 23 FLAT
  • 24.
    24 | ©Copyright 8/16/23 Zilliz 24 | © Copyright 8/16/23 Zilliz FLAT Index
  • 25.
    25 | ©Copyright 8/16/23 Zilliz 25 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 25 Inverted File FLAT (IVF-FLAT)
  • 26.
    26 | ©Copyright 8/16/23 Zilliz 26 | © Copyright 8/16/23 Zilliz IVF-FLAT Index
  • 27.
    27 | ©Copyright 8/16/23 Zilliz 27 | © Copyright 8/16/23 Zilliz IVF-FLAT Index
  • 28.
    28 | ©Copyright 8/16/23 Zilliz 28 | © Copyright 8/16/23 Zilliz IVF-FLAT Index
  • 29.
    29 | ©Copyright 8/16/23 Zilliz 29 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 29 Hierarchical Navigable Small World (HNSW)
  • 30.
    30 | ©Copyright 8/16/23 Zilliz 30 | © Copyright 8/16/23 Zilliz HNSW - Skip List
  • 31.
    31 | ©Copyright 8/16/23 Zilliz 31 | © Copyright 8/16/23 Zilliz • Built by randomly shuffling data points and inserting them one by one, with each point connected to a predefined number of edges (M). ⇒ Creates a graph structure that exhibits the "small world". ⇒ Any two points are connected through a relatively short path. HNSW - NSW Graph
  • 32.
    32 | ©Copyright 8/16/23 Zilliz 32 | © Copyright 8/16/23 Zilliz HNSW
  • 33.
    33 | ©Copyright 8/16/23 Zilliz 33 | © Copyright 8/16/23 Zilliz Category Index Accuracy Latency Throughput Index Time Cost Graph-based Cagra (GPU) High Low Very High Fast Very High HNSW High Low High Slow High DiskANN High High Mid Very Slow Low Quantization-base d or cluster-based ScaNN Mid Mid High Mid Mid IVF_FLAT Mid Mid Low Fast Mid IVF + Quantization Low Mid Mid Mid Low
  • 34.
    34 | ©Copyright 8/16/23 Zilliz 34 | © Copyright 8/16/23 Zilliz Picking an Index ● 100% Recall – Use FLAT search if you need 100% accuracy ● 10MB < index_size < 2GB – Standard IVF ● 2GB < index_size < 20GB – Consider PQ and HNSW ● 20GB < index_size < 200GB – Composite Index, IVF_PQ or HNSW_SQ ● Disk-based indexes
  • 35.
    35 | ©Copyright 8/16/23 Zilliz 35 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 35 The GenAI Boom
  • 36.
    36 | ©Copyright 8/16/23 Zilliz 36 | © Copyright 8/16/23 Zilliz
  • 37.
    37 | ©Copyright 8/16/23 Zilliz 37 | © Copyright 8/16/23 Zilliz Three Pillars of GenAI & the opportunities they bring Models Computation Data Vector Database ● Data Encryption ● Data ETL ● Data Security ● Data Pipeline ● Data Observability ● Data Compliance
  • 38.
    38 | ©Copyright 8/16/23 Zilliz 38 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 38 RAG (Retrieval Augmented Generation)
  • 39.
    39 | ©Copyright 8/16/23 Zilliz 39 | © Copyright 8/16/23 Zilliz Basic Idea Use RAG to force the LLM to work with your data by injecting it via a vector database like Milvus
  • 40.
    40 | ©Copyright 8/16/23 Zilliz 40 | © Copyright 8/16/23 Zilliz Why RAG? RAG vs. LLM - Knowledge of LLM is out-of-date - LLM can not get your private knowledge - Help reduce Hallucinations - Transparency and interpretability RAG vs. Fine-tune - Fine-tune is expensive - Fine-tune spent much time - RAG is pluggable
  • 41.
    41 | ©Copyright 8/16/23 Zilliz 41 | © Copyright 8/16/23 Zilliz Basic RAG Architecture
  • 42.
    42 | ©Copyright 8/16/23 Zilliz 42 | © Copyright 8/16/23 Zilliz 5 lines starter
  • 43.
    43 | ©Copyright 8/16/23 Zilliz 43 | © Copyright 8/16/23 Zilliz 43 | © Copyright 8/16/23 Zilliz 43 | © Copyright 8/16/23 Zilliz 04 Basic RAG with Milvus
  • 44.
    44 | ©Copyright 8/16/23 Zilliz 44 | © Copyright 8/16/23 Zilliz • Framework for building LLM Applications • Focus on retrieving data and integrating with LLMs • Loading the Data • Chunk & Chunk Overlap • Integrations with most AI popular tools Llama-Index
  • 45.
    45 | ©Copyright 8/16/23 Zilliz 45 | © Copyright 8/16/23 Zilliz Ollama • Run LLMs Locally • Embeddings Models
  • 46.
    46 | ©Copyright 8/16/23 Zilliz 46 | © Copyright 8/16/23 Zilliz Milvus Lite pip install pymilvus
  • 47.
    47 | ©Copyright 8/16/23 Zilliz 47 | © Copyright 8/16/23 Zilliz 47 | © Copyright 8/16/23 Zilliz 47 | © Copyright 8/16/23 Zilliz 05 Embeddings
  • 48.
    48 | ©Copyright 8/16/23 Zilliz 48 | © Copyright 8/16/23 Zilliz Embeddings Models
  • 49.
    49 | ©Copyright 8/16/23 Zilliz 49 | © Copyright 8/16/23 Zilliz Examining Embeddings Picking a model What to embed Metadata
  • 50.
    50 | ©Copyright 8/16/23 Zilliz 50 | © Copyright 8/16/23 Zilliz Embeddings Strategies Level 1: Embedding Chunks Directly Level 2: Embedding Sub and Super Chunks Level 3: Incorporating Chunking and Non-Chunking Metadata
  • 51.
    51 | ©Copyright 8/16/23 Zilliz 51 | © Copyright 8/16/23 Zilliz Metadata Examples Chunking - Paragraph position - Section header - Larger paragraph - Sentence Number - … Non-Chunking - Author - Publisher - Organization - Role Based Access Control - …
  • 52.
    52 | ©Copyright 8/16/23 Zilliz 52 | © Copyright 8/16/23 Zilliz Your embeddings strategy depends on your accuracy, cost, and use case needs Takeaway:
  • 53.
    53 | ©Copyright 8/16/23 Zilliz 53 | © Copyright 8/16/23 Zilliz 53 | © Copyright 8/16/23 Zilliz 53 | © Copyright 8/16/23 Zilliz 06 Chunking
  • 54.
    54 | ©Copyright 8/16/23 Zilliz 54 | © Copyright 8/16/23 Zilliz Chunking Considerations Chunk Size Chunk Overlap Character Splitters
  • 55.
    55 | ©Copyright 8/16/23 Zilliz 55 | © Copyright 8/16/23 Zilliz Chunk Size=50, Overlap=0
  • 56.
    56 | ©Copyright 8/16/23 Zilliz 56 | © Copyright 8/16/23 Zilliz Chunk Size=128, Overlap=20
  • 57.
    57 | ©Copyright 8/16/23 Zilliz 57 | © Copyright 8/16/23 Zilliz Chunk Size=256, Overlap=50
  • 58.
    58 | ©Copyright 8/16/23 Zilliz 58 | © Copyright 8/16/23 Zilliz SemanticChunker
  • 59.
    59 | ©Copyright 8/16/23 Zilliz 59 | © Copyright 8/16/23 Zilliz How Does Your Data Look? Conversation Data Documentation Data Lecture or Q/A Data
  • 60.
    60 | ©Copyright 8/16/23 Zilliz 60 | © Copyright 8/16/23 Zilliz Your chunking strategy depends on what your data looks like and what you need from it. Takeaway:
  • 61.
    61 | ©Copyright 8/16/23 Zilliz 61 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 61 Demo!
  • 62.
    62 | ©Copyright 8/16/23 Zilliz 62 | © Copyright 8/16/23 Zilliz Naive RAG is limited
  • 63.
    63 | ©Copyright 8/16/23 Zilliz 63 | © Copyright 8/16/23 Zilliz Naive RAG failure mode Summarization
  • 64.
    64 | ©Copyright 8/16/23 Zilliz 64 | © Copyright 8/16/23 Zilliz Naive RAG failure mode Implicit data
  • 65.
    65 | ©Copyright 8/16/23 Zilliz 65 | © Copyright 8/16/23 Zilliz Naive RAG failure mode Multi-part questions
  • 66.
    66 | ©Copyright 8/16/23 Zilliz 66 | © Copyright 8/16/23 Zilliz 66 | © Copyright 8/16/23 Zilliz RAG is necessary but not sufficient
  • 67.
    67 | ©Copyright 8/16/23 Zilliz 67 | © Copyright 8/16/23 Zilliz Good dishes come from good ingredients • Data collection • Data cleaning • Parsing & Chunking
  • 68.
    68 | ©Copyright 8/16/23 Zilliz 68 | © Copyright 8/16/23 Zilliz 68 | © Copyright 9/25/23 Zilliz 68 | © Copyright 9/25/23 Zilliz Simplify and streamline the conversion of unstructured data into state-of-the-art vector embeddings, using intuitive UI and Restful APIs. Pipelines Easy. High-quality. Scalable. Simplify the workflow for developers, from converting unstructured data into searchable vectors to retrieving them from vector databases Deliver excellence in every phase of vector search pipeline development and deployment, regardless of their expertise Ensure scalability for managing large datasets and high-throughput queries, maintaining high performance with min. customization or infra changes Zilliz Cloud Pipelines
  • 69.
    69 | ©Copyright 8/16/23 Zilliz 69 | © Copyright 8/16/23 Zilliz Naive RAG Pipeline ⚠ Single-shot ⚠ No query understanding/planning ⚠ No tool use ⚠ No reflection, error correction ⚠ No memory (stateless)
  • 70.
    70 | ©Copyright 8/16/23 Zilliz 70 | © Copyright 8/16/23 Zilliz First thing first Measure it before you attempts to improve it!
  • 71.
    71 | ©Copyright 8/16/23 Zilliz 71 | © Copyright 8/16/23 Zilliz 71 | © Copyright 8/16/23 Zilliz 71 | © Copyright 8/16/23 Zilliz 07 Advanced RAG techniques
  • 72.
    72 | ©Copyright 8/16/23 Zilliz 72 | © Copyright 8/16/23 Zilliz Types of RAG Enhancement Techniques ● Divide & Conquer ○ Query Enhancement: better express or process the query intent. ○ Indexing Enhancement: data cleanup, better parser and chunking ○ Retriever Enhancement: more retrievers and hybrid search strategy ● Thinking outside the box ○ Agents? Other tools than retriever?
  • 73.
    73 | ©Copyright 8/16/23 Zilliz 73 | © Copyright 8/16/23 Zilliz 73 | © Copyright 8/16/23 Zilliz 73 | © Copyright 8/16/23 Zilliz Query Enhancement a)
  • 74.
    74 | ©Copyright 8/16/23 Zilliz 74 | © Copyright 8/16/23 Zilliz
  • 75.
    75 | ©Copyright 8/16/23 Zilliz 75 | © Copyright 8/16/23 Zilliz 75 | © Copyright 8/16/23 Zilliz 75 | © Copyright 8/16/23 Zilliz Indexing Enhancement b)
  • 76.
    76 | ©Copyright 8/16/23 Zilliz 76 | © Copyright 8/16/23 Zilliz
  • 77.
    77 | ©Copyright 8/16/23 Zilliz 77 | © Copyright 8/16/23 Zilliz 77 | © Copyright 8/16/23 Zilliz 77 | © Copyright 8/16/23 Zilliz Retriever Enhancement c)
  • 78.
    78 | ©Copyright 8/16/23 Zilliz 78 | © Copyright 8/16/23 Zilliz
  • 79.
    79 | ©Copyright 8/16/23 Zilliz 79 | © Copyright 8/16/23 Zilliz
  • 80.
    80 | ©Copyright 8/16/23 Zilliz 80 | © Copyright 8/16/23 Zilliz 80 | © Copyright 8/16/23 Zilliz 80 | © Copyright 8/16/23 Zilliz 08 Agentic RAG
  • 81.
    81 | ©Copyright 8/16/23 Zilliz 81 | © Copyright 8/16/23 Zilliz Agentic RAG ✅ Multi-turn ✅ Query / task planning layer ✅ Tool interface for external environment ✅ Reflection ✅ Memory for personalization
  • 82.
    82 | ©Copyright 8/16/23 Zilliz 82 | © Copyright 8/16/23 Zilliz
  • 83.
    83 | ©Copyright 8/16/23 Zilliz 83 | © Copyright 8/16/23 Zilliz
  • 84.
    84 | ©Copyright 8/16/23 Zilliz 84 | © Copyright 8/16/23 Zilliz
  • 85.
    85 | ©Copyright 8/16/23 Zilliz 85 | © Copyright 8/16/23 Zilliz Conversation Memory
  • 86.
    86 | ©Copyright 8/16/23 Zilliz 86 | © Copyright 8/16/23 Zilliz Tool Use
  • 87.
    87 | ©Copyright 8/16/23 Zilliz 87 | © Copyright 8/16/23 Zilliz 87 | © Copyright 8/16/23 Zilliz 87 | © Copyright 8/16/23 Zilliz 09 RAG in action with Milvus Lite
  • 88.
    88 | ©Copyright 8/16/23 Zilliz 88 | © Copyright 8/16/23 Zilliz General Ideas
  • 89.
    89 | ©Copyright 8/16/23 Zilliz 89 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 89 Demo!
  • 90.
    90 | ©Copyright 8/16/23 Zilliz 90 | © Copyright 8/16/23 Zilliz milvus.io github.com/milvus-io/ @milvusio @stephenbtl /in/stephen-batifol Thank you
  • 91.
    91 | ©Copyright 8/16/23 Zilliz 91 | © Copyright 8/16/23 Zilliz Meta Storage Root Query Data Index Coordinator Service Proxy Proxy etcd Log Broker SDK Load Balancer DDL/DCL DML NOTIFICATION CONTROL SIGNAL Object Storage Minio / S3 / AzureBlob Log Snapshot Delta File Index File Worker Node QUERY DATA DATA Message Storage VECTOR DATABASE Access Layer Query Node Data Node Index Node Milvus Architecture
  • 92.
    92 | ©Copyright 8/16/23 Zilliz 92 | © Copyright 8/16/23 Zilliz Well-connected in LLM infrastructure to enable RAG use cases Framework Hardware Infrastructure Embedding Models LLMs Software Infrastructure Vector Database
  • 93.
    93 | ©Copyright 8/16/23 Zilliz 93 | © Copyright 8/16/23 Zilliz Driving AI Innovation with leading business
  • 94.
    94 | ©Copyright 8/16/23 Zilliz 94 | © Copyright 8/16/23 Zilliz