1 | © Copyright 2024 Zilliz
1
Advanced Retrieval Augmented
Generation RAG Techniques
Tim Spann @ Zilliz
2 | © Copyright 2024 Zilliz
2
Slides
X
3 | © Copyright 2024 Zilliz
3
3 | © Copyright 10/22/23 Zilliz
3 | © Copyright 2024 Zilliz
Tim Spann
Principal Developer
Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/PaaSDev
4 | © Copyright Zilliz
4
M A Quick Introduction to Milvus
5 | © Copyright 2024 Zilliz
5 | © Copyright 8/16/23 Zilliz
5
Mission:
Helping organizations make sense
of unstructured data.
2017
Founded
$113M
Raised
140
Employees
Redwood City, CA
Headquarters
6 | © Copyright 2024 Zilliz
6 | © Copyright 8/16/23 Zilliz
6
Milvus is an Open-Source Vector Database to
store, index, manage, and use the massive
number of embedding vectors generated by
deep neural networks and LLMs.
contributors
400
stars
29K
docker pulls
66M
forks
2.7K
+
Milvus: The most widely-adopted vector database
7 | © Copyright 2024 Zilliz
7
2024
Milvus Lite Milvus Standalone Milvus Distributed
● Ideal for prototyping,
small scale
experiments.
● Easy to set up and
use, pip instally
pymilvus
● Scale to ≈1M vectors
● Run on K8s
● Load balancer and
Multi-Node
Management
● Scaling of each
component
independently
● Scale to 100B
vectors
● Single-Node
Deployment
● Bundled in a single
Docker Image
● Supports Primary/
Secondary
● Scale up to 100M
vectors
Ready to scale 🚀
Write your code once, and run it everywhere, at scale!
● API and SDK are the same
8
Rich functionality
2024
9 | © Copyright 2024 Zilliz
9
https://zilliz.com/resources/analyst-report/zilliz-forrester-wave-vector-database-report
Zilliz Named a Leader in Vector Database Providers
Looking for the right Vector Database solution for your AI applications?
Choosing a vector database that allows you to efficiently manage and search
large-scale vector data for AI applications can be challenging.
Forrester assessed the most significant vector database providers in the market. Zilliz
stood out for its:
● Cloud-native scalability for handling massive vector datasets
● Lightning-fast search capabilities for real-time AI applications
● Robust open-source foundation with Milvus
● Exceptional technical support and reliability
Forrester notes that Zilliz "is at the forefront of innovation, delivering exceptional speed
and efficiency in vector processing and search to support real-time AI applications."
Find out why Forrester named Zilliz a Leader and how we can accelerate your AI
initiatives.
10 | © Copyright 2024 Zilliz
10
10 | © Copyright 8/16/23 Zilliz
10 | © Copyright 8/16/23 Zilliz
RAG Enhancements
RAG
11 | © Copyright 2024 Zilliz
11
Types of RAG Enhancement Techniques
● Divide & Conquer
○ Query Enhancement: better express or process the query intent.
○ Indexing Enhancement: data cleanup, better parser and chunking
○ Retriever Enhancement: more retrievers and hybrid search strategy
● Multimodal RAG
● Agentic RAG
● Thinking outside the box
○ Agents? Other tools than retriever?
12 | © Copyright 2024 Zilliz
12
Types of RAG Enhancement Techniques
https://medium.com/@zilliz_learn/multimodal-rag-expanding-beyond-text-for-smar
ter-ai-d3e30e2eaef0
https://zilliz.com/learn/top-10-best-multimodal-ai-models-you-should-know
https://huggingface.co/google/deplot
Vision Language Models (VLMs)
13 | © Copyright 2024 Zilliz
13
RIG
https://pub.towardsai.net/retrieval-interleaved-generation-rig-when-real-ti
me-data-retrieval-meets-response-generation-a33e44ddbd74
14 | © Copyright 2024 Zilliz
14
14 | © Copyright 8/16/23 Zilliz
14 | © Copyright 8/16/23 Zilliz
Query Enhancement
a)
15 | © Copyright 2024 Zilliz
15
16 | © Copyright 2024 Zilliz
16
16 | © Copyright 8/16/23 Zilliz
16 | © Copyright 8/16/23 Zilliz
Indexing Enhancement
b)
17 | © Copyright 2024 Zilliz
17
18 | © Copyright 2024 Zilliz
18
18 | © Copyright 8/16/23 Zilliz
18 | © Copyright 8/16/23 Zilliz
Retriever Enhancement
c)
19 | © Copyright 2024 Zilliz
19
20 | © Copyright 2024 Zilliz
20
21 | © Copyright 2024 Zilliz
21
21 | © Copyright 8/16/23 Zilliz
21 | © Copyright 8/16/23 Zilliz
Multimodal RAG
X
22 | © Copyright 2024 Zilliz
22
23 | © Copyright 2024 Zilliz
23
23 | © Copyright 8/16/23 Zilliz
23 | © Copyright 8/16/23 Zilliz
Agentic RAG
24 | © Copyright 2024 Zilliz
24
Agentic RAG
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization
25 | © Copyright 2024 Zilliz
25
26 | © Copyright 2024 Zilliz
26
27 | © Copyright 2024 Zilliz
27
28 | © Copyright 2024 Zilliz
28
Conversation Memory
29 | © Copyright 2024 Zilliz
29
Tool Use
30 | © Copyright 2024 Zilliz
30
30 | © Copyright 8/16/23 Zilliz
30 | © Copyright 8/16/23 Zilliz
Local and Open
RAG in action with Milvus
Lite
31 | © Copyright 2024 Zilliz
31
General Ideas
32 | © Copyright 2024 Zilliz
32
Why RAG?
RAG vs. LLM
- Knowledge of LLM is out-of-date
- LLM can not get your private knowledge
- Help reduce Hallucinations
- Transparency and interpretability
RAG vs. Fine-tune
- Fine-tune is expensive
- Fine-tune spent much time
- RAG is pluggable
33 | © Copyright 2024 Zilliz
33
Ollama
• Run LLMs Locally
• Embeddings Models
DEMO!
34 | © Copyright 2024 Zilliz
34
Text Query
computer with this image
theme
35 | © Copyright 2024 Zilliz
35
36 | © Copyright 2024 Zilliz
36
37 | © Copyright 2024 Zilliz
37 | © Copyright 8/16/23 Zilliz
37
Advanced RAG
38 | © Copyright 2024 Zilliz
38
https://bit.ly/4eFdMlK https://bit.ly/3BLeLCx
39 | © Copyright 2024 Zilliz
39
https://bit.ly/3zXW8dX https://bit.ly/3NuK5ru
40 | © Copyright 2024 Zilliz
40
https://bit.ly/4gZ4Lpn
Metadata Filtering
Hybrid Search
Agents
https://bitz.ly/3UbqUqx
41 | © Copyright 2024 Zilliz
41
https://bit.ly/3YpKd1K
Smart Chunking
Embedding
Model Choice
Improving Information
Retrieval and RAG with
Hypothetical Document
Embeddings HyDE
https://bit.ly/4f8Ckne
42 | © Copyright 2024 Zilliz
42
Best Practices in RAG Apps
https://zilliz.com/blog/best-practice-in-implementing-rag-apps
43 | © Copyright 2024 Zilliz
43 | © Copyright 8/16/23 Zilliz
43
Deeper Dive
44 | © Copyright 2024 Zilliz
44
01 RAG Overview
45 | © Copyright 2024 Zilliz
45
Basic Idea
You want to use your data with a large
language model
46 | © Copyright 2024 Zilliz
46
RAG vs Fine Tuning
RAG
Inject your data via a vector
database like Milvus/Zilliz
Query LLM
Milvus
Your Data
Primary Use Case
- Factual Recall
- Forced Data Injection
- Cost Optimization
47 | © Copyright 2024 Zilliz
47
RAG vs Fine Tuning
LLM
Fine Tuning
Augment an LLM by training it
on your data
Your Data
“New” LLM
Query
Primary Use Case
- Style transfer
- Domain specific usage
48 | © Copyright 2024 Zilliz
48
Takeaway
Use RAG to force the LLM to work with your data by injecting it
via a vector database like Milvus or Zilliz
49 | © Copyright 2024 Zilliz
49
02 Chunking
50 | © Copyright 2024 Zilliz
50
Chunking Considerations
Chunk Size
Chunk Overlap
Character Splitters
51 | © Copyright 2024 Zilliz
51
How Does Your Data Look?
Conversation
Data
Documentation
Data
Lecture or Q/A
Data
52 | © Copyright 2024 Zilliz
52
Examples
53 | © Copyright 2024 Zilliz
53
Examples
54 | © Copyright 2024 Zilliz
54
Examples
55 | © Copyright 2024 Zilliz
55
Examples
56 | © Copyright 2024 Zilliz
56
Examples
57 | © Copyright 2024 Zilliz
57
Examples
58 | © Copyright 2024 Zilliz
58
Your chunking strategy depends on what your data looks
like and what you need from it.
Takeaway:
59 | © Copyright 2024 Zilliz
59
03 Embeddings
60 | © Copyright 2024 Zilliz
60
Examining Embeddings
Picking a model
What to embed
Metadata
61 | © Copyright 2024 Zilliz
61
Embeddings Models
62 | © Copyright 2024 Zilliz
62
Embeddings Strategies
Level 1 Embedding Chunks Directly
Level 2 Embedding Sub and Super Chunks
Level 3 Incorporating Chunking and Non-Chunking Metadata
63 | © Copyright 2024 Zilliz
63
Metadata Examples
Chunking
- Paragraph position
- Section header
- Larger paragraph
- Sentence Number
- …
Non-Chunking
- Author
- Publisher
- Organization
- Role Based Access Control
- …
64 | © Copyright 2024 Zilliz
64
Your embeddings strategy depends on your accuracy,
cost, and use case needs
Takeaway:
65 | © Copyright 2024 Zilliz
65
04 Vector Databases
66 | © Copyright 2024 Zilliz
66
Basic Idea
Vector Databases provide the ability to inject your data via
semantic similarity
Considerations include: scale, performance, and flexibility
67 | © Copyright 2024 Zilliz
67
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
[0.0, 0.5]
+ Man = [0.5, 0.2]
King = [0.5, 0.7]
Man = [0.5, 0.2]
68 | © Copyright 2024 Zilliz
68
Vector
Databases
Where do Vectors Come From?
69 | © Copyright 2024 Zilliz
69
Basic Functionality
70 | © Copyright 2024 Zilliz
70
Why Not Use a SQL/NoSQL Database?
• Inefficiency in High-dimensional spaces
• Suboptimal Indexing
• Inadequate query support
• Lack of scalability
• Limited analytics capabilities
• Data conversion issues
TL;DR Vector operations are too computationally intensive for
traditional database infrastructures
71 | © Copyright 2024 Zilliz
71
Why Not Use a Vector Search Library?
• Have to manually implement filtering
• Not optimized to take advantage of the latest hardware
• Unable to handle large scale data
• Lack of lifecycle management
• Inefficient indexing capabilities
• No built in safety mechanisms
TL;DR Vector search libraries lack the infrastructure to help you scale,
deploy, and manage your apps in production.
72 | © Copyright 2024 Zilliz
72
What is Milvus/Zilliz ideal for?
• Advanced filtering
• Hybrid search
• Durability and backups
• Replications/High Availability
• Sharding
• Aggregations
• Lifecycle management
• Multi-tenancy
• High query load
• High insertion/deletion
• Full precision/recall
• Accelerator support GPU,
FPGA
• Billion-scale storage
Purpose-built to store, index and query vector embeddings from unstructured data at scale.
73 | © Copyright 2024 Zilliz
73
Meta Storage
Root Query Data Index
Coordinator Service
Proxy
Proxy
etcd
Log Broker
SDK
Load Balancer
DDL/DCL
DML
NOTIFICATION
CONTROL SIGNAL
Object Storage
Minio / S3 / AzureBlob
Log Snapshot Delta File Index File
Worker Node QUERY DATA DATA
Message Storage
Access Layer
Query Node Data Node Index Node
High-level overview of Milvusʼ Architecture
74 | © Copyright 2024 Zilliz
74
Milvus Architecture: Differentiation
1. Cloud Native, Distributed System Architecture
2. True Separation of Concerns
3. Scalable Index Creation Strategy with 512 MB Segments
75 | © Copyright Zilliz
75
W A New Data and Compute World
76 | © Copyright 2024 Zilliz
76 Data Source: The Digitization of the World by IDC
20%
Other
of newly generated data in
2025 will be unstructured data
90%
The world is much more than just text and keywords
77 | © Copyright 2024 Zilliz
77
What do these
companies that
navigated the "trough
of disillusionmentˮ
have in common?
Data Volumes.
AI Hype?
78 | © Copyright 2024 Zilliz
78
Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database
79 | © Copyright 2024 Zilliz
79
New Hotness
https://zilliz.com/learn/top-10-best-multimodal-ai-models-you-should-know
https://github.com/facebookresearch/ImageBind
80 | © Copyright 2024 Zilliz
80
Vector vs Relational
https://zilliz.com/blog/relational-databases-vs-vector-databases
81 | © Copyright Zilliz
81
V Overview of Vector Databases
A New tool emerged. The Vector Database
Vn, 1
…
…
…
1
2
3
4
5
Transform into
Vectors
Unstructured Data
Images
User Generated
Content
Video
Documents
Audio
Vector Embeddings
Perform
Approximate
Nearest Neighbor
Similarity Search
Perform Query
Get Results
Store in Vector Database
How Similarity Search Works
8
4
Vector Database : making sense of unstructured data
2024
85 | © Copyright 2024 Zilliz
85
Vector Search
+ Indexing
+ Filtering
= 🚀
86 | © Copyright 2024 Zilliz
86
Use Case: Drug Discovery
Vectors: 12 Billion
Reqʼts: High Recall
Index: BIN_FLAT
Use Case: Data Search
Vectors: 2 Billion
Reqʼts: 200 ms, Cost mgmt
Index: DiskANN for cost savings
Use Case: Image Search
Vectors: 20 Billion
Reqʼts: High Insertion, Cost
Index: Disk Based Index
Use Case: Recommender System
Vectors: 20 Billion
Reqʼts: 5,000 QPS
Index: HNSW & CAGRA
Industry leaders already use vector search in their apps
87 | © Copyright Zilliz
87 | © Copyright Zilliz
87
Fast & Cost effective
3X faster, 3X
Cheaper
Pluggable Vector Search Lib
Tiered Storage
Scalable & Reliable
Cloud Native,
K8s Native
Scale from 1  10B
Storage / compute disaggregation
UNCOMPROMISING DATA
SECURITY
Enterprise Ready
Platform
Battle-Tested: Delivering Reliable
Performance and Enterprise-Grade
Security
AI Powered
Vector Native
Rich functionality for AI
Born for vector data processing
Thatʼs why we build Milvus And itʼs open sourced
under Apache license!
88 | © Copyright Zilliz
88
89 | © Copyright 2024 Zilliz
89
Sample of Milvus Users
90 | © Copyright 2024 Zilliz
90
Multi-modal Search
multimodal-demo.milvus.io
91 | © Copyright Zilliz
91
92 | © Copyright 2024 Zilliz
92
Seamless integration with all popular AI toolkits
93 | © Copyright 2024 Zilliz
93
https://zilliz.com/learn/milvus-notebooks
94 | © Copyright Zilliz
94 | © Copyright Zilliz
94
RESOURCES
95 | © Copyright Zilliz
95
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
https://github.com/milvus-io/milvus
96
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics
such as vector databases, LLMs, and managing data at scale. The intended audience of this group
includes roles like machine learning engineers, data scientists, data engineers, software engineers, and
PMs.
This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
97 | © Copyright Zilliz
97
https://zilliz.com/learn/generative-ai
98 | © Copyright 2024 Zilliz
98
98
This week in Milvus, Towhee, Attu, GPT
Cache, Gen AI, LLM, Apache NiFi, Apache
Flink, Apache Kafka, ML, AI, Apache Spark,
Apache Iceberg, Python, Java, Vector DB
and Open Source friends.
https://bit.ly/32dAJft
https://github.com/milvus-io/milvus
AIM Weekly by Tim Spann
99 | © Copyright 2024 Zilliz
99
milvus.io
github.com/milvus-io/
@milvusio
@paasDev
/in/timothyspann
Connect with me!
Thank you!
100 | © Copyright Zilliz
100
Milvus 🤝 Open-Source
MINIO
Store Vectors and Indexes
Enables Milvus’ stateless
architecture
Kafka/ Pulsar
Handles Data Insertion
stream
Internal Component
Communications
Real-time updates to
Milvus
Prometheus /
Grafana
Collects metrics from
Milvus
Provides real-time
monitoring dashboards
Kubernetes
Milvus Operator CRDs
101 | © Copyright Zilliz
101
Distributed
Architecture
102 | © Copyright Zilliz
102
Dynamic Scaling 🚀
Stateless components
for Easy Scaling
Data sharding across
multiple nodes
Horizontal Pod
Autoscaler (HPA)
● Query, Index, and
Data Nodes can be
scaled
independently
● Allows for optimized
resource allocation
based on workload
characteristics
● Distributes large
datasets across
multiple Data Nodes
● Enables parallel
processing for
improved query
performance
● Automatically
scales up and down
● Custom metrics can
be used (e.g., query
latency, throughput)
103 | © Copyright Zilliz
103
Stateless Architecture
Stateless Components All Milvus components are deployed Stateless.
Object Storage
Milvus relies on Object Storage (MinIO, S3, etc) for data
persistence.
Vectors are stored in Object Storage, Metadata is in etcd.
Scaling and Failover
Scaling and failover don't involve traditional data rebalancing.
When new pods are added or existing ones fail, they can
immediately start handling requests by accessing data from the
shared object storage.
104 | © Copyright Zilliz
104
Different Consistency levels
Trade Offs
● Strong: Guaranteed up-to-date
reads, highest latency
● Bounded: Reads may be slightly
stale, but within a time bound
● Session: Consistent reads within a
session, may be stale across
sessions
● Eventually: Lowest latency, reads
may be stale
Ensures every node or replica has the
same view of data at a given time.
● Strong consistency for critical
applications requiring accurate
results
● Eventually consistency for
high-throughput,
latency-sensitive apps
105 | © Copyright Zilliz
105
Growing Segment:
• In-memory segment replaying data
from the Log Broker.
• Uses a FLAT index to ensure data is
fresh and appendable.
Sealed Segment:
• Immutable segment using
alternative indexing methods for
efficiency.
Milvus Data Layout - Segments
106 | © Copyright Zilliz
106
Index Building
To avoid frequent index building
for data updates.
A collection in Milvus is divided
further into segments, each with
its own index.
107 | © Copyright Zilliz
107
Picking an Index
● 100% Recall – Use FLAT search if you need 100% accuracy
● 10MB < index_size < 2GB  Standard IVF
● 2GB < index_size < 20GB  Consider PQ and HNSW
● 20GB < index_size < 200GB  Composite Index, IVF_PQ or
HNSW_SQ
● Disk-based indexes
10
8
2024
Indexes
Most of the vector index types supported by Milvus use approximate nearest neighbors search ANNS,
● HNSW: HNSW is a graph-based index and is best suited for scenarios that have a high demand for
search efficiency. There is also a GPU version GPU_CAGRA, thanks to Nvidiaʼs contribution.
● FLAT: FLAT is best suited for scenarios that seek perfectly accurate and exact search results on a small,
million-scale dataset. There is also a GPU version GPU_BRUTE_FORCE.
● IVF_FLAT: IVF_FLAT is a quantization-based index and is best suited for scenarios that seek an ideal
balance between accuracy and query speed. There is also a GPU version GPU_IVF_FLAT.
● IVF_SQ8: IVF_SQ8 is a quantization-based index and is best suited for scenarios that seek a significant
reduction on disk, CPU, and GPU memory consumption as these resources are very limited.
● IVF_PQ: IVF_PQ is a quantization-based index and is best suited for scenarios that seek high query
speed even at the cost of accuracy. There is also a GPU version GPU_IVF_PQ.
10
9
2024
Indexes Continued.
● SCANN: SCANN is similar to IVF_PQ in terms of vector clustering and product quantization. What makes
them different lies in the implementation details of product quantization and the use of SIMD
Single-Instruction / Multi-data) for efficient calculation.
● DiskANN: Based on Vamana graphs, DiskANN powers efficient searches within large datasets.
New Stuff
New Stuff
https://github.com/milvus-io/milvus-sdk-java/releases/tag/v2.4.4
Milvus 2.4 introduces several new features and improvements:
1. New GPU Index - CAGRA: This GPU-based index offers significant performance improvements, especially for batch
searches
2. Multi-vector and Hybrid Search: This feature allows storing vector embeddings from multiple models and conducting
hybrid searches.
3. Sparse Vectors Support (Beta): Milvus now supports sparse vectors for processing in collections, which is
particularly useful for keyword interpretation and analysis
4. Grouping Search: This feature enhances document-level recall for Retrieval-Augmented Generation (RAG)
applications by providing categorical aggregation
5. Inverted Index and Fuzzy Matching: These capabilities improve keyword retrieval for scalar fields
6. Float16 and BF16 Vector Data Type Support: Milvus now supports these half-precision data types for vector fields,
which can improve query efficiency and reduce memory usage.
7. L0 Segment: This new segment is designed to record deleted data, enhancing the performance of delete and upsert
operations.
8. Refactored BulkInsert: The bulk-insert logic has been improved, allowing for importing multiple files in a single
bulk-insert request.
112 | © Copyright Zilliz
112
Hybrid Search
113 | © Copyright 2024 Zilliz
113 | © Copyright 9/25/23 Zilliz
113
Inverted File FLAT
IVFFLAT
114 | © Copyright 2024 Zilliz
114
IVFFLAT Index
115 | © Copyright 2024 Zilliz
115
IVFFLAT Index
116 | © Copyright 2024 Zilliz
116
IVFFLAT Index
117 | © Copyright 2024 Zilliz
117 | © Copyright 9/25/23 Zilliz
117
Hierarchical Navigable
Small World HNSW
118 | © Copyright 2024 Zilliz
118
HNSW  Skip List
119 | © Copyright 2024 Zilliz
119
• Built by randomly shuffling data points and inserting them one by
one, with each point connected to a predefined number of edges
M.
⇒ Creates a graph structure that exhibits the "small world".
⇒ Any two points are connected through a relatively short path.
HNSW  NSW Graph
120 | © Copyright 2024 Zilliz
120
HNSW
121 | © Copyright 2024 Zilliz
121 | © Copyright 8/16/23 Zilliz
121
Filtering
122 | © Copyright 2024 Zilliz
122
Filtering on Metadata
● Search Space Reduction w/ Pre-Filtering
● Bitset Wizardry 🧙
○ Use Compact Bitsets to represent Filter Matches
○ Low-level CPU operations for speed
● Scalar Indexing
○ Bloom Filter
○ Hash
○ Tree-based
123 | © Copyright 2024 Zilliz
123
123 | © Copyright 8/16/23 Zilliz
RAG is necessary but
not sufficient
124 | © Copyright 2024 Zilliz
124
Good dishes come from good ingredients
• Data collection
• Data cleaning
• Parsing & Chunking
125 | © Copyright 2024 Zilliz
125
125 | © Copyright 9/25/23 Zilliz
125 | © Copyright 9/25/23 Zilliz
Simplify and streamline
the conversion of
unstructured data into
state-of-the-art vector
embeddings, using
intuitive UI and Restful
APIs.
Pipelines
Easy. High-quality. Scalable.
Simplify the workflow
for developers, from
converting
unstructured data into
searchable vectors to
retrieving them from
vector databases
Deliver excellence in
every phase of vector
search pipeline
development and
deployment,
regardless of their
expertise
Ensure scalability for
managing large
datasets and
high-throughput
queries, maintaining
high performance with
min. customization or
infra changes
Zilliz Cloud Pipelines
126 | © Copyright 2024 Zilliz
126
Naive RAG Pipeline
⚠ Single-shot
⚠ No query understanding/planning
⚠ No tool use
⚠ No reflection, error correction
⚠ No memory (stateless)
127 | © Copyright 2024 Zilliz
127
First thing first
Measure it before you attempts to improve it!
128 | © Copyright 2024 Zilliz
128
128 | © Copyright 8/16/23 Zilliz
128 | © Copyright 8/16/23 Zilliz
07 Advanced RAG techniques
129 | © Copyright 2024 Zilliz
129
Types of RAG Enhancement Techniques
● Divide & Conquer
○ Query Enhancement: better express or process the query intent.
○ Indexing Enhancement: data cleanup, better parser and chunking
○ Retriever Enhancement: more retrievers and hybrid search strategy
● Thinking outside the box
○ Agents? Other tools than retriever?
130 | © Copyright 2024 Zilliz
130
130 | © Copyright 8/16/23 Zilliz
130 | © Copyright 8/16/23 Zilliz
Query Enhancement
a)
131 | © Copyright 2024 Zilliz
131
132 | © Copyright 2024 Zilliz
132
132 | © Copyright 8/16/23 Zilliz
132 | © Copyright 8/16/23 Zilliz
Indexing Enhancement
b)
133 | © Copyright 2024 Zilliz
133
134 | © Copyright 2024 Zilliz
134
134 | © Copyright 8/16/23 Zilliz
134 | © Copyright 8/16/23 Zilliz
Retriever Enhancement
c)
135 | © Copyright 2024 Zilliz
135
136 | © Copyright 2024 Zilliz
136
137 | © Copyright 2024 Zilliz
137
137 | © Copyright 8/16/23 Zilliz
137 | © Copyright 8/16/23 Zilliz
08 Agentic RAG
138 | © Copyright 2024 Zilliz
138
Agentic RAG
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization
139 | © Copyright 2024 Zilliz
139
140 | © Copyright 2024 Zilliz
140
141 | © Copyright 2024 Zilliz
141
142 | © Copyright 2024 Zilliz
142
Conversation Memory
143 | © Copyright 2024 Zilliz
143
Tool Use
144 | © Copyright 2024 Zilliz
144
Fivetran Source Connector
Simplify unstructured data retrievals from 500+ system through Fivetran
500+ data sources
Connectors &
Transformation
Logics
Milvus source
connector Fivetran
SDK
Unstructured
Data
Embedding Services
Vectorization
Vector
Ingestion
Semantic
Search
145 | © Copyright 2024 Zilliz
145
146 | © Copyright 2024 Zilliz
146
https://milvus.io/docs/integrations_overview.md
147 | © Copyright 2024 Zilliz
147
https://zilliz.com/learn/milvus-notebooks
148 | © Copyright 2024 Zilliz
148
148 | © Copyright 10/22/23 Zilliz
148 | © Copyright 2024 Zilliz
Milvus
Open Source Self-Managed
Zilliz Cloud
SaaS Fully-Managed
github.com/milvus-io/milvus
Getting Started with Vector Databases
zilliz.com/cloud

2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Techniques

  • 1.
    1 | ©Copyright 2024 Zilliz 1 Advanced Retrieval Augmented Generation RAG Techniques Tim Spann @ Zilliz
  • 2.
    2 | ©Copyright 2024 Zilliz 2 Slides X
  • 3.
    3 | ©Copyright 2024 Zilliz 3 3 | © Copyright 10/22/23 Zilliz 3 | © Copyright 2024 Zilliz Tim Spann Principal Developer Advocate, Zilliz tim.spann@zilliz.com https://www.linkedin.com/in/timothyspann/ https://x.com/PaaSDev
  • 4.
    4 | ©Copyright Zilliz 4 M A Quick Introduction to Milvus
  • 5.
    5 | ©Copyright 2024 Zilliz 5 | © Copyright 8/16/23 Zilliz 5 Mission: Helping organizations make sense of unstructured data. 2017 Founded $113M Raised 140 Employees Redwood City, CA Headquarters
  • 6.
    6 | ©Copyright 2024 Zilliz 6 | © Copyright 8/16/23 Zilliz 6 Milvus is an Open-Source Vector Database to store, index, manage, and use the massive number of embedding vectors generated by deep neural networks and LLMs. contributors 400 stars 29K docker pulls 66M forks 2.7K + Milvus: The most widely-adopted vector database
  • 7.
    7 | ©Copyright 2024 Zilliz 7 2024 Milvus Lite Milvus Standalone Milvus Distributed ● Ideal for prototyping, small scale experiments. ● Easy to set up and use, pip instally pymilvus ● Scale to ≈1M vectors ● Run on K8s ● Load balancer and Multi-Node Management ● Scaling of each component independently ● Scale to 100B vectors ● Single-Node Deployment ● Bundled in a single Docker Image ● Supports Primary/ Secondary ● Scale up to 100M vectors Ready to scale 🚀 Write your code once, and run it everywhere, at scale! ● API and SDK are the same
  • 8.
  • 9.
    9 | ©Copyright 2024 Zilliz 9 https://zilliz.com/resources/analyst-report/zilliz-forrester-wave-vector-database-report Zilliz Named a Leader in Vector Database Providers Looking for the right Vector Database solution for your AI applications? Choosing a vector database that allows you to efficiently manage and search large-scale vector data for AI applications can be challenging. Forrester assessed the most significant vector database providers in the market. Zilliz stood out for its: ● Cloud-native scalability for handling massive vector datasets ● Lightning-fast search capabilities for real-time AI applications ● Robust open-source foundation with Milvus ● Exceptional technical support and reliability Forrester notes that Zilliz "is at the forefront of innovation, delivering exceptional speed and efficiency in vector processing and search to support real-time AI applications." Find out why Forrester named Zilliz a Leader and how we can accelerate your AI initiatives.
  • 10.
    10 | ©Copyright 2024 Zilliz 10 10 | © Copyright 8/16/23 Zilliz 10 | © Copyright 8/16/23 Zilliz RAG Enhancements RAG
  • 11.
    11 | ©Copyright 2024 Zilliz 11 Types of RAG Enhancement Techniques ● Divide & Conquer ○ Query Enhancement: better express or process the query intent. ○ Indexing Enhancement: data cleanup, better parser and chunking ○ Retriever Enhancement: more retrievers and hybrid search strategy ● Multimodal RAG ● Agentic RAG ● Thinking outside the box ○ Agents? Other tools than retriever?
  • 12.
    12 | ©Copyright 2024 Zilliz 12 Types of RAG Enhancement Techniques https://medium.com/@zilliz_learn/multimodal-rag-expanding-beyond-text-for-smar ter-ai-d3e30e2eaef0 https://zilliz.com/learn/top-10-best-multimodal-ai-models-you-should-know https://huggingface.co/google/deplot Vision Language Models (VLMs)
  • 13.
    13 | ©Copyright 2024 Zilliz 13 RIG https://pub.towardsai.net/retrieval-interleaved-generation-rig-when-real-ti me-data-retrieval-meets-response-generation-a33e44ddbd74
  • 14.
    14 | ©Copyright 2024 Zilliz 14 14 | © Copyright 8/16/23 Zilliz 14 | © Copyright 8/16/23 Zilliz Query Enhancement a)
  • 15.
    15 | ©Copyright 2024 Zilliz 15
  • 16.
    16 | ©Copyright 2024 Zilliz 16 16 | © Copyright 8/16/23 Zilliz 16 | © Copyright 8/16/23 Zilliz Indexing Enhancement b)
  • 17.
    17 | ©Copyright 2024 Zilliz 17
  • 18.
    18 | ©Copyright 2024 Zilliz 18 18 | © Copyright 8/16/23 Zilliz 18 | © Copyright 8/16/23 Zilliz Retriever Enhancement c)
  • 19.
    19 | ©Copyright 2024 Zilliz 19
  • 20.
    20 | ©Copyright 2024 Zilliz 20
  • 21.
    21 | ©Copyright 2024 Zilliz 21 21 | © Copyright 8/16/23 Zilliz 21 | © Copyright 8/16/23 Zilliz Multimodal RAG X
  • 22.
    22 | ©Copyright 2024 Zilliz 22
  • 23.
    23 | ©Copyright 2024 Zilliz 23 23 | © Copyright 8/16/23 Zilliz 23 | © Copyright 8/16/23 Zilliz Agentic RAG
  • 24.
    24 | ©Copyright 2024 Zilliz 24 Agentic RAG ✅ Multi-turn ✅ Query / task planning layer ✅ Tool interface for external environment ✅ Reflection ✅ Memory for personalization
  • 25.
    25 | ©Copyright 2024 Zilliz 25
  • 26.
    26 | ©Copyright 2024 Zilliz 26
  • 27.
    27 | ©Copyright 2024 Zilliz 27
  • 28.
    28 | ©Copyright 2024 Zilliz 28 Conversation Memory
  • 29.
    29 | ©Copyright 2024 Zilliz 29 Tool Use
  • 30.
    30 | ©Copyright 2024 Zilliz 30 30 | © Copyright 8/16/23 Zilliz 30 | © Copyright 8/16/23 Zilliz Local and Open RAG in action with Milvus Lite
  • 31.
    31 | ©Copyright 2024 Zilliz 31 General Ideas
  • 32.
    32 | ©Copyright 2024 Zilliz 32 Why RAG? RAG vs. LLM - Knowledge of LLM is out-of-date - LLM can not get your private knowledge - Help reduce Hallucinations - Transparency and interpretability RAG vs. Fine-tune - Fine-tune is expensive - Fine-tune spent much time - RAG is pluggable
  • 33.
    33 | ©Copyright 2024 Zilliz 33 Ollama • Run LLMs Locally • Embeddings Models DEMO!
  • 34.
    34 | ©Copyright 2024 Zilliz 34 Text Query computer with this image theme
  • 35.
    35 | ©Copyright 2024 Zilliz 35
  • 36.
    36 | ©Copyright 2024 Zilliz 36
  • 37.
    37 | ©Copyright 2024 Zilliz 37 | © Copyright 8/16/23 Zilliz 37 Advanced RAG
  • 38.
    38 | ©Copyright 2024 Zilliz 38 https://bit.ly/4eFdMlK https://bit.ly/3BLeLCx
  • 39.
    39 | ©Copyright 2024 Zilliz 39 https://bit.ly/3zXW8dX https://bit.ly/3NuK5ru
  • 40.
    40 | ©Copyright 2024 Zilliz 40 https://bit.ly/4gZ4Lpn Metadata Filtering Hybrid Search Agents https://bitz.ly/3UbqUqx
  • 41.
    41 | ©Copyright 2024 Zilliz 41 https://bit.ly/3YpKd1K Smart Chunking Embedding Model Choice Improving Information Retrieval and RAG with Hypothetical Document Embeddings HyDE https://bit.ly/4f8Ckne
  • 42.
    42 | ©Copyright 2024 Zilliz 42 Best Practices in RAG Apps https://zilliz.com/blog/best-practice-in-implementing-rag-apps
  • 43.
    43 | ©Copyright 2024 Zilliz 43 | © Copyright 8/16/23 Zilliz 43 Deeper Dive
  • 44.
    44 | ©Copyright 2024 Zilliz 44 01 RAG Overview
  • 45.
    45 | ©Copyright 2024 Zilliz 45 Basic Idea You want to use your data with a large language model
  • 46.
    46 | ©Copyright 2024 Zilliz 46 RAG vs Fine Tuning RAG Inject your data via a vector database like Milvus/Zilliz Query LLM Milvus Your Data Primary Use Case - Factual Recall - Forced Data Injection - Cost Optimization
  • 47.
    47 | ©Copyright 2024 Zilliz 47 RAG vs Fine Tuning LLM Fine Tuning Augment an LLM by training it on your data Your Data “New” LLM Query Primary Use Case - Style transfer - Domain specific usage
  • 48.
    48 | ©Copyright 2024 Zilliz 48 Takeaway Use RAG to force the LLM to work with your data by injecting it via a vector database like Milvus or Zilliz
  • 49.
    49 | ©Copyright 2024 Zilliz 49 02 Chunking
  • 50.
    50 | ©Copyright 2024 Zilliz 50 Chunking Considerations Chunk Size Chunk Overlap Character Splitters
  • 51.
    51 | ©Copyright 2024 Zilliz 51 How Does Your Data Look? Conversation Data Documentation Data Lecture or Q/A Data
  • 52.
    52 | ©Copyright 2024 Zilliz 52 Examples
  • 53.
    53 | ©Copyright 2024 Zilliz 53 Examples
  • 54.
    54 | ©Copyright 2024 Zilliz 54 Examples
  • 55.
    55 | ©Copyright 2024 Zilliz 55 Examples
  • 56.
    56 | ©Copyright 2024 Zilliz 56 Examples
  • 57.
    57 | ©Copyright 2024 Zilliz 57 Examples
  • 58.
    58 | ©Copyright 2024 Zilliz 58 Your chunking strategy depends on what your data looks like and what you need from it. Takeaway:
  • 59.
    59 | ©Copyright 2024 Zilliz 59 03 Embeddings
  • 60.
    60 | ©Copyright 2024 Zilliz 60 Examining Embeddings Picking a model What to embed Metadata
  • 61.
    61 | ©Copyright 2024 Zilliz 61 Embeddings Models
  • 62.
    62 | ©Copyright 2024 Zilliz 62 Embeddings Strategies Level 1 Embedding Chunks Directly Level 2 Embedding Sub and Super Chunks Level 3 Incorporating Chunking and Non-Chunking Metadata
  • 63.
    63 | ©Copyright 2024 Zilliz 63 Metadata Examples Chunking - Paragraph position - Section header - Larger paragraph - Sentence Number - … Non-Chunking - Author - Publisher - Organization - Role Based Access Control - …
  • 64.
    64 | ©Copyright 2024 Zilliz 64 Your embeddings strategy depends on your accuracy, cost, and use case needs Takeaway:
  • 65.
    65 | ©Copyright 2024 Zilliz 65 04 Vector Databases
  • 66.
    66 | ©Copyright 2024 Zilliz 66 Basic Idea Vector Databases provide the ability to inject your data via semantic similarity Considerations include: scale, performance, and flexibility
  • 67.
    67 | ©Copyright 2024 Zilliz 67 Semantic Similarity Image from Sutor et al Woman = [0.3, 0.4] Queen = [0.3, 0.9] King = [0.5, 0.7] Woman = [0.3, 0.4] Queen = [0.3, 0.9] King = [0.5, 0.7] Man = [0.5, 0.2] Queen - Woman + Man = King Queen = [0.3, 0.9] - Woman = [0.3, 0.4] [0.0, 0.5] + Man = [0.5, 0.2] King = [0.5, 0.7] Man = [0.5, 0.2]
  • 68.
    68 | ©Copyright 2024 Zilliz 68 Vector Databases Where do Vectors Come From?
  • 69.
    69 | ©Copyright 2024 Zilliz 69 Basic Functionality
  • 70.
    70 | ©Copyright 2024 Zilliz 70 Why Not Use a SQL/NoSQL Database? • Inefficiency in High-dimensional spaces • Suboptimal Indexing • Inadequate query support • Lack of scalability • Limited analytics capabilities • Data conversion issues TL;DR Vector operations are too computationally intensive for traditional database infrastructures
  • 71.
    71 | ©Copyright 2024 Zilliz 71 Why Not Use a Vector Search Library? • Have to manually implement filtering • Not optimized to take advantage of the latest hardware • Unable to handle large scale data • Lack of lifecycle management • Inefficient indexing capabilities • No built in safety mechanisms TL;DR Vector search libraries lack the infrastructure to help you scale, deploy, and manage your apps in production.
  • 72.
    72 | ©Copyright 2024 Zilliz 72 What is Milvus/Zilliz ideal for? • Advanced filtering • Hybrid search • Durability and backups • Replications/High Availability • Sharding • Aggregations • Lifecycle management • Multi-tenancy • High query load • High insertion/deletion • Full precision/recall • Accelerator support GPU, FPGA • Billion-scale storage Purpose-built to store, index and query vector embeddings from unstructured data at scale.
  • 73.
    73 | ©Copyright 2024 Zilliz 73 Meta Storage Root Query Data Index Coordinator Service Proxy Proxy etcd Log Broker SDK Load Balancer DDL/DCL DML NOTIFICATION CONTROL SIGNAL Object Storage Minio / S3 / AzureBlob Log Snapshot Delta File Index File Worker Node QUERY DATA DATA Message Storage Access Layer Query Node Data Node Index Node High-level overview of Milvusʼ Architecture
  • 74.
    74 | ©Copyright 2024 Zilliz 74 Milvus Architecture: Differentiation 1. Cloud Native, Distributed System Architecture 2. True Separation of Concerns 3. Scalable Index Creation Strategy with 512 MB Segments
  • 75.
    75 | ©Copyright Zilliz 75 W A New Data and Compute World
  • 76.
    76 | ©Copyright 2024 Zilliz 76 Data Source: The Digitization of the World by IDC 20% Other of newly generated data in 2025 will be unstructured data 90% The world is much more than just text and keywords
  • 77.
    77 | ©Copyright 2024 Zilliz 77 What do these companies that navigated the "trough of disillusionmentˮ have in common? Data Volumes. AI Hype?
  • 78.
    78 | ©Copyright 2024 Zilliz 78 Well-connected in LLM infrastructure to enable RAG use cases Framework Hardware Infrastructure Embedding Models LLMs Software Infrastructure Vector Database
  • 79.
    79 | ©Copyright 2024 Zilliz 79 New Hotness https://zilliz.com/learn/top-10-best-multimodal-ai-models-you-should-know https://github.com/facebookresearch/ImageBind
  • 80.
    80 | ©Copyright 2024 Zilliz 80 Vector vs Relational https://zilliz.com/blog/relational-databases-vs-vector-databases
  • 81.
    81 | ©Copyright Zilliz 81 V Overview of Vector Databases
  • 82.
    A New toolemerged. The Vector Database
  • 83.
    Vn, 1 … … … 1 2 3 4 5 Transform into Vectors UnstructuredData Images User Generated Content Video Documents Audio Vector Embeddings Perform Approximate Nearest Neighbor Similarity Search Perform Query Get Results Store in Vector Database How Similarity Search Works
  • 84.
    8 4 Vector Database :making sense of unstructured data 2024
  • 85.
    85 | ©Copyright 2024 Zilliz 85 Vector Search + Indexing + Filtering = 🚀
  • 86.
    86 | ©Copyright 2024 Zilliz 86 Use Case: Drug Discovery Vectors: 12 Billion Reqʼts: High Recall Index: BIN_FLAT Use Case: Data Search Vectors: 2 Billion Reqʼts: 200 ms, Cost mgmt Index: DiskANN for cost savings Use Case: Image Search Vectors: 20 Billion Reqʼts: High Insertion, Cost Index: Disk Based Index Use Case: Recommender System Vectors: 20 Billion Reqʼts: 5,000 QPS Index: HNSW & CAGRA Industry leaders already use vector search in their apps
  • 87.
    87 | ©Copyright Zilliz 87 | © Copyright Zilliz 87 Fast & Cost effective 3X faster, 3X Cheaper Pluggable Vector Search Lib Tiered Storage Scalable & Reliable Cloud Native, K8s Native Scale from 1  10B Storage / compute disaggregation UNCOMPROMISING DATA SECURITY Enterprise Ready Platform Battle-Tested: Delivering Reliable Performance and Enterprise-Grade Security AI Powered Vector Native Rich functionality for AI Born for vector data processing Thatʼs why we build Milvus And itʼs open sourced under Apache license!
  • 88.
    88 | ©Copyright Zilliz 88
  • 89.
    89 | ©Copyright 2024 Zilliz 89 Sample of Milvus Users
  • 90.
    90 | ©Copyright 2024 Zilliz 90 Multi-modal Search multimodal-demo.milvus.io
  • 91.
    91 | ©Copyright Zilliz 91
  • 92.
    92 | ©Copyright 2024 Zilliz 92 Seamless integration with all popular AI toolkits
  • 93.
    93 | ©Copyright 2024 Zilliz 93 https://zilliz.com/learn/milvus-notebooks
  • 94.
    94 | ©Copyright Zilliz 94 | © Copyright Zilliz 94 RESOURCES
  • 95.
    95 | ©Copyright Zilliz 95 Vector Database Resources Give Milvus a Star! Chat with me on Discord! https://github.com/milvus-io/milvus
  • 96.
    96 Unstructured Data Meetup https://www.meetup.com/unstructured-data-meetup-new-york/ Thismeetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
  • 97.
    97 | ©Copyright Zilliz 97 https://zilliz.com/learn/generative-ai
  • 98.
    98 | ©Copyright 2024 Zilliz 98 98 This week in Milvus, Towhee, Attu, GPT Cache, Gen AI, LLM, Apache NiFi, Apache Flink, Apache Kafka, ML, AI, Apache Spark, Apache Iceberg, Python, Java, Vector DB and Open Source friends. https://bit.ly/32dAJft https://github.com/milvus-io/milvus AIM Weekly by Tim Spann
  • 99.
    99 | ©Copyright 2024 Zilliz 99 milvus.io github.com/milvus-io/ @milvusio @paasDev /in/timothyspann Connect with me! Thank you!
  • 100.
    100 | ©Copyright Zilliz 100 Milvus 🤝 Open-Source MINIO Store Vectors and Indexes Enables Milvus’ stateless architecture Kafka/ Pulsar Handles Data Insertion stream Internal Component Communications Real-time updates to Milvus Prometheus / Grafana Collects metrics from Milvus Provides real-time monitoring dashboards Kubernetes Milvus Operator CRDs
  • 101.
    101 | ©Copyright Zilliz 101 Distributed Architecture
  • 102.
    102 | ©Copyright Zilliz 102 Dynamic Scaling 🚀 Stateless components for Easy Scaling Data sharding across multiple nodes Horizontal Pod Autoscaler (HPA) ● Query, Index, and Data Nodes can be scaled independently ● Allows for optimized resource allocation based on workload characteristics ● Distributes large datasets across multiple Data Nodes ● Enables parallel processing for improved query performance ● Automatically scales up and down ● Custom metrics can be used (e.g., query latency, throughput)
  • 103.
    103 | ©Copyright Zilliz 103 Stateless Architecture Stateless Components All Milvus components are deployed Stateless. Object Storage Milvus relies on Object Storage (MinIO, S3, etc) for data persistence. Vectors are stored in Object Storage, Metadata is in etcd. Scaling and Failover Scaling and failover don't involve traditional data rebalancing. When new pods are added or existing ones fail, they can immediately start handling requests by accessing data from the shared object storage.
  • 104.
    104 | ©Copyright Zilliz 104 Different Consistency levels Trade Offs ● Strong: Guaranteed up-to-date reads, highest latency ● Bounded: Reads may be slightly stale, but within a time bound ● Session: Consistent reads within a session, may be stale across sessions ● Eventually: Lowest latency, reads may be stale Ensures every node or replica has the same view of data at a given time. ● Strong consistency for critical applications requiring accurate results ● Eventually consistency for high-throughput, latency-sensitive apps
  • 105.
    105 | ©Copyright Zilliz 105 Growing Segment: • In-memory segment replaying data from the Log Broker. • Uses a FLAT index to ensure data is fresh and appendable. Sealed Segment: • Immutable segment using alternative indexing methods for efficiency. Milvus Data Layout - Segments
  • 106.
    106 | ©Copyright Zilliz 106 Index Building To avoid frequent index building for data updates. A collection in Milvus is divided further into segments, each with its own index.
  • 107.
    107 | ©Copyright Zilliz 107 Picking an Index ● 100% Recall – Use FLAT search if you need 100% accuracy ● 10MB < index_size < 2GB  Standard IVF ● 2GB < index_size < 20GB  Consider PQ and HNSW ● 20GB < index_size < 200GB  Composite Index, IVF_PQ or HNSW_SQ ● Disk-based indexes
  • 108.
    10 8 2024 Indexes Most of thevector index types supported by Milvus use approximate nearest neighbors search ANNS, ● HNSW: HNSW is a graph-based index and is best suited for scenarios that have a high demand for search efficiency. There is also a GPU version GPU_CAGRA, thanks to Nvidiaʼs contribution. ● FLAT: FLAT is best suited for scenarios that seek perfectly accurate and exact search results on a small, million-scale dataset. There is also a GPU version GPU_BRUTE_FORCE. ● IVF_FLAT: IVF_FLAT is a quantization-based index and is best suited for scenarios that seek an ideal balance between accuracy and query speed. There is also a GPU version GPU_IVF_FLAT. ● IVF_SQ8: IVF_SQ8 is a quantization-based index and is best suited for scenarios that seek a significant reduction on disk, CPU, and GPU memory consumption as these resources are very limited. ● IVF_PQ: IVF_PQ is a quantization-based index and is best suited for scenarios that seek high query speed even at the cost of accuracy. There is also a GPU version GPU_IVF_PQ.
  • 109.
    10 9 2024 Indexes Continued. ● SCANN:SCANN is similar to IVF_PQ in terms of vector clustering and product quantization. What makes them different lies in the implementation details of product quantization and the use of SIMD Single-Instruction / Multi-data) for efficient calculation. ● DiskANN: Based on Vamana graphs, DiskANN powers efficient searches within large datasets.
  • 110.
  • 111.
    New Stuff https://github.com/milvus-io/milvus-sdk-java/releases/tag/v2.4.4 Milvus 2.4introduces several new features and improvements: 1. New GPU Index - CAGRA: This GPU-based index offers significant performance improvements, especially for batch searches 2. Multi-vector and Hybrid Search: This feature allows storing vector embeddings from multiple models and conducting hybrid searches. 3. Sparse Vectors Support (Beta): Milvus now supports sparse vectors for processing in collections, which is particularly useful for keyword interpretation and analysis 4. Grouping Search: This feature enhances document-level recall for Retrieval-Augmented Generation (RAG) applications by providing categorical aggregation 5. Inverted Index and Fuzzy Matching: These capabilities improve keyword retrieval for scalar fields 6. Float16 and BF16 Vector Data Type Support: Milvus now supports these half-precision data types for vector fields, which can improve query efficiency and reduce memory usage. 7. L0 Segment: This new segment is designed to record deleted data, enhancing the performance of delete and upsert operations. 8. Refactored BulkInsert: The bulk-insert logic has been improved, allowing for importing multiple files in a single bulk-insert request.
  • 112.
    112 | ©Copyright Zilliz 112 Hybrid Search
  • 113.
    113 | ©Copyright 2024 Zilliz 113 | © Copyright 9/25/23 Zilliz 113 Inverted File FLAT IVFFLAT
  • 114.
    114 | ©Copyright 2024 Zilliz 114 IVFFLAT Index
  • 115.
    115 | ©Copyright 2024 Zilliz 115 IVFFLAT Index
  • 116.
    116 | ©Copyright 2024 Zilliz 116 IVFFLAT Index
  • 117.
    117 | ©Copyright 2024 Zilliz 117 | © Copyright 9/25/23 Zilliz 117 Hierarchical Navigable Small World HNSW
  • 118.
    118 | ©Copyright 2024 Zilliz 118 HNSW  Skip List
  • 119.
    119 | ©Copyright 2024 Zilliz 119 • Built by randomly shuffling data points and inserting them one by one, with each point connected to a predefined number of edges M. ⇒ Creates a graph structure that exhibits the "small world". ⇒ Any two points are connected through a relatively short path. HNSW  NSW Graph
  • 120.
    120 | ©Copyright 2024 Zilliz 120 HNSW
  • 121.
    121 | ©Copyright 2024 Zilliz 121 | © Copyright 8/16/23 Zilliz 121 Filtering
  • 122.
    122 | ©Copyright 2024 Zilliz 122 Filtering on Metadata ● Search Space Reduction w/ Pre-Filtering ● Bitset Wizardry 🧙 ○ Use Compact Bitsets to represent Filter Matches ○ Low-level CPU operations for speed ● Scalar Indexing ○ Bloom Filter ○ Hash ○ Tree-based
  • 123.
    123 | ©Copyright 2024 Zilliz 123 123 | © Copyright 8/16/23 Zilliz RAG is necessary but not sufficient
  • 124.
    124 | ©Copyright 2024 Zilliz 124 Good dishes come from good ingredients • Data collection • Data cleaning • Parsing & Chunking
  • 125.
    125 | ©Copyright 2024 Zilliz 125 125 | © Copyright 9/25/23 Zilliz 125 | © Copyright 9/25/23 Zilliz Simplify and streamline the conversion of unstructured data into state-of-the-art vector embeddings, using intuitive UI and Restful APIs. Pipelines Easy. High-quality. Scalable. Simplify the workflow for developers, from converting unstructured data into searchable vectors to retrieving them from vector databases Deliver excellence in every phase of vector search pipeline development and deployment, regardless of their expertise Ensure scalability for managing large datasets and high-throughput queries, maintaining high performance with min. customization or infra changes Zilliz Cloud Pipelines
  • 126.
    126 | ©Copyright 2024 Zilliz 126 Naive RAG Pipeline ⚠ Single-shot ⚠ No query understanding/planning ⚠ No tool use ⚠ No reflection, error correction ⚠ No memory (stateless)
  • 127.
    127 | ©Copyright 2024 Zilliz 127 First thing first Measure it before you attempts to improve it!
  • 128.
    128 | ©Copyright 2024 Zilliz 128 128 | © Copyright 8/16/23 Zilliz 128 | © Copyright 8/16/23 Zilliz 07 Advanced RAG techniques
  • 129.
    129 | ©Copyright 2024 Zilliz 129 Types of RAG Enhancement Techniques ● Divide & Conquer ○ Query Enhancement: better express or process the query intent. ○ Indexing Enhancement: data cleanup, better parser and chunking ○ Retriever Enhancement: more retrievers and hybrid search strategy ● Thinking outside the box ○ Agents? Other tools than retriever?
  • 130.
    130 | ©Copyright 2024 Zilliz 130 130 | © Copyright 8/16/23 Zilliz 130 | © Copyright 8/16/23 Zilliz Query Enhancement a)
  • 131.
    131 | ©Copyright 2024 Zilliz 131
  • 132.
    132 | ©Copyright 2024 Zilliz 132 132 | © Copyright 8/16/23 Zilliz 132 | © Copyright 8/16/23 Zilliz Indexing Enhancement b)
  • 133.
    133 | ©Copyright 2024 Zilliz 133
  • 134.
    134 | ©Copyright 2024 Zilliz 134 134 | © Copyright 8/16/23 Zilliz 134 | © Copyright 8/16/23 Zilliz Retriever Enhancement c)
  • 135.
    135 | ©Copyright 2024 Zilliz 135
  • 136.
    136 | ©Copyright 2024 Zilliz 136
  • 137.
    137 | ©Copyright 2024 Zilliz 137 137 | © Copyright 8/16/23 Zilliz 137 | © Copyright 8/16/23 Zilliz 08 Agentic RAG
  • 138.
    138 | ©Copyright 2024 Zilliz 138 Agentic RAG ✅ Multi-turn ✅ Query / task planning layer ✅ Tool interface for external environment ✅ Reflection ✅ Memory for personalization
  • 139.
    139 | ©Copyright 2024 Zilliz 139
  • 140.
    140 | ©Copyright 2024 Zilliz 140
  • 141.
    141 | ©Copyright 2024 Zilliz 141
  • 142.
    142 | ©Copyright 2024 Zilliz 142 Conversation Memory
  • 143.
    143 | ©Copyright 2024 Zilliz 143 Tool Use
  • 144.
    144 | ©Copyright 2024 Zilliz 144 Fivetran Source Connector Simplify unstructured data retrievals from 500+ system through Fivetran 500+ data sources Connectors & Transformation Logics Milvus source connector Fivetran SDK Unstructured Data Embedding Services Vectorization Vector Ingestion Semantic Search
  • 145.
    145 | ©Copyright 2024 Zilliz 145
  • 146.
    146 | ©Copyright 2024 Zilliz 146 https://milvus.io/docs/integrations_overview.md
  • 147.
    147 | ©Copyright 2024 Zilliz 147 https://zilliz.com/learn/milvus-notebooks
  • 148.
    148 | ©Copyright 2024 Zilliz 148 148 | © Copyright 10/22/23 Zilliz 148 | © Copyright 2024 Zilliz Milvus Open Source Self-Managed Zilliz Cloud SaaS Fully-Managed github.com/milvus-io/milvus Getting Started with Vector Databases zilliz.com/cloud