2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Techniques

1 | © Copyright 2024 Zilliz
1
Advanced Retrieval Augmented
Generation RAG Techniques
Tim Spann @ Zilliz

2
Slides
X

3
3 | © Copyright 10/22/23 Zilliz
Tim Spann
Principal Developer
Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/PaaSDev

4 | © Copyright Zilliz
4
M A Quick Introduction to Milvus

5
Mission:
Helping organizations make sense
of unstructured data.
2017
Founded
$113M
Raised
140
Employees
Redwood City, CA
Headquarters

6
Milvus is an Open-Source Vector Database to
store, index, manage, and use the massive
number of embedding vectors generated by
deep neural networks and LLMs.
contributors
400
stars
29K
docker pulls
66M
forks
2.7K
+
Milvus: The most widely-adopted vector database

7
2024
Milvus Lite Milvus Standalone Milvus Distributed
● Ideal for prototyping,
small scale
experiments.
● Easy to set up and
use, pip instally
pymilvus
● Scale to ≈1M vectors
● Run on K8s
● Load balancer and
Multi-Node
Management
● Scaling of each
component
independently
● Scale to 100B
vectors
● Single-Node
Deployment
● Bundled in a single
Docker Image
● Supports Primary/
Secondary
● Scale up to 100M
vectors
Ready to scale 🚀
Write your code once, and run it everywhere, at scale!
● API and SDK are the same

9
https://zilliz.com/resources/analyst-report/zilliz-forrester-wave-vector-database-report
Zilliz Named a Leader in Vector Database Providers
Looking for the right Vector Database solution for your AI applications?
Choosing a vector database that allows you to efficiently manage and search
large-scale vector data for AI applications can be challenging.
Forrester assessed the most significant vector database providers in the market. Zilliz
stood out for its:
● Cloud-native scalability for handling massive vector datasets
● Lightning-fast search capabilities for real-time AI applications
● Robust open-source foundation with Milvus
● Exceptional technical support and reliability
Forrester notes that Zilliz "is at the forefront of innovation, delivering exceptional speed
and efficiency in vector processing and search to support real-time AI applications."
Find out why Forrester named Zilliz a Leader and how we can accelerate your AI
initiatives.

10
RAG Enhancements
RAG

11
Types of RAG Enhancement Techniques
● Divide & Conquer
○ Query Enhancement: better express or process the query intent.
○ Indexing Enhancement: data cleanup, better parser and chunking
○ Retriever Enhancement: more retrievers and hybrid search strategy
● Multimodal RAG
● Agentic RAG
● Thinking outside the box
○ Agents? Other tools than retriever?

12
https://medium.com/@zilliz_learn/multimodal-rag-expanding-beyond-text-for-smar
ter-ai-d3e30e2eaef0
https://zilliz.com/learn/top-10-best-multimodal-ai-models-you-should-know
https://huggingface.co/google/deplot
Vision Language Models (VLMs)

13
RIG
https://pub.towardsai.net/retrieval-interleaved-generation-rig-when-real-ti
me-data-retrieval-meets-response-generation-a33e44ddbd74

14
Query Enhancement
a)

15

16
Indexing Enhancement
b)

17

18
Retriever Enhancement
c)

19

20

21
Multimodal RAG
X

22

23
Agentic RAG

24
Agentic RAG
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization

25

26

27

28
Conversation Memory

29
Tool Use

30
Local and Open
RAG in action with Milvus
Lite

31
General Ideas

32
Why RAG?
RAG vs. LLM
- Knowledge of LLM is out-of-date
- LLM can not get your private knowledge
- Help reduce Hallucinations
- Transparency and interpretability
RAG vs. Fine-tune
- Fine-tune is expensive
- Fine-tune spent much time
- RAG is pluggable

33
Ollama
• Run LLMs Locally
• Embeddings Models
DEMO!

34
Text Query
computer with this image
theme

35

36

37
Advanced RAG

38
https://bit.ly/4eFdMlK https://bit.ly/3BLeLCx

39
https://bit.ly/3zXW8dX https://bit.ly/3NuK5ru

40
https://bit.ly/4gZ4Lpn
Metadata Filtering
Hybrid Search
Agents
https://bitz.ly/3UbqUqx

41
https://bit.ly/3YpKd1K
Smart Chunking
Embedding
Model Choice
Improving Information
Retrieval and RAG with
Hypothetical Document
Embeddings HyDE
https://bit.ly/4f8Ckne

42
Best Practices in RAG Apps
https://zilliz.com/blog/best-practice-in-implementing-rag-apps

43
Deeper Dive

44
01 RAG Overview

45
Basic Idea
You want to use your data with a large
language model

46
RAG vs Fine Tuning
RAG
Inject your data via a vector
database like Milvus/Zilliz
Query LLM
Milvus
Your Data
Primary Use Case
- Factual Recall
- Forced Data Injection
- Cost Optimization

47
RAG vs Fine Tuning
LLM
Fine Tuning
Augment an LLM by training it
on your data
Your Data
“New” LLM
Query
Primary Use Case
- Style transfer
- Domain specific usage

48
Takeaway
Use RAG to force the LLM to work with your data by injecting it
via a vector database like Milvus or Zilliz

49
02 Chunking

50
Chunking Considerations
Chunk Size
Chunk Overlap
Character Splitters

51
How Does Your Data Look?
Conversation
Data
Documentation
Data
Lecture or Q/A
Data

52
Examples

53
Examples

54
Examples

55
Examples

56
Examples

57
Examples

58
Your chunking strategy depends on what your data looks
like and what you need from it.
Takeaway:

59
03 Embeddings

60
Examining Embeddings
Picking a model
What to embed
Metadata

61
Embeddings Models

62
Embeddings Strategies
Level 1 Embedding Chunks Directly
Level 2 Embedding Sub and Super Chunks
Level 3 Incorporating Chunking and Non-Chunking Metadata

63
Metadata Examples
Chunking
- Paragraph position
- Section header
- Larger paragraph
- Sentence Number
- …
Non-Chunking
- Author
- Publisher
- Organization
- Role Based Access Control
- …

64
Your embeddings strategy depends on your accuracy,
cost, and use case needs
Takeaway:

65
04 Vector Databases

66
Basic Idea
Vector Databases provide the ability to inject your data via
semantic similarity
Considerations include: scale, performance, and flexibility

67
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
[0.0, 0.5]
+ Man = [0.5, 0.2]
King = [0.5, 0.7]
Man = [0.5, 0.2]

68
Vector
Databases
Where do Vectors Come From?

69
Basic Functionality

70
Why Not Use a SQL/NoSQL Database?
• Inefficiency in High-dimensional spaces
• Suboptimal Indexing
• Inadequate query support
• Lack of scalability
• Limited analytics capabilities
• Data conversion issues
TL;DR Vector operations are too computationally intensive for
traditional database infrastructures

71
Why Not Use a Vector Search Library?
• Have to manually implement filtering
• Not optimized to take advantage of the latest hardware
• Unable to handle large scale data
• Lack of lifecycle management
• Inefficient indexing capabilities
• No built in safety mechanisms
TL;DR Vector search libraries lack the infrastructure to help you scale,
deploy, and manage your apps in production.

72
What is Milvus/Zilliz ideal for?
• Advanced filtering
• Hybrid search
• Durability and backups
• Replications/High Availability
• Sharding
• Aggregations
• Lifecycle management
• Multi-tenancy
• High query load
• High insertion/deletion
• Full precision/recall
• Accelerator support GPU,
FPGA
• Billion-scale storage
Purpose-built to store, index and query vector embeddings from unstructured data at scale.

73
Meta Storage
Root Query Data Index
Coordinator Service
Proxy
Proxy
etcd
Log Broker
SDK
Load Balancer
DDL/DCL
DML
NOTIFICATION
CONTROL SIGNAL
Object Storage
Minio / S3 / AzureBlob
Log Snapshot Delta File Index File
Worker Node QUERY DATA DATA
Message Storage
Access Layer
Query Node Data Node Index Node
High-level overview of Milvusʼ Architecture

74
Milvus Architecture: Differentiation
1. Cloud Native, Distributed System Architecture
2. True Separation of Concerns
3. Scalable Index Creation Strategy with 512 MB Segments

75
W A New Data and Compute World

76 Data Source: The Digitization of the World by IDC
20%
Other
of newly generated data in
2025 will be unstructured data
90%
The world is much more than just text and keywords

77
What do these
companies that
navigated the "trough
of disillusionmentˮ
have in common?
Data Volumes.
AI Hype?

78
Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database

79
New Hotness
https://zilliz.com/learn/top-10-best-multimodal-ai-models-you-should-know
https://github.com/facebookresearch/ImageBind

80
Vector vs Relational
https://zilliz.com/blog/relational-databases-vs-vector-databases

81
V Overview of Vector Databases

A New tool emerged. The Vector Database

Vn, 1
…
…
…
1
2
3
4
5
Transform into
Vectors
Unstructured Data
Images
User Generated
Content
Video
Documents
Audio
Vector Embeddings
Perform
Approximate
Nearest Neighbor
Similarity Search
Perform Query
Get Results
Store in Vector Database
How Similarity Search Works

8
4
Vector Database : making sense of unstructured data
2024

85
Vector Search
+ Indexing
+ Filtering
= 🚀

86
Use Case: Drug Discovery
Vectors: 12 Billion
Reqʼts: High Recall
Index: BIN_FLAT
Use Case: Data Search
Vectors: 2 Billion
Reqʼts: 200 ms, Cost mgmt
Index: DiskANN for cost savings
Use Case: Image Search
Vectors: 20 Billion
Reqʼts: High Insertion, Cost
Index: Disk Based Index
Use Case: Recommender System
Vectors: 20 Billion
Reqʼts: 5,000 QPS
Index: HNSW & CAGRA
Industry leaders already use vector search in their apps

87
Fast & Cost effective
3X faster, 3X
Cheaper
Pluggable Vector Search Lib
Tiered Storage
Scalable & Reliable
Cloud Native,
K8s Native
Scale from 1  10B
Storage / compute disaggregation
UNCOMPROMISING DATA
SECURITY
Enterprise Ready
Platform
Battle-Tested: Delivering Reliable
Performance and Enterprise-Grade
Security
AI Powered
Vector Native
Rich functionality for AI
Born for vector data processing
Thatʼs why we build Milvus And itʼs open sourced
under Apache license!

89
Sample of Milvus Users

90
Multi-modal Search
multimodal-demo.milvus.io

92
Seamless integration with all popular AI toolkits

93
https://zilliz.com/learn/milvus-notebooks

94
RESOURCES

95
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
https://github.com/milvus-io/milvus

96
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics
such as vector databases, LLMs, and managing data at scale. The intended audience of this group
includes roles like machine learning engineers, data scientists, data engineers, software engineers, and
PMs.
This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.

97
https://zilliz.com/learn/generative-ai

98
98
This week in Milvus, Towhee, Attu, GPT
Cache, Gen AI, LLM, Apache NiFi, Apache
Flink, Apache Kafka, ML, AI, Apache Spark,
Apache Iceberg, Python, Java, Vector DB
and Open Source friends.
https://bit.ly/32dAJft
https://github.com/milvus-io/milvus
AIM Weekly by Tim Spann

99
milvus.io
github.com/milvus-io/
@milvusio
@paasDev
/in/timothyspann
Connect with me!
Thank you!

100
Milvus 🤝 Open-Source
MINIO
Store Vectors and Indexes
Enables Milvus’ stateless
architecture
Kafka/ Pulsar
Handles Data Insertion
stream
Internal Component
Communications
Real-time updates to
Milvus
Prometheus /
Grafana
Collects metrics from
Milvus
Provides real-time
monitoring dashboards
Kubernetes
Milvus Operator CRDs

101
Distributed
Architecture

102
Dynamic Scaling 🚀
Stateless components
for Easy Scaling
Data sharding across
multiple nodes
Horizontal Pod
Autoscaler (HPA)
● Query, Index, and
Data Nodes can be
scaled
independently
● Allows for optimized
resource allocation
based on workload
characteristics
● Distributes large
datasets across
multiple Data Nodes
● Enables parallel
processing for
improved query
performance
● Automatically
scales up and down
● Custom metrics can
be used (e.g., query
latency, throughput)

103
Stateless Architecture
Stateless Components All Milvus components are deployed Stateless.
Object Storage
Milvus relies on Object Storage (MinIO, S3, etc) for data
persistence.
Vectors are stored in Object Storage, Metadata is in etcd.
Scaling and Failover
Scaling and failover don't involve traditional data rebalancing.
When new pods are added or existing ones fail, they can
immediately start handling requests by accessing data from the
shared object storage.

104
Different Consistency levels
Trade Oﬀs
● Strong: Guaranteed up-to-date
reads, highest latency
● Bounded: Reads may be slightly
stale, but within a time bound
● Session: Consistent reads within a
session, may be stale across
sessions
● Eventually: Lowest latency, reads
may be stale
Ensures every node or replica has the
same view of data at a given time.
● Strong consistency for critical
applications requiring accurate
results
● Eventually consistency for
high-throughput,
latency-sensitive apps

105
Growing Segment:
• In-memory segment replaying data
from the Log Broker.
• Uses a FLAT index to ensure data is
fresh and appendable.
Sealed Segment:
• Immutable segment using
alternative indexing methods for
efficiency.
Milvus Data Layout - Segments

106
Index Building
To avoid frequent index building
for data updates.
A collection in Milvus is divided
further into segments, each with
its own index.

107
Picking an Index
● 100% Recall – Use FLAT search if you need 100% accuracy
● 10MB < index_size < 2GB  Standard IVF
● 2GB < index_size < 20GB  Consider PQ and HNSW
● 20GB < index_size < 200GB  Composite Index, IVF_PQ or
HNSW_SQ
● Disk-based indexes

10
8
2024
Indexes
Most of the vector index types supported by Milvus use approximate nearest neighbors search ANNS,
● HNSW: HNSW is a graph-based index and is best suited for scenarios that have a high demand for
search efficiency. There is also a GPU version GPU_CAGRA, thanks to Nvidiaʼs contribution.
● FLAT: FLAT is best suited for scenarios that seek perfectly accurate and exact search results on a small,
million-scale dataset. There is also a GPU version GPU_BRUTE_FORCE.
● IVF_FLAT: IVF_FLAT is a quantization-based index and is best suited for scenarios that seek an ideal
balance between accuracy and query speed. There is also a GPU version GPU_IVF_FLAT.
● IVF_SQ8: IVF_SQ8 is a quantization-based index and is best suited for scenarios that seek a significant
reduction on disk, CPU, and GPU memory consumption as these resources are very limited.
● IVF_PQ: IVF_PQ is a quantization-based index and is best suited for scenarios that seek high query
speed even at the cost of accuracy. There is also a GPU version GPU_IVF_PQ.

10
9
2024
Indexes Continued.
● SCANN: SCANN is similar to IVF_PQ in terms of vector clustering and product quantization. What makes
them different lies in the implementation details of product quantization and the use of SIMD
Single-Instruction / Multi-data) for efficient calculation.
● DiskANN: Based on Vamana graphs, DiskANN powers efficient searches within large datasets.

New Stuff
https://github.com/milvus-io/milvus-sdk-java/releases/tag/v2.4.4
Milvus 2.4 introduces several new features and improvements:
1. New GPU Index - CAGRA: This GPU-based index offers significant performance improvements, especially for batch
searches
2. Multi-vector and Hybrid Search: This feature allows storing vector embeddings from multiple models and conducting
hybrid searches.
3. Sparse Vectors Support (Beta): Milvus now supports sparse vectors for processing in collections, which is
particularly useful for keyword interpretation and analysis
4. Grouping Search: This feature enhances document-level recall for Retrieval-Augmented Generation (RAG)
applications by providing categorical aggregation
5. Inverted Index and Fuzzy Matching: These capabilities improve keyword retrieval for scalar fields
6. Float16 and BF16 Vector Data Type Support: Milvus now supports these half-precision data types for vector fields,
which can improve query efficiency and reduce memory usage.
7. L0 Segment: This new segment is designed to record deleted data, enhancing the performance of delete and upsert
operations.
8. Refactored BulkInsert: The bulk-insert logic has been improved, allowing for importing multiple files in a single
bulk-insert request.

112
Hybrid Search

113
Inverted File FLAT
IVFFLAT

114
IVFFLAT Index

115
IVFFLAT Index

116
IVFFLAT Index

117
Hierarchical Navigable
Small World HNSW

118
HNSW  Skip List

119
• Built by randomly shuffling data points and inserting them one by
one, with each point connected to a predefined number of edges
M.
⇒ Creates a graph structure that exhibits the "small world".
⇒ Any two points are connected through a relatively short path.
HNSW  NSW Graph

120
HNSW

121
Filtering

122
Filtering on Metadata
● Search Space Reduction w/ Pre-Filtering
● Bitset Wizardry 🧙
○ Use Compact Bitsets to represent Filter Matches
○ Low-level CPU operations for speed
● Scalar Indexing
○ Bloom Filter
○ Hash
○ Tree-based

123
RAG is necessary but
not sufficient

124
Good dishes come from good ingredients
• Data collection
• Data cleaning
• Parsing & Chunking

125
Simplify and streamline
the conversion of
unstructured data into
state-of-the-art vector
embeddings, using
intuitive UI and Restful
APIs.
Pipelines
Easy. High-quality. Scalable.
Simplify the workflow
for developers, from
converting
unstructured data into
searchable vectors to
retrieving them from
vector databases
Deliver excellence in
every phase of vector
search pipeline
development and
deployment,
regardless of their
expertise
Ensure scalability for
managing large
datasets and
high-throughput
queries, maintaining
high performance with
min. customization or
infra changes
Zilliz Cloud Pipelines

126
Naive RAG Pipeline
⚠ Single-shot
⚠ No query understanding/planning
⚠ No tool use
⚠ No reflection, error correction
⚠ No memory (stateless)

127
First thing first
Measure it before you attempts to improve it!

128
07 Advanced RAG techniques

129
● Divide & Conquer
○ Query Enhancement: better express or process the query intent.
○ Indexing Enhancement: data cleanup, better parser and chunking
○ Retriever Enhancement: more retrievers and hybrid search strategy
● Thinking outside the box
○ Agents? Other tools than retriever?

130
Query Enhancement
a)

131

132
Indexing Enhancement
b)

133

134
Retriever Enhancement
c)

135

136

137
08 Agentic RAG

138
Agentic RAG
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization

139

140

141

142
Conversation Memory

143
Tool Use

144
Fivetran Source Connector
Simplify unstructured data retrievals from 500+ system through Fivetran
500+ data sources
Connectors &
Transformation
Logics
Milvus source
connector Fivetran
SDK
Unstructured
Data
Embedding Services
Vectorization
Vector
Ingestion
Semantic
Search

145

146
https://milvus.io/docs/integrations_overview.md

147
https://zilliz.com/learn/milvus-notebooks

148
Milvus
Open Source Self-Managed
Zilliz Cloud
SaaS Fully-Managed
github.com/milvus-io/milvus
Getting Started with Vector Databases
zilliz.com/cloud

2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Techniques

More Related Content

What's hot

Similar to 2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Techniques

More from Timothy Spann

Recently uploaded

2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Techniques