1 | © Copyright 2024 Zilliz
Multimodal RAG with Milvus
Yi Wang @ Zilliz
CONTENTS
01 RAG is the New Search
02 Multimodal Retrieval with Milvus
RAG is the New Search
Retrieval-Augmented Generation
A Typical Search System
Picture Credit: https://web.eecs.umich.edu/~nham/EECS398F19/
Recap of RAG Architecture
[Diagram: Indexing → Query → Retrieval → Prompt & Generation]
Recap of RAG Architecture: Offline Indexing
[Diagram: Indexing → Query → Retrieval → Prompt & Generation, with the Indexing stage highlighted as offline]
Recap of RAG Architecture: Online Serving
[Diagram: Indexing → Query → Retrieval → Prompt & Generation, with the Query → Retrieval → Generation path highlighted as online serving]
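The split above can be sketched as a toy pipeline. These are illustrative stubs only: a real system embeds documents with a model and calls an LLM, whereas this sketch uses word overlap so the offline/online structure is visible in a few lines.

```python
# Toy RAG pipeline illustrating the offline-indexing / online-serving split.

def build_index(docs):
    # Offline indexing: map each doc id to a crude "vector" (a bag of words).
    return {i: set(d.lower().split()) for i, d in enumerate(docs)}

def retrieve(index, docs, query, top_k=2):
    # Online serving, retrieval stage: score docs by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(index, key=lambda i: len(index[i] & q), reverse=True)
    return [docs[i] for i in ranked[:top_k]]

def build_prompt(query, contexts):
    # Online serving, prompt & generation stage: stuff retrieved context
    # into the prompt that would be sent to the LLM.
    return "Context:\n" + "\n".join(contexts) + f"\nQuestion: {query}"

docs = ["Milvus is a vector database", "RAG retrieves context for LLMs"]
index = build_index(docs)                       # offline, done once
hits = retrieve(index, docs, "what is a vector database")  # online, per query
prompt = build_prompt("what is a vector database", hits)
```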
How RAG Resembles Search
Multimodal Retrieval with Milvus
Multimodal Retrieval
● Combining text and image in the search query
● Retrieving multimodal content for generation
Query = "feuilles brunes pendant la journée"
(i.e. "brown leaves during daytime")
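One simple way to combine text and image in a single query, assuming both modalities are embedded into the same vector space (as CLIP-style models do), is a weighted average of the two query vectors before the ANN search. `fuse_query` and `alpha` here are illustrative names, not a Milvus API:

```python
# Late-fusion sketch for a text + image query (assumes a shared embedding
# space, e.g. from a CLIP-style model; not a Milvus API).

def fuse_query(text_vec, image_vec, alpha=0.5):
    # Weighted average of the two query vectors; alpha balances
    # how much the text vs. the image influences the search.
    return [alpha * t + (1 - alpha) * i for t, i in zip(text_vec, image_vec)]

# Toy 2-d embeddings standing in for real model outputs.
q = fuse_query([1.0, 0.0], [0.0, 1.0])
```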
Easy to start with, can even run on edge devices!
Scale-up on Docker
Up to 100 billion vectors with K8s!
Data Preparation
Download the images.zip file directly from:
https://huggingface.co/datasets/unum-cloud/ann-unsplash-25k/tree/main
import glob, time, pprint
import numpy as np
from PIL import Image
import pandas as pd

# Load image metadata and AI-generated descriptions.
image_data = pd.read_csv('images.csv')
print(image_data.shape)
display(image_data.head(2))  # display() is available in Jupyter notebooks

# Lists of image ids and their text descriptions.
image_urls = list(image_data.photo_id)
image_texts = list(image_data.ai_description)
Create a Milvus Collection
from pymilvus import (
    connections, FieldSchema, CollectionSchema, DataType, Collection)

# STEP 1. Connect to Milvus.
connection = connections.connect(
    alias="default",
    host='localhost',
    port='19530'
)

# STEP 2. Create a new collection and build indexes.
EMBEDDING_DIM = 256
MAX_LENGTH = 65535

# Step 2.1 Define the data schema for the new collection.
fields = [
    # Use an auto-generated id as the primary key.
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text_vector", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    FieldSchema(name="image_vector", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    FieldSchema(name="chunk", dtype=DataType.VARCHAR, max_length=MAX_LENGTH),
    FieldSchema(name="image_filepath", dtype=DataType.VARCHAR, max_length=MAX_LENGTH),
]
schema = CollectionSchema(fields, "Multimodal demo collection")
# Step 2.2 Create the collection.
col = Collection("Demo_multimodal", schema)
# Step 2.3 Build an index for each vector column.
image_index = {"metric_type": "COSINE"}
col.create_index("image_vector", image_index)
text_index = {"metric_type": "COSINE"}
col.create_index("text_vector", text_index)
col.load()
Data Vectorization & Insertion
# STEP 3. Data vectorization (i.e. embedding).
image_embeddings, text_embeddings = embedding_model(
    batch_images=batch_images,
    batch_texts=batch_texts)

# STEP 4. Insert data into Milvus or Zilliz.
# Prepare the data batch.
chunk_dict_list = []
for chunk, img_url, img_embed, text_embed in zip(
        batch_texts,
        batch_urls,
        image_embeddings, text_embeddings):
    # Assemble embedding vectors, the original text chunk, and metadata.
    chunk_dict = {
        'chunk': chunk,
        'image_filepath': img_url,
        'text_vector': text_embed,
        'image_vector': img_embed
    }
    chunk_dict_list.append(chunk_dict)

# Insert the data batch.
# If the data size is large, try bulk_insert().
col.insert(data=chunk_dict_list)
Final step: Search
from pymilvus import AnnSearchRequest, RRFRanker

# STEP 5. hybrid_search() is the API for multimodal search.
# Build one ANN request per vector field, using the query's embeddings.
image_req = AnnSearchRequest(data=[query_image_embedding], anns_field="image_vector",
                             param={"metric_type": "COSINE"}, limit=top_k)
text_req = AnnSearchRequest(data=[query_text_embedding], anns_field="text_vector",
                            param={"metric_type": "COSINE"}, limit=top_k)
results = col.hybrid_search(
    reqs=[image_req, text_req],
    rerank=RRFRanker(),
    limit=top_k,
    output_fields=output_fields)
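RRFRanker() applies Reciprocal Rank Fusion to merge the per-field result lists. A minimal pure-Python sketch of the idea (not the pymilvus implementation): each candidate's fused score is the sum of 1 / (k + rank) over every ranked list it appears in, with k commonly set to 60.

```python
# Pure-Python sketch of Reciprocal Rank Fusion (RRF).

def rrf_fuse(ranked_lists, k=60):
    # Accumulate 1 / (k + rank) for each candidate across all lists;
    # candidates ranked well by several lists rise to the top.
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: image search ranks [A, B, C]; text search ranks [B, C, A].
# B is ranked 2nd and 1st, so it wins the fused ranking.
fused = rrf_fuse([["A", "B", "C"], ["B", "C", "A"]])
```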
[Multimodal] search with text-only query
Query = "feuilles brunes pendant la journée"
(i.e. "brown leaves during daytime")
[Multimodal] search with image-only query
[Query is an image]
[Multimodal] search with text + image query
Query = text + image
1. "silhouette d'une personne assise sur une roche au couche du soleil"
   (i.e. "silhouette of a person sitting on a rock formation during golden hour")
2. Image below
Result
Q&A
curl --request POST \
  --url "${MILVUS_HOST}:${MILVUS_PORT}/v2/vectordb/entities/advanced_search" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  -d '{
    "collectionName": "book",
    "search": [
      {
        "field": "book_intro_vector",
        "data": [1, 2, ...]
      },
      {
        "field": "book_cover_vector",
        "data": [2, 3, ...]
      }
    ],
    "rerank": {
      "strategy": "rrf"
    },
    "limit": 10
  }'
Retrieve Params
Re-rank Params
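For readers assembling this request programmatically, here is a sketch of building the same body in Python. The toy two-element vectors stand in for real embeddings; field names follow the slide.

```python
import json

# Request body mirroring the curl example above: one search entry per
# vector field, fused with RRF re-ranking. Vectors are placeholders.
body = {
    "collectionName": "book",
    "search": [
        {"field": "book_intro_vector", "data": [0.1, 0.2]},
        {"field": "book_cover_vector", "data": [0.2, 0.3]},
    ],
    "rerank": {"strategy": "rrf"},
    "limit": 10,
}
payload = json.dumps(body)  # pass as the POST body
```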