1 | © Copyright 8/16/23 Zilliz
1 | © Copyright 8/16/23 Zilliz
Stephen Batifol | Zilliz
Unstructured Data Meetup - Berlin
MultiModal RAG using vLLM
and Pixtral
2 | © Copyright 8/16/23 Zilliz
2 | © Copyright 8/16/23 Zilliz
Stephen Batifol
Developer Advocate, Zilliz/ Milvus
stephen.batifol@zilliz.com
linkedin.com/in/stephen-batifol/
@stephenbtl
Speaker
3 | © Copyright 8/16/23 Zilliz
3 | © Copyright 8/16/23 Zilliz
30K
GitHub
Stars
25M
Downloads
250
Contributors
2,600
+
Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup
pip install
pymilvus to start
coding in a notebook
within seconds.
Reusable Code
Write once, and
deploy with one line
of code into the
production
environment
Integration
Plug into OpenAI,
Langchain,
LlamaIndex, and
many more
Feature-rich
Dense & sparse
embeddings,
Filtering, Reranking
and beyond
4 | © Copyright 8/16/23 Zilliz
4 | © Copyright 8/16/23 Zilliz
Well-connected in the AI infrastructure
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database
5 | © Copyright 8/16/23 Zilliz
5 | © Copyright 8/16/23 Zilliz
● pip install on your laptop
● Plug into your favorite AI dev tools
● Push to production with a single line of code
Easy to start
6 | © Copyright 8/16/23 Zilliz
6 | © Copyright 8/16/23 Zilliz
2024
Milvus Lite Milvus Standalone Milvus Distributed
● Ideal for prototyping,
small scale
experiments.
● Easy to set up and
use, pip instally
pymilvus
● Scale to ≈1M vectors
● Run on K8s
● Load balancer and
Multi-Node
Management
● Scaling of each
component
independently
● Scale to 100B
vectors
● Single-Node
Deployment
● Bundled in a single
Docker Image
● Supports Primary/
Secondary
● Scale up to 100M
vectors
Ready to scale 🚀
Write your code once, and run it everywhere, at scale!
● API and SDK are the same
7 | © Copyright 8/16/23 Zilliz
7 | © Copyright 8/16/23 Zilliz
Retrieval Augmented
Generation RAG
Expand LLMs' knowledge by
incorporating external data sources
into LLMs and your AI applications.
Match user behavior or content
features with other similar ones to
make effective recommendations.
Recommender System
Search for semantically similar
texts across vast amounts of
natural language documents.
Text/ Semantic Search
Image Similarity Search
Identify and search for visually
similar images or objects from a
vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes,
or objects from extensive
collections of video libraries.
Audio Similarity Search
Find similar audios in large datasets
for tasks like genre classification or
speech recognition
Molecular Similarity Search
Search for similar substructures,
superstructures, and other
structures for a specific molecule.
Anomaly Detection
Detect data points, events, and
observations that deviate
significantly from the usual pattern
Multimodal Similarity Search
Search over multiple types of data
simultaneously, e.g. text and
images
Common AI Use Cases
8 | © Copyright 8/16/23 Zilliz
8 | © Copyright 8/16/23 Zilliz
8 | © Copyright 8/16/23 Zilliz
8 | © Copyright 8/16/23 Zilliz
Introduction to Vector DB
and Vector Search
9 | © Copyright 8/16/23 Zilliz
9 | © Copyright 8/16/23 Zilliz
Vectors Unlock Unstructured Data
Vector
Databases
10 | © Copyright 8/16/23 Zilliz
10 | © Copyright 8/16/23 Zilliz
Vector Space
11 | © Copyright 8/16/23 Zilliz
11 | © Copyright 8/16/23 Zilliz
Embeddings models workhorses of AI apps
12 | © Copyright 8/16/23 Zilliz
12 | © Copyright 8/16/23 Zilliz
Vectors are for more than just text and images
13 | © Copyright 8/16/23 Zilliz
13 | © Copyright 8/16/23 Zilliz
13 | © Copyright 8/16/23 Zilliz
13 | © Copyright 8/16/23 Zilliz
Multimodal Embeddings
14 | © Copyright 8/16/23 Zilliz
14 | © Copyright 8/16/23 Zilliz
Visual + language embeddings CLIP-like)
15 | © Copyright 8/16/23 Zilliz
15 | © Copyright 8/16/23 Zilliz
One embedding space, six modalities ImageBind)
Source: Girdhar, et al.
16 | © Copyright 8/16/23 Zilliz
16 | © Copyright 8/16/23 Zilliz
LLMs are becoming natively multimodal…
17 | © Copyright 8/16/23 Zilliz
17 | © Copyright 8/16/23 Zilliz
LLMs are becoming natively multimodal…
18 | © Copyright 8/16/23 Zilliz
18 | © Copyright 8/16/23 Zilliz
… and the best embedding models are too
19 | © Copyright 8/16/23 Zilliz
19 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
19
RAG
Retrieval Augmented Generation)
20 | © Copyright 8/16/23 Zilliz
20 | © Copyright 8/16/23 Zilliz
Basic Idea
Use RAG to force the LLM to work with your data
by injecting it via a vector database like Milvus
21 | © Copyright 8/16/23 Zilliz
21 | © Copyright 8/16/23 Zilliz
Basic RAG Architecture
22 | © Copyright 8/16/23 Zilliz
22 | © Copyright 8/16/23 Zilliz
Question + Context
Question
Vanilla RAG is no longer enough…
Gen AI Model
Reliable Answers
Your
Documents
Embedding Model
Milvus
Search
What is the default
AUTOINDEX distance
metric in Milvus
Client?
The default
AUTOINDEX distance
metric in Milvus
Client is L2.
23 | © Copyright 8/16/23 Zilliz
23 | © Copyright 8/16/23 Zilliz
Question + Context
Question
… we need multimodal RAG
Pixtral
Reliable Answers
Multimodal Embeddings
Milvus
Search
What kind of music
did they play in the
pre-show?
The musician played
improvised electronic
music.
24 | © Copyright 8/16/23 Zilliz
24 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
24
Building a Self-Hosted Multimodal
RAG System
Using Milvus and vLLM
25 | © Copyright 8/16/23 Zilliz
25 | © Copyright 8/16/23 Zilliz
● You've built your AI application around a cloud API
provider
⇒ Suddenly: "We're deprecating your model"
The problem with API solutions
26 | © Copyright 8/16/23 Zilliz
26 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
26
27 | © Copyright 8/16/23 Zilliz
27 | © Copyright 8/16/23 Zilliz
Self-Hosted Multimodal RAG
● Processes multiple data types (text, images, audio, video)
● Runs completely under your control
● Uses open-source
● Scales efficiently
28 | © Copyright 8/16/23 Zilliz
28 | © Copyright 8/16/23 Zilliz
● Milvus: Vector DB
● vLLM: Inference and serving
● Koyeb: Infrastructure Layer
● Pixtral: Pixtral: Multimodal model 400M
vision encoder + 12B decoder)
Tech Stack
29 | © Copyright 8/16/23 Zilliz
29 | © Copyright 8/16/23 Zilliz
Why vLLM?
Wide range of model support
● 40+ model architectures including
vision language models
● Collaborating with model vendors
Diverse hardware support
● NVIDIA, AMD, Intel GPUs
● Intel/AMD CPU
● Inferentia, TPU, Gaudi
End-to-end inference optimizations
● CUDA graph
● Speculative decoding
● Quantization GPTQ, AWQ, FP8
● Automatic prefix caching
30 | © Copyright 8/16/23 Zilliz
30 | © Copyright 8/16/23 Zilliz
Storage:
● Milvus collections for different
modalities
● Efficient indexing and retrieval
Query Processing:
● Context retrieval from vector store
● Multimodal understanding with
Pixtral
What is it doing?
Video Processing:
● Frame extraction 0.2 FPS
● Audio transcription Whisper)
● Metadata extraction
Embeddings:
● Images: OpenAI CLIP
● Text: Mistral Embedding model
31 | © Copyright 8/16/23 Zilliz
31 | © Copyright 8/16/23 Zilliz
Complete Control
● No unexpected API changes
● Full visibility into the system
● Customizable components
Privacy & Security
● Data stays in your infrastructure
● No external API dependencies
Scalability
● Horizontal scaling with Milvus
● Efficient resource use with vLLM
● Flexible deployment options
Benefits
32 | © Copyright 8/16/23 Zilliz
32 | © Copyright 8/16/23 Zilliz
| © Copyright 8/16/23 Zilliz
32
Demo!

MultiModal RAG using vLLM and Pixtral - Stephen Batifol

  • 1.
    1 | ©Copyright 8/16/23 Zilliz 1 | © Copyright 8/16/23 Zilliz Stephen Batifol | Zilliz Unstructured Data Meetup - Berlin MultiModal RAG using vLLM and Pixtral
  • 2.
    2 | ©Copyright 8/16/23 Zilliz 2 | © Copyright 8/16/23 Zilliz Stephen Batifol Developer Advocate, Zilliz/ Milvus stephen.batifol@zilliz.com linkedin.com/in/stephen-batifol/ @stephenbtl Speaker
  • 3.
    3 | ©Copyright 8/16/23 Zilliz 3 | © Copyright 8/16/23 Zilliz 30K GitHub Stars 25M Downloads 250 Contributors 2,600 + Forks Milvus is an open-source vector database for GenAI projects. pip install on your laptop, plug into popular AI dev tools, and push to production with a single line of code. Easy Setup pip install pymilvus to start coding in a notebook within seconds. Reusable Code Write once, and deploy with one line of code into the production environment Integration Plug into OpenAI, Langchain, LlamaIndex, and many more Feature-rich Dense & sparse embeddings, Filtering, Reranking and beyond
  • 4.
    4 | ©Copyright 8/16/23 Zilliz 4 | © Copyright 8/16/23 Zilliz Well-connected in the AI infrastructure Framework Hardware Infrastructure Embedding Models LLMs Software Infrastructure Vector Database
  • 5.
    5 | ©Copyright 8/16/23 Zilliz 5 | © Copyright 8/16/23 Zilliz ● pip install on your laptop ● Plug into your favorite AI dev tools ● Push to production with a single line of code Easy to start
  • 6.
    6 | ©Copyright 8/16/23 Zilliz 6 | © Copyright 8/16/23 Zilliz 2024 Milvus Lite Milvus Standalone Milvus Distributed ● Ideal for prototyping, small scale experiments. ● Easy to set up and use, pip instally pymilvus ● Scale to ≈1M vectors ● Run on K8s ● Load balancer and Multi-Node Management ● Scaling of each component independently ● Scale to 100B vectors ● Single-Node Deployment ● Bundled in a single Docker Image ● Supports Primary/ Secondary ● Scale up to 100M vectors Ready to scale 🚀 Write your code once, and run it everywhere, at scale! ● API and SDK are the same
  • 7.
    7 | ©Copyright 8/16/23 Zilliz 7 | © Copyright 8/16/23 Zilliz Retrieval Augmented Generation RAG Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications. Match user behavior or content features with other similar ones to make effective recommendations. Recommender System Search for semantically similar texts across vast amounts of natural language documents. Text/ Semantic Search Image Similarity Search Identify and search for visually similar images or objects from a vast collection of image libraries. Video Similarity Search Search for similar videos, scenes, or objects from extensive collections of video libraries. Audio Similarity Search Find similar audios in large datasets for tasks like genre classification or speech recognition Molecular Similarity Search Search for similar substructures, superstructures, and other structures for a specific molecule. Anomaly Detection Detect data points, events, and observations that deviate significantly from the usual pattern Multimodal Similarity Search Search over multiple types of data simultaneously, e.g. text and images Common AI Use Cases
  • 8.
    8 | ©Copyright 8/16/23 Zilliz 8 | © Copyright 8/16/23 Zilliz 8 | © Copyright 8/16/23 Zilliz 8 | © Copyright 8/16/23 Zilliz Introduction to Vector DB and Vector Search
  • 9.
    9 | ©Copyright 8/16/23 Zilliz 9 | © Copyright 8/16/23 Zilliz Vectors Unlock Unstructured Data Vector Databases
  • 10.
    10 | ©Copyright 8/16/23 Zilliz 10 | © Copyright 8/16/23 Zilliz Vector Space
  • 11.
    11 | ©Copyright 8/16/23 Zilliz 11 | © Copyright 8/16/23 Zilliz Embeddings models workhorses of AI apps
  • 12.
    12 | ©Copyright 8/16/23 Zilliz 12 | © Copyright 8/16/23 Zilliz Vectors are for more than just text and images
  • 13.
    13 | ©Copyright 8/16/23 Zilliz 13 | © Copyright 8/16/23 Zilliz 13 | © Copyright 8/16/23 Zilliz 13 | © Copyright 8/16/23 Zilliz Multimodal Embeddings
  • 14.
    14 | ©Copyright 8/16/23 Zilliz 14 | © Copyright 8/16/23 Zilliz Visual + language embeddings CLIP-like)
  • 15.
    15 | ©Copyright 8/16/23 Zilliz 15 | © Copyright 8/16/23 Zilliz One embedding space, six modalities ImageBind) Source: Girdhar, et al.
  • 16.
    16 | ©Copyright 8/16/23 Zilliz 16 | © Copyright 8/16/23 Zilliz LLMs are becoming natively multimodal…
  • 17.
    17 | ©Copyright 8/16/23 Zilliz 17 | © Copyright 8/16/23 Zilliz LLMs are becoming natively multimodal…
  • 18.
    18 | ©Copyright 8/16/23 Zilliz 18 | © Copyright 8/16/23 Zilliz … and the best embedding models are too
  • 19.
    19 | ©Copyright 8/16/23 Zilliz 19 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 19 RAG Retrieval Augmented Generation)
  • 20.
    20 | ©Copyright 8/16/23 Zilliz 20 | © Copyright 8/16/23 Zilliz Basic Idea Use RAG to force the LLM to work with your data by injecting it via a vector database like Milvus
  • 21.
    21 | ©Copyright 8/16/23 Zilliz 21 | © Copyright 8/16/23 Zilliz Basic RAG Architecture
  • 22.
    22 | ©Copyright 8/16/23 Zilliz 22 | © Copyright 8/16/23 Zilliz Question + Context Question Vanilla RAG is no longer enough… Gen AI Model Reliable Answers Your Documents Embedding Model Milvus Search What is the default AUTOINDEX distance metric in Milvus Client? The default AUTOINDEX distance metric in Milvus Client is L2.
  • 23.
    23 | ©Copyright 8/16/23 Zilliz 23 | © Copyright 8/16/23 Zilliz Question + Context Question … we need multimodal RAG Pixtral Reliable Answers Multimodal Embeddings Milvus Search What kind of music did they play in the pre-show? The musician played improvised electronic music.
  • 24.
    24 | ©Copyright 8/16/23 Zilliz 24 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 24 Building a Self-Hosted Multimodal RAG System Using Milvus and vLLM
  • 25.
    25 | ©Copyright 8/16/23 Zilliz 25 | © Copyright 8/16/23 Zilliz ● You've built your AI application around a cloud API provider ⇒ Suddenly: "We're deprecating your model" The problem with API solutions
  • 26.
    26 | ©Copyright 8/16/23 Zilliz 26 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 26
  • 27.
    27 | ©Copyright 8/16/23 Zilliz 27 | © Copyright 8/16/23 Zilliz Self-Hosted Multimodal RAG ● Processes multiple data types (text, images, audio, video) ● Runs completely under your control ● Uses open-source ● Scales efficiently
  • 28.
    28 | ©Copyright 8/16/23 Zilliz 28 | © Copyright 8/16/23 Zilliz ● Milvus: Vector DB ● vLLM: Inference and serving ● Koyeb: Infrastructure Layer ● Pixtral: Pixtral: Multimodal model 400M vision encoder + 12B decoder) Tech Stack
  • 29.
    29 | ©Copyright 8/16/23 Zilliz 29 | © Copyright 8/16/23 Zilliz Why vLLM? Wide range of model support ● 40+ model architectures including vision language models ● Collaborating with model vendors Diverse hardware support ● NVIDIA, AMD, Intel GPUs ● Intel/AMD CPU ● Inferentia, TPU, Gaudi End-to-end inference optimizations ● CUDA graph ● Speculative decoding ● Quantization GPTQ, AWQ, FP8 ● Automatic prefix caching
  • 30.
    30 | ©Copyright 8/16/23 Zilliz 30 | © Copyright 8/16/23 Zilliz Storage: ● Milvus collections for different modalities ● Efficient indexing and retrieval Query Processing: ● Context retrieval from vector store ● Multimodal understanding with Pixtral What is it doing? Video Processing: ● Frame extraction 0.2 FPS ● Audio transcription Whisper) ● Metadata extraction Embeddings: ● Images: OpenAI CLIP ● Text: Mistral Embedding model
  • 31.
    31 | ©Copyright 8/16/23 Zilliz 31 | © Copyright 8/16/23 Zilliz Complete Control ● No unexpected API changes ● Full visibility into the system ● Customizable components Privacy & Security ● Data stays in your infrastructure ● No external API dependencies Scalability ● Horizontal scaling with Milvus ● Efficient resource use with vLLM ● Flexible deployment options Benefits
  • 32.
    32 | ©Copyright 8/16/23 Zilliz 32 | © Copyright 8/16/23 Zilliz | © Copyright 8/16/23 Zilliz 32 Demo!