June 20, 2024
Unstructured Data and
Vector Databases
Tim Spann
Principal Developer Advocate, Zilliz
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Speaker
Agenda Introduction
Unstructured data, vector databases, traditional databases, similarity
search
01
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database
Architecture
02
Introducing Milvus
What drives Milvus' Emergence as the most widely
adopted vector database
03
4 | © Copyright Zilliz
4 | © Copyright Zilliz
4
Introduction
5 | © Copyright Zilliz
5
- Unstructured Data is 80% of data
- Vector Databases are the only type of database
that can work with unstructured data
- Examples of Unstructured Data include text,
images, videos, audio, etc
Why Vector Databases?
6 | © Copyright Zilliz
6
Traditional databases were built on exact
search
7 | © Copyright Zilliz
7
…which misses context, semantic meaning, and user intent
VS.
Apple
VS.
Rising dough
VS.
Change car tire
Rising Dough
Proofing Bread
✔
❌
8 | © Copyright Zilliz
8
Vector
Databases
Where do Vectors Come From?
…and cannot process increasingly growing unstructured data
*Data Source: The Digitization of the World by IDC
20%
Other
newly generated data in 2025
will be unstructured data
80%
10 | © Copyright Zilliz
10
The evolution of AI made the semantic search of
unstructured data possible
Search by Probability
Statistical analyses of common
datasets established the foundation for
processing unstructured data, e.g. NLP,
and image classification
AI Model Breakthrough
The advancements in BERT, ViT, CBT
etc. have revolutionized semantic
analysis across unstructured data
Vectorization
Word2Vec, CNNs, Deep Speech pioneered
unstructured data embeddings, mapping the
words, images, videos into high-dimensional
vectors
11 | © Copyright Zilliz
11
This new AI breakthrough requires new databases to
fully unleash its potential
Support multiple
use case types
Accommodate diverse data
requirements, enhancing
flexibility and effectiveness in
varied operational contexts
Scale as needed
Enable robust handling of
expanding data volumes and
search demands
Highly performant
Ensures swift and accurate
query responses, crucial for
optimal user experience
12 | © Copyright Zilliz
12
https://milvus.io/milvus-demos/reverse-image-search
Show Me
13 | © Copyright Zilliz
13
03 How do Vector
Databases Work?
14 | © Copyright Zilliz
14
Semantic Similarity
Image from Sutor et al
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Woman = [0.3, 0.4]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Man = [0.5, 0.2]
Queen - Woman + Man = King
Queen = [0.3, 0.9]
- Woman = [0.3, 0.4]
[0.0, 0.5]
+ Man = [0.5, 0.2]
King = [0.5, 0.7]
Man = [0.5, 0.2]
15 | © Copyright Zilliz
15
Vector Similarity Measures: L2 (Euclidean)
Queen = [0.3, 0.9]
King = [0.5, 0.7]
d(Queen, King) = √(0.3-0.5)2
+ (0.9-0.7)2
= √(0.2)2
+ (0.2)2
= √0.04 + 0.04
= √0.08 ≅ 0.28
16 | © Copyright Zilliz
16
Vector Similarity Measures: Inner Product (IP)
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Queen · King = (0.3*0.5) + (0.9*0.7)
= 0.15 + 0.63 = 0.78
17 | © Copyright Zilliz
17
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Vector Similarity Measures: Cosine
𝚹
cos(Queen, King) = (0.3*0.5)+(0.9*0.7)
√0.32
+0.92
* √0.52
+0.72
= 0.15+0.63 _
√0.9 * √0.74
= 0.78 _
√0.666
≅ 0.03
18 | © Copyright Zilliz
18 | © Copyright Zilliz
18
Introducing Milvus
19 | © Copyright Zilliz
19
Why Not Use a SQL/NoSQL Database?
• Inefficiency in High-dimensional spaces
• Suboptimal Indexing
• Inadequate query support
• Lack of scalability
• Limited analytics capabilities
• Data conversion issues
TL;DR: Vector operations are too computationally intensive for
traditional database infrastructures
20 | © Copyright Zilliz
20
Why Not Use a Vector Search Library?
• Have to manually implement filtering
• Not optimized to take advantage of the latest hardware
• Unable to handle large scale data
• Lack of lifecycle management
• Inefficient indexing capabilities
• No built in safety mechanisms
TL;DR: Vector search libraries lack the infrastructure to help you scale,
deploy, and manage your apps in production.
21 | © Copyright Zilliz
21
What is Milvus ideal for?
• Advanced filtering
• Hybrid search
• Durability and backups
• Replications/High Availability
• Sharding
• Aggregations
• Lifecycle management
• Multi-tenancy
• High query load
• High insertion/deletion
• Full precision/recall
• Accelerator support (GPU,
FPGA)
• Billion-scale storage
Purpose-built to store, index and query vector embeddings from unstructured data at scale.
22 | © Copyright Zilliz
22
We’ve built technologies for various types of use
cases
Compute Types
Designed for various
compute powers, such as
AVX512, Neon for SIMD,
quantization cache-aware
optimization and GPU
Leverage strengths of each
hardware type, ensuring
high-speed processing and
cost-effective scalability for
different application needs
Search Types
Support multiple types such
as top-K ANN, Range ANN,
sparse & dense,
multi-vector, grouping, and
metadata filtering
Enable query flexibility and
accuracy, allowing
developers to tailor their
information retrieval needs
Multi-tenancy
Enable multi-tenancy
through collection and
partition management
Allow for efficient resource
utilization and customizable
data segregation, ensuring
secure and isolated data
handling for each tenant
Index Types
Offer a wide range of 15
indexes support, including
popular ones like HNSW,
PQ, Binary, Sparse,
DiskANN and GPU index
Empower developers with
tailored search
optimizations, catering to
performance, accuracy and
cost needs
23 | © Copyright Zilliz
23
Meta Storage
Root Query Data Index
Coordinator Service
Proxy
Proxy
etcd
Log Broker
SDK
Load Balancer
DDL/DCL
DML
NOTIFICATION
CONTROL SIGNAL
Object Storage
Minio / S3 / AzureBlob
Log Snapshot Delta File Index File
Worker Node QUERY DATA DATA
Message
Storage
Access Layer
Query Node Data Node Index Node
Milvus’ fully distributed architecture is designed
scalability and performance
24 | © Copyright Zilliz
24
Milvus: From Dev to Prod
AI Powered Search made easy
Milvus is an Open-Source Vector
Database to store, index, manage, and
use the massive number of embedding
vectors generated by deep neural
networks and LLMs.
contributors
267+
stars
27K+
downloads
25M+
forks
2K+
25 | © Copyright Zilliz
25
Retrieval Augmented
Generation (RAG)
Expand LLMs' knowledge by
incorporating external data sources
into LLMs and your AI applications.
Match user behavior or content
features with other similar ones to
make effective recommendations.
Recommender System
Search for semantically similar
texts across vast amounts of
natural language documents.
Text/ Semantic Search
Image Similarity Search
Identify and search for visually
similar images or objects from a
vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes,
or objects from extensive
collections of video libraries.
Audio Similarity Search
Find similar audios in large datasets
for tasks like genre classification or
speech recognition
Molecular Similarity Search
Search for similar substructures,
superstructures, and other
structures for a specific molecule.
Anomaly Detection
Detect data points, events, and
observations that deviate
significantly from the usual pattern
Multimodal Similarity Search
Search over multiple types of data
simultaneously, e.g. text and
images
…powers searches across various types of
unstructured data
26 | © Copyright Zilliz
26
27 | © Copyright Zilliz
27
Up to 100 billion vectors with K8s!
28 | © Copyright Zilliz
28
29 | © Copyright Zilliz
29
Vector Database Resources
Give Milvus a Star! Chat with me on Discord!
https://github.com/milvus-io/milvus
30
Unstructured Data Meetup
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics
such as vector databases, LLMs, and managing data at scale. The intended audience of this group
includes roles like machine learning engineers, data scientists, data engineers, software engineers, and
PMs.
This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
31 | © Copyright Zilliz
31 | © Copyright Zilliz
31
RESOURCES
https://medium.com/@tspann/unstructured-data-processing-with-a-raspberry-pi-ai-kit-c959dd7fff47
Raspberry Pi AI Kit Hailo
Edge AI
https://medium.com/@tspann/unstructured-street-data-in-new-york-8d3cde0a1e5b
https://medium.com/@tspann/not-every-field-is-just-text-numbers-or-vectors-976231e90e4d
https://medium.com/@tspann/shining-some-light-on-the-new-milvus-lite-5a0565eb5dd9
Extracting Value from Unstructured Data
Example
• A company has 100,000s+ pages of
proprietary documentation to enable
their staff to service customers.
Problem
• Searching can be slow, inefficient, or
lack context.
Solution
• Create internal chatbot with ChatGPT
and a vector database enriched with
company documentation to provide
direction and support to employees
and customers.
https://osschat.io/chat
We provide deployment flexibility for different operational, security and compliance requirements
BRING YOUR OWN CLOUD
Zilliz BYOC
Enterprise-ready Milvus for
Private VPCs
Deploy in your virtual private cloud
Zilliz Cloud
Milvus Re-engineered for the
Cloud
Available on the leading public
clouds
FULLY MANAGED SERVICE
Coming Soon! Coming Soon!
Milvus
Most widely-adopted open
source vector database
Self hosted on any machine with
community support
SELF MANAGED SOFTWARE
Local Docker K8s
39 | © Copyright Zilliz
39
Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database
40 | © Copyright Zilliz
40
T H A N K Y O U
41 | © Copyright Zilliz
41
Milvus Dependencies
https://zilliz.com/blog/Milvus-server-docker-installation-and-packaging-dependencies
🧱 Main Dependencies:
● FAISS 🔍 (vector search)
● etcd 🔑 (metadata store)
● Pulsar/Kafka 📢 (messaging)
● Tantivy 🔎 (text search)
● RocksDB 💾 (storage)
● Object Storage 🗄 (Minio/S3/GCS/Azure Blob Storage)
● Kubernetes 🐳 (containerization)
● StorageClass & Persistent Volumes 🗄(Storage Management for etcd and Pulsar)
● Prometheus & Grafana 📈 (monitoring)
📦 Docker Image Size: ~500MB
🆕 Release Frequency: ~1x per month, with frequent minor releases
🛠 SDKs Available: Python 🐍, Node 🌐, Go 🐹, C# 💻, Java ☕, Ruby 💎
💻 Python SDK Installation: pip install pymilvus
✅ Version Compatibility: Ensure SDK and Milvus server versions match (major.minor)
…different types of data and schemas needs to be thoroughly planned ahead of time
44 | © Copyright Zilliz
44
• Search Quality - Hybrid Search? Filtering?
• Scalability - Billions of vectors?
• Multi tenancy - Isolating Multi-Tenant data
• Cost - Memory, disk, S3?
• Security - Data Safety and Privacy
TL;DR: Vector search libraries lack the infrastructure to help you scale,
deploy, and manage your apps in production.
Why Not Vector Search Libraries?
45 | © Copyright Zilliz
45
Why Not Use a SQL/NoSQL Database?
• Inefficiency in High-dimensional spaces
• Suboptimal Indexing
• Inadequate query support
• Lack of scalability
• Limited analytics capabilities
• Data conversion issues
TL;DR: Vector operations are too computationally intensive for
traditional database infrastructures
46 | © Copyright Zilliz
46
What is Milvus/Zilliz ideal for?
• Advanced filtering
• Hybrid search
• Multi-vector Search
• Durability and backups
• Replications/High Availability
• Sharding
• Aggregations
• Lifecycle management
• Multi-tenancy
• High query load
• High insertion/deletion
• Full precision/recall
• Accelerator support (GPU,
FPGA)
• Billion-scale storage
Purpose-built to store, index and query vector embeddings from unstructured data at scale.
47 | © Copyright Zilliz
47
Vector Databases are purpose-built to handle
indexing, storing, and querying vector data.
Milvus & Zilliz are specifically designed for high
performance and billion+ scale use cases.
Takeaway:
48 | © Copyright Zilliz
48
Inverted File Index
Source:
https://towardsdatascience.com/similarity-search-with-ivfpq-9c6348fd4db3
49 | © Copyright Zilliz
49
HNSW
Source:
https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf
50 | © Copyright Zilliz
50
SQ

06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases

  • 1.
    June 20, 2024 UnstructuredData and Vector Databases
  • 2.
    Tim Spann Principal DeveloperAdvocate, Zilliz tim.spann@zilliz.com https://www.linkedin.com/in/timothyspann/ https://x.com/paasdev https://github.com/tspannhw https://github.com/milvus-io/milvus Speaker
  • 3.
    Agenda Introduction Unstructured data,vector databases, traditional databases, similarity search 01 Vectors Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture 02 Introducing Milvus What drives Milvus' Emergence as the most widely adopted vector database 03
  • 4.
    4 | ©Copyright Zilliz 4 | © Copyright Zilliz 4 Introduction
  • 5.
    5 | ©Copyright Zilliz 5 - Unstructured Data is 80% of data - Vector Databases are the only type of database that can work with unstructured data - Examples of Unstructured Data include text, images, videos, audio, etc Why Vector Databases?
  • 6.
    6 | ©Copyright Zilliz 6 Traditional databases were built on exact search
  • 7.
    7 | ©Copyright Zilliz 7 …which misses context, semantic meaning, and user intent VS. Apple VS. Rising dough VS. Change car tire Rising Dough Proofing Bread ✔ ❌
  • 8.
    8 | ©Copyright Zilliz 8 Vector Databases Where do Vectors Come From?
  • 9.
    …and cannot processincreasingly growing unstructured data *Data Source: The Digitization of the World by IDC 20% Other newly generated data in 2025 will be unstructured data 80%
  • 10.
    10 | ©Copyright Zilliz 10 The evolution of AI made the semantic search of unstructured data possible Search by Probability Statistical analyses of common datasets established the foundation for processing unstructured data, e.g. NLP, and image classification AI Model Breakthrough The advancements in BERT, ViT, CBT etc. have revolutionized semantic analysis across unstructured data Vectorization Word2Vec, CNNs, Deep Speech pioneered unstructured data embeddings, mapping the words, images, videos into high-dimensional vectors
  • 11.
    11 | ©Copyright Zilliz 11 This new AI breakthrough requires new databases to fully unleash its potential Support multiple use case types Accommodate diverse data requirements, enhancing flexibility and effectiveness in varied operational contexts Scale as needed Enable robust handling of expanding data volumes and search demands Highly performant Ensures swift and accurate query responses, crucial for optimal user experience
  • 12.
    12 | ©Copyright Zilliz 12 https://milvus.io/milvus-demos/reverse-image-search Show Me
  • 13.
    13 | ©Copyright Zilliz 13 03 How do Vector Databases Work?
  • 14.
    14 | ©Copyright Zilliz 14 Semantic Similarity Image from Sutor et al Woman = [0.3, 0.4] Queen = [0.3, 0.9] King = [0.5, 0.7] Woman = [0.3, 0.4] Queen = [0.3, 0.9] King = [0.5, 0.7] Man = [0.5, 0.2] Queen - Woman + Man = King Queen = [0.3, 0.9] - Woman = [0.3, 0.4] [0.0, 0.5] + Man = [0.5, 0.2] King = [0.5, 0.7] Man = [0.5, 0.2]
  • 15.
    15 | ©Copyright Zilliz 15 Vector Similarity Measures: L2 (Euclidean) Queen = [0.3, 0.9] King = [0.5, 0.7] d(Queen, King) = √(0.3-0.5)2 + (0.9-0.7)2 = √(0.2)2 + (0.2)2 = √0.04 + 0.04 = √0.08 ≅ 0.28
  • 16.
    16 | ©Copyright Zilliz 16 Vector Similarity Measures: Inner Product (IP) Queen = [0.3, 0.9] King = [0.5, 0.7] Queen · King = (0.3*0.5) + (0.9*0.7) = 0.15 + 0.63 = 0.78
  • 17.
    17 | ©Copyright Zilliz 17 Queen = [0.3, 0.9] King = [0.5, 0.7] Vector Similarity Measures: Cosine 𝚹 cos(Queen, King) = (0.3*0.5)+(0.9*0.7) √0.32 +0.92 * √0.52 +0.72 = 0.15+0.63 _ √0.9 * √0.74 = 0.78 _ √0.666 ≅ 0.03
  • 18.
    18 | ©Copyright Zilliz 18 | © Copyright Zilliz 18 Introducing Milvus
  • 19.
    19 | ©Copyright Zilliz 19 Why Not Use a SQL/NoSQL Database? • Inefficiency in High-dimensional spaces • Suboptimal Indexing • Inadequate query support • Lack of scalability • Limited analytics capabilities • Data conversion issues TL;DR: Vector operations are too computationally intensive for traditional database infrastructures
  • 20.
    20 | ©Copyright Zilliz 20 Why Not Use a Vector Search Library? • Have to manually implement filtering • Not optimized to take advantage of the latest hardware • Unable to handle large scale data • Lack of lifecycle management • Inefficient indexing capabilities • No built in safety mechanisms TL;DR: Vector search libraries lack the infrastructure to help you scale, deploy, and manage your apps in production.
  • 21.
    21 | ©Copyright Zilliz 21 What is Milvus ideal for? • Advanced filtering • Hybrid search • Durability and backups • Replications/High Availability • Sharding • Aggregations • Lifecycle management • Multi-tenancy • High query load • High insertion/deletion • Full precision/recall • Accelerator support (GPU, FPGA) • Billion-scale storage Purpose-built to store, index and query vector embeddings from unstructured data at scale.
  • 22.
    22 | ©Copyright Zilliz 22 We’ve built technologies for various types of use cases Compute Types Designed for various compute powers, such as AVX512, Neon for SIMD, quantization cache-aware optimization and GPU Leverage strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs Search Types Support multiple types such as top-K ANN, Range ANN, sparse & dense, multi-vector, grouping, and metadata filtering Enable query flexibility and accuracy, allowing developers to tailor their information retrieval needs Multi-tenancy Enable multi-tenancy through collection and partition management Allow for efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant Index Types Offer a wide range of 15 indexes support, including popular ones like HNSW, PQ, Binary, Sparse, DiskANN and GPU index Empower developers with tailored search optimizations, catering to performance, accuracy and cost needs
  • 23.
    23 | ©Copyright Zilliz 23 Meta Storage Root Query Data Index Coordinator Service Proxy Proxy etcd Log Broker SDK Load Balancer DDL/DCL DML NOTIFICATION CONTROL SIGNAL Object Storage Minio / S3 / AzureBlob Log Snapshot Delta File Index File Worker Node QUERY DATA DATA Message Storage Access Layer Query Node Data Node Index Node Milvus’ fully distributed architecture is designed scalability and performance
  • 24.
    24 | ©Copyright Zilliz 24 Milvus: From Dev to Prod AI Powered Search made easy Milvus is an Open-Source Vector Database to store, index, manage, and use the massive number of embedding vectors generated by deep neural networks and LLMs. contributors 267+ stars 27K+ downloads 25M+ forks 2K+
  • 25.
    25 | ©Copyright Zilliz 25 Retrieval Augmented Generation (RAG) Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications. Match user behavior or content features with other similar ones to make effective recommendations. Recommender System Search for semantically similar texts across vast amounts of natural language documents. Text/ Semantic Search Image Similarity Search Identify and search for visually similar images or objects from a vast collection of image libraries. Video Similarity Search Search for similar videos, scenes, or objects from extensive collections of video libraries. Audio Similarity Search Find similar audios in large datasets for tasks like genre classification or speech recognition Molecular Similarity Search Search for similar substructures, superstructures, and other structures for a specific molecule. Anomaly Detection Detect data points, events, and observations that deviate significantly from the usual pattern Multimodal Similarity Search Search over multiple types of data simultaneously, e.g. text and images …powers searches across various types of unstructured data
  • 26.
    26 | ©Copyright Zilliz 26
  • 27.
    27 | ©Copyright Zilliz 27 Up to 100 billion vectors with K8s!
  • 28.
    28 | ©Copyright Zilliz 28
  • 29.
    29 | ©Copyright Zilliz 29 Vector Database Resources Give Milvus a Star! Chat with me on Discord! https://github.com/milvus-io/milvus
  • 30.
    30 Unstructured Data Meetup https://www.meetup.com/unstructured-data-meetup-new-york/ Thismeetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
  • 31.
    31 | ©Copyright Zilliz 31 | © Copyright Zilliz 31 RESOURCES
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
    Extracting Value fromUnstructured Data Example • A company has 100,000s+ pages of proprietary documentation to enable their staff to service customers. Problem • Searching can be slow, inefficient, or lack context. Solution • Create internal chatbot with ChatGPT and a vector database enriched with company documentation to provide direction and support to employees and customers. https://osschat.io/chat
  • 38.
    We provide deploymentflexibility for different operational, security and compliance requirements BRING YOUR OWN CLOUD Zilliz BYOC Enterprise-ready Milvus for Private VPCs Deploy in your virtual private cloud Zilliz Cloud Milvus Re-engineered for the Cloud Available on the leading public clouds FULLY MANAGED SERVICE Coming Soon! Coming Soon! Milvus Most widely-adopted open source vector database Self hosted on any machine with community support SELF MANAGED SOFTWARE Local Docker K8s
  • 39.
    39 | ©Copyright Zilliz 39 Well-connected in LLM infrastructure to enable RAG use cases Framework Hardware Infrastructure Embedding Models LLMs Software Infrastructure Vector Database
  • 40.
    40 | ©Copyright Zilliz 40 T H A N K Y O U
  • 41.
    41 | ©Copyright Zilliz 41 Milvus Dependencies https://zilliz.com/blog/Milvus-server-docker-installation-and-packaging-dependencies 🧱 Main Dependencies: ● FAISS 🔍 (vector search) ● etcd 🔑 (metadata store) ● Pulsar/Kafka 📢 (messaging) ● Tantivy 🔎 (text search) ● RocksDB 💾 (storage) ● Object Storage 🗄 (Minio/S3/GCS/Azure Blob Storage) ● Kubernetes 🐳 (containerization) ● StorageClass & Persistent Volumes 🗄(Storage Management for etcd and Pulsar) ● Prometheus & Grafana 📈 (monitoring) 📦 Docker Image Size: ~500MB 🆕 Release Frequency: ~1x per month, with frequent minor releases 🛠 SDKs Available: Python 🐍, Node 🌐, Go 🐹, C# 💻, Java ☕, Ruby 💎 💻 Python SDK Installation: pip install pymilvus ✅ Version Compatibility: Ensure SDK and Milvus server versions match (major.minor)
  • 43.
    …different types ofdata and schemas needs to be thoroughly planned ahead of time
  • 44.
    44 | ©Copyright Zilliz 44 • Search Quality - Hybrid Search? Filtering? • Scalability - Billions of vectors? • Multi tenancy - Isolating Multi-Tenant data • Cost - Memory, disk, S3? • Security - Data Safety and Privacy TL;DR: Vector search libraries lack the infrastructure to help you scale, deploy, and manage your apps in production. Why Not Vector Search Libraries?
  • 45.
    45 | ©Copyright Zilliz 45 Why Not Use a SQL/NoSQL Database? • Inefficiency in High-dimensional spaces • Suboptimal Indexing • Inadequate query support • Lack of scalability • Limited analytics capabilities • Data conversion issues TL;DR: Vector operations are too computationally intensive for traditional database infrastructures
  • 46.
    46 | ©Copyright Zilliz 46 What is Milvus/Zilliz ideal for? • Advanced filtering • Hybrid search • Multi-vector Search • Durability and backups • Replications/High Availability • Sharding • Aggregations • Lifecycle management • Multi-tenancy • High query load • High insertion/deletion • Full precision/recall • Accelerator support (GPU, FPGA) • Billion-scale storage Purpose-built to store, index and query vector embeddings from unstructured data at scale.
  • 47.
    47 | ©Copyright Zilliz 47 Vector Databases are purpose-built to handle indexing, storing, and querying vector data. Milvus & Zilliz are specifically designed for high performance and billion+ scale use cases. Takeaway:
  • 48.
    48 | ©Copyright Zilliz 48 Inverted File Index Source: https://towardsdatascience.com/similarity-search-with-ivfpq-9c6348fd4db3
  • 49.
    49 | ©Copyright Zilliz 49 HNSW Source: https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf
  • 50.
    50 | ©Copyright Zilliz 50 SQ