Performance Optimization 101
James Luan
2024.3.19
Key factors for Scalable and Performant Vector Search
Speaker Bio
James
VP of Engineering @ Zilliz
Milvus Maintainer, chief architect for Milvus 2.0.
Former employee at Oracle and Alibaba, seasoned
open-source developer.
People say vector search is easy
Done in 10 lines of code
No knowledge of machine learning or AI required
No ETL needed
No knob tuning…
As easy as a numpy knn?
But it never works in production!
But it is actually not that easy
Things you need to consider
to build a real-world application
• Search quality - Hybrid search? Filtering?
• Scalability - Handling billions of vectors
• Multi-tenancy - Isolating multi-tenant data
• Cost - Memory, disk, or S3?
• Security - Data safety and privacy
• A real-world application is far more complicated…
Lesson 101: Put VectorDB into production
01 Design your schema
02 Think about how to scale
03 Pick the right index and tune it
Design your schema - Dynamic or Fixed?
Dynamic schema
Flexible, no need to align fields between rows
Fixed schema
Saves memory with a compacted format
Performant filtering on columnar format
Hybrid schema
Has both fixed schema and dynamic fields
No need to align between entities
Pick primary and partition key
Primary Key
Primary key is unique, like a chunkID or imageID
Primary key can be auto-generated or user-defined
Data is mapped to a shard by hash(primary key)
Partition Key
Partition key is extremely useful in multi-tenant cases
Partition key is used to map data into partitions
Clustering Key (New)
Clusters data in the background; searches scan only partial clusters
Can be any scalar field in OSS; vector fields are available on Zilliz Cloud
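The hash-based shard mapping above can be sketched in a few lines of Python. This is a simplified illustration, not Milvus's actual hash function:

```python
# Simplified sketch: a primary key is hashed, and the hash modulo the
# shard count picks the shard. Milvus uses its own hash internally.

def shard_for(primary_key, num_shards: int = 2) -> int:
    """Map a primary key (int or str) to a shard index."""
    return hash(primary_key) % num_shards

# The mapping is deterministic within a process: all writes and lookups
# for the same key land on the same shard.
assert shard_for("chunk-42") == shard_for("chunk-42")
```

This also shows why the shard number cannot be changed dynamically: changing `num_shards` would remap existing keys to different shards.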
Pick embedding types
Sparse embeddings
Distance: IP
Models: Splade, BGE-M3
Index: WAND, Graph
Dense embeddings
Distance: IP, L2, Cosine
Models: OpenAI, BGE, Cohere
Index: Faiss, HNSW
Binary embeddings
Distance: Hamming, Superstructure, Jaccard, Tanimoto
Models: Cohere, Meta ESM-2
Index: Faiss
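A minimal pure-Python sketch of how the common distance metrics above are computed, to make the differences concrete (real engines use vectorized SIMD implementations):

```python
import math

def inner_product(a, b):
    """IP similarity: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: IP of the two normalized vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return inner_product(a, b) / (na * nb)

def jaccard(a, b):
    """Jaccard distance for binary vectors given as 0/1 lists."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1 - inter / union
```

Note that for unit-normalized vectors, cosine similarity and IP give the same ranking, which is why many dense models recommend normalizing embeddings.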
Schema Demo for a typical RAG application
Think about how to scale
Collection: Similar to a table in a traditional database. Each collection is
contained within a single database. Databases can be isolated by resource groups.
Shard: Collections are sharded based on a hash of the primary key (PK). The
shard number cannot be dynamically changed for now.
Partition: Refers to a field that you frequently filter on, such as departmentID,
date, or goods type. Milvus currently supports up to 1024 partitions.
Segment: The minimal unit for balancing and building indexes. There are two
types of segments: growing and sealed.
• Growing Segment: A segment that is actively receiving new data inserts.
• Sealed Segment: A read-only segment that has completed indexing and is
ready for query operations.
Replica: Similar to database replication. Creating more replicas can improve
failure recovery speed and read throughput.
So many concepts, any best practice?
Data size
A Milvus collection can host over 10 billion data entries. Each shard hosts 100-500 million data entries,
and in most cases, 1-2 shards are more than enough. If you have an intensive write workload, increasing
the number of shards can also help to improve the write throughput.
Tenant number
Use collections to isolate tenants if your number of tenants is less than 10,000. For more tenants, use
partition keys. Partition keys use a hybrid approach of logical and physical partitioning and can thus
support an unlimited number of tenants.
What about QPS?
Milvus is distributed; adding more query nodes usually boosts performance. For small datasets, increasing
in-memory replicas can help distribute query load evenly across more query nodes.
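The sizing guidance above (100-500 million entries per shard, 1-2 shards for most cases) can be turned into a back-of-the-envelope helper. This is purely illustrative arithmetic, not a Milvus API:

```python
def suggest_shards(num_entries: int,
                   per_shard_capacity: int = 500_000_000) -> int:
    """Rule-of-thumb shard count: each shard holds up to a few hundred
    million entries, so take the ceiling of entries / capacity,
    with a minimum of one shard."""
    return max(1, -(-num_entries // per_shard_capacity))  # ceiling division

print(suggest_shards(300_000_000))    # a few hundred million entries: 1 shard
print(suggest_shards(2_000_000_000))  # 2 billion entries: 4 shards
```

As the deck notes, a write-heavy workload may justify more shards than this data-size estimate alone suggests.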
How to pick index?
GPU index: FAISS GPU, Nvidia CAGRA
Memory index: FAISS, HNSW, ZILLIZ Cardinal
Disk index: DiskANN, ZILLIZ Cardinal
Swap Index: ZILLIZ Serverless - Est. April
Tune your index
How to ensure both accuracy and performance? How to evaluate indexes?
1. Pick the right index type
2. Tune the index parameters
3. Benchmark it with VectorDBBench
4. Tune the search parameters
https://github.com/zilliztech/VectorDBBench
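The benchmarking step above boils down to measuring recall against exact search. A minimal sketch of recall@k in pure Python; VectorDBBench does this at scale, plus QPS and latency measurement:

```python
def recall_at_k(approx_ids, exact_ids, k: int) -> float:
    """Fraction of the true top-k neighbors the ANN index returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def brute_force_topk(query, vectors, k: int):
    """Exact L2 ground truth: indices of the k closest vectors."""
    dists = [(sum((q - x) ** 2 for q, x in zip(query, v)), i)
             for i, v in enumerate(vectors)]
    return [i for _, i in sorted(dists)[:k]]

# Tiny toy dataset to show the mechanics.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
exact = brute_force_topk([0.1, 0.1], vectors, k=2)
# Suppose an ANN index returned ids [0, 3]: it found 1 of the true top 2.
print(recall_at_k([0, 3], exact, k=2))  # 0.5
```

Tuning index parameters (e.g. HNSW's ef) moves you along the recall/latency curve; this metric is how you see where you landed.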
Index Cheat Sheet

Index               Accuracy  Latency  Throughput  Index Time  Cost
CAGRA (GPU)         High      Low      Very High   Fast        Very High
HNSW                High      Low      High        Slow        High
ScaNN               Mid       Mid      High        Mid         Mid
IVF_FLAT            Mid       Mid      Low         Fast        Mid
IVF + Quantization  Low       Mid      Mid         Mid         Low
DiskANN             High      High     Mid         Very Slow   Low
THANK YOU FOR WATCHING
https://github.com/milvus-io/milvus
https://zilliz.com
