People often say that vector search is easy, but that's not entirely true. Vector search is more than just vector indexing and a Python wrapper. If you want to build a high-performance, scalable, and production-ready vector search service, you need to consider many factors.
2. Speaker Bio
James
VP of Engineering @ Zilliz
Milvus Maintainer, chief architect for Milvus 2.0.
Former employee at Oracle and Alibaba, seasoned open-source developer.
3. People say vector search is easy
Finish in 10 lines of code
No need to learn anything about machine learning and AI
No ETL needed
No knob tuning…
4. As easy as a numpy knn?
But it never works in production!
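For reference, the kind of numpy kNN the slide has in mind looks roughly like this. It is a minimal brute-force sketch, assuming random data and squared L2 distance; the helper name `knn` and the array sizes are illustrative, not from the talk. It scans every vector on every query and keeps everything in memory, with no filtering, persistence, or tenant isolation, which is why it does not survive production.

```python
import numpy as np

def knn(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k vectors closest to query (exact, brute force)."""
    # Squared L2 distance from the query to every stored vector: O(n * d) per query.
    dists = np.sum((vectors - query) ** 2, axis=1)
    # argpartition moves the k smallest distances to the front without a full sort.
    top_k = np.argpartition(dists, k - 1)[:k]
    # Sort only those k candidates by distance.
    return top_k[np.argsort(dists[top_k])]

rng = np.random.default_rng(0)
vectors = rng.random((10_000, 128), dtype=np.float32)  # illustrative size
query = vectors[42]  # a stored vector, so its nearest neighbor is itself
print(knn(vectors, query, k=5))
```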
5. But it actually is not
Things you need to consider to build a real-world application
• Search quality - Hybrid search? Filtering?
• Scalability - Handling billions of vectors
• Multi-tenancy - Isolating multi-tenant data
• Cost - Memory, disk or S3?
• Security - Data safety and privacy
• A real-world example is far more complicated.
6. Lesson 101: Put VectorDB into production
01 Design your schema
02 Think about how to scale
03 Pick the right index and tune it
7. Design your schema - Dynamic or Fixed?
Dynamic schema
• Flexible, no need to align between each row
Fixed schema
• Save memory with compacted format
• Performant filtering on columnar format
Hybrid schema
• No need to align between entities
• Have both fixed and dynamic schema fields
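The trade-off between fixed and dynamic fields can be sketched with a toy in-memory store. This is an illustration only, not how Milvus stores data: the class `HybridCollection` and its methods are hypothetical names. Fixed fields live in aligned columns, so a filter scans one list; everything else goes into a per-row dict, so rows need not share keys.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class HybridCollection:
    """Toy hybrid schema: columnar fixed fields plus a per-row dynamic dict."""
    fixed_fields: tuple[str, ...]
    columns: dict[str, list[Any]] = field(init=False)
    dynamic: list[dict[str, Any]] = field(init=False)

    def __post_init__(self) -> None:
        self.columns = {name: [] for name in self.fixed_fields}
        self.dynamic = []

    def insert(self, row: dict[str, Any]) -> None:
        # Fixed fields are required on every row, so the columns stay aligned.
        missing = [name for name in self.fixed_fields if name not in row]
        if missing:
            raise ValueError(f"missing fixed fields: {missing}")
        for name in self.fixed_fields:
            self.columns[name].append(row[name])
        # Any remaining keys are dynamic and may differ from row to row.
        self.dynamic.append({k: v for k, v in row.items() if k not in self.fixed_fields})

    def filter_fixed(self, name: str, value: Any) -> list[int]:
        # Scans a single column.
        return [i for i, v in enumerate(self.columns[name]) if v == value]

    def filter_dynamic(self, name: str, value: Any) -> list[int]:
        # Inspects every row's dict; rows without the key do not match.
        return [i for i, d in enumerate(self.dynamic) if name in d and d[name] == value]

coll = HybridCollection(fixed_fields=("id", "tenant"))
coll.insert({"id": 1, "tenant": "a", "color": "red"})
coll.insert({"id": 2, "tenant": "b"})
coll.insert({"id": 3, "tenant": "a", "author": "x"})
print(coll.filter_fixed("tenant", "a"), coll.filter_dynamic("color", "red"))
```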
8. Pick primary and partition key
Primary Key
• Primary key is unique, like a chunkID or imageID
• Primary key can be auto-generated or user-defined
• Data is mapped to shards by hash(primary key)
Partition Key
• Partition key is extremely useful in multi-tenant cases
• Partition key is used to map data into partitions
Clustering Key (New)
• Clusters data in the background, so a search only touches part of the clusters
• Can be any scalar field in OSS; vector fields are available on Zilliz Cloud
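The two mappings on this slide (primary key to shard, partition key to partition) can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 plus modulo is a stand-in, not the hash function Milvus uses, and the function names and the shard and partition counts are hypothetical.

```python
import hashlib

def stable_hash(value: object) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() is salted per process for strings."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(primary_key: object, num_shards: int) -> int:
    # Rows are spread over shards by the hash of their primary key.
    return stable_hash(primary_key) % num_shards

def partition_for(partition_key: object, num_partitions: int) -> int:
    # All rows sharing a partition key (e.g. a tenant ID) land in the same partition.
    return stable_hash(partition_key) % num_partitions

rows = [
    {"chunk_id": 101, "tenant": "acme"},
    {"chunk_id": 102, "tenant": "acme"},
    {"chunk_id": 103, "tenant": "globex"},
]
for row in rows:
    print(row, shard_for(row["chunk_id"], 2), partition_for(row["tenant"], 16))
```

Because every row of a tenant maps to one partition, a search filtered on that tenant only needs to look at that partition.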
11. Think about how to scale
Collection: Similar to a table in a traditional database. Each collection is
contained within a single database. Databases can be isolated by resource group.
Shard: Collections are sharded based on a hash of the primary key (PK). The
shard number cannot be dynamically changed for now.
Partition: Refers to a field that you frequently filter on, such as departmentID,
date, or goods type. Milvus currently supports up to 1024 partitions.
Segment: The minimal unit for balancing and building indexes. There are two
types of segments: growing and sealed.
• Growing Segment: A segment that is actively receiving new data inserts.
• Sealed Segment: A read-only segment that has completed indexing and is
ready for query operations.
Replica: Similar to database replication. Creating more replicas can improve
failure recovery speed and read throughput.
12. So many concepts, any best practice?
Data size
A Milvus collection can host over 10 billion data entries. Each shard hosts 100-500 million data entries,
and in most cases, 1-2 shards are more than enough. If you have an intensive write workload, increasing
the number of shards can also help to improve the write throughput.
Tenant number
Use collections to isolate tenants if you have fewer than 10,000 tenants. For more tenants, use
partition keys. Partition keys combine logical and physical partitioning and can therefore support
an unlimited number of tenants.
What about QPS?
Milvus is distributed, so adding more query nodes usually boosts performance. For small datasets, increasing
the number of in-memory replicas helps distribute the query load evenly across more query nodes.
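The rules of thumb above can be written down as a small helper. This is a sketch that only encodes the numbers quoted on this slide (up to 500 million entries per shard, 10,000 tenants as the collection limit); the function names are hypothetical and the output is a starting point, not a sizing guarantee.

```python
import math

ENTRIES_PER_SHARD = 500_000_000   # upper end of the 100-500 million range quoted above
COLLECTION_TENANT_LIMIT = 10_000  # tenant threshold quoted above

def suggest_shards(total_entries: int) -> int:
    """Smallest shard count that keeps each shard at or below ENTRIES_PER_SHARD."""
    return max(1, math.ceil(total_entries / ENTRIES_PER_SHARD))

def suggest_tenant_isolation(num_tenants: int) -> str:
    """Collection per tenant below the limit, partition key otherwise."""
    if num_tenants < COLLECTION_TENANT_LIMIT:
        return "collection per tenant"
    return "partition key"

print(suggest_shards(200_000_000), suggest_tenant_isolation(500))
print(suggest_shards(10_000_000_000), suggest_tenant_isolation(50_000))
```

Write-heavy workloads may justify more shards than this lower bound, as the slide notes.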
13. How to pick index?
GPU index: FAISS GPU, Nvidia CAGRA
Memory index: FAISS, HNSW, ZILLIZ Cardinal
Disk index: DiskANN, ZILLIZ Cardinal
Swap Index: ZILLIZ Serverless - Est. April
14. Tune your index
How to ensure both?
How to evaluate indexes?
1. Pick the right index type
2. Tune the index parameters
3. Benchmark it with VectorDBBench
4. Tune the search parameters
https://github.com/zilliztech/VectorDBBench
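When evaluating an index, the usual accuracy metric is recall@k: the share of the true nearest neighbors that the index actually returned. The helper below is a minimal sketch, assuming the exact neighbors come from a brute-force search; the function name `recall_at_k` and the sample IDs are illustrative and not part of VectorDBBench.

```python
def recall_at_k(approx_ids: list[list[int]], exact_ids: list[list[int]], k: int) -> float:
    """Average fraction of the exact top-k neighbors found in the approximate top-k."""
    if len(approx_ids) != len(exact_ids):
        raise ValueError("both inputs must cover the same queries")
    if not exact_ids or k <= 0:
        raise ValueError("need at least one query and k > 0")
    hits = 0
    for approx, exact in zip(approx_ids, exact_ids):
        # Order within the top-k does not matter for recall, so compare as sets.
        hits += len(set(approx[:k]) & set(exact[:k]))
    return hits / (k * len(exact_ids))

exact = [[1, 2, 3], [4, 5, 6]]    # ground truth from brute-force search
approx = [[1, 3, 9], [4, 5, 6]]   # what the index returned
print(recall_at_k(approx, exact, k=3))
```

Re-running this after each index or search parameter change, alongside latency measurements, shows whether a speed gain cost accuracy.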
15. Index Cheat sheet
Index                 Accuracy  Latency  Throughput  Index Time  Cost
CAGRA (GPU)           High      Low      Very High   Fast        Very High
HNSW                  High      Low      High        Slow        High
ScaNN                 Mid       Mid      High        Mid         Mid
IVF_FLAT              Mid       Mid      Low         Fast        Mid
IVF + Quantization    Low       Mid      Mid         Mid         Low
DiskANN               High      High     Mid         Very Slow   Low
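The cheat sheet can also be used as a lookup. The sketch below copies the accuracy and cost ratings from the table as-is and filters by a minimum accuracy and a maximum cost; the ordering of the rating levels and the function name `candidates` are assumptions made for illustration.

```python
# Ratings copied from the cheat sheet; only accuracy and cost are used here.
CHEAT_SHEET = {
    "CAGRA (GPU)": {"accuracy": "High", "cost": "Very High"},
    "HNSW": {"accuracy": "High", "cost": "High"},
    "ScaNN": {"accuracy": "Mid", "cost": "Mid"},
    "IVF_FLAT": {"accuracy": "Mid", "cost": "Mid"},
    "IVF + Quantization": {"accuracy": "Low", "cost": "Low"},
    "DiskANN": {"accuracy": "High", "cost": "Low"},
}

# Assumed ordering of the qualitative levels.
LEVEL = {"Low": 0, "Mid": 1, "High": 2, "Very High": 3}

def candidates(min_accuracy: str, max_cost: str) -> list[str]:
    """Index types rated at least min_accuracy and at most max_cost."""
    return [
        name
        for name, row in CHEAT_SHEET.items()
        if LEVEL[row["accuracy"]] >= LEVEL[min_accuracy]
        and LEVEL[row["cost"]] <= LEVEL[max_cost]
    ]

print(candidates(min_accuracy="High", max_cost="High"))
```

Latency, throughput, and index time still have to be weighed separately, and the shortlist should be confirmed by benchmarking.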
16. THANK YOU FOR WATCHING
https://github.com/milvus-io/milvus
https://zilliz.com