©2024 DataStax – All rights reserved.
Modern Vector Search
SW2Con 2024
magic → [0.3025549650192261, 0.1912980079650879, 0.04950578138232231, 0.13541743159294128, 0.22033651173114777, ...]
2048 Dimensions
K Nearest Neighbors search (KNN)
4
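Exact KNN is conceptually simple: compare the query against every stored vector and keep the k closest. A minimal sketch (assuming cosine similarity and NumPy; illustrative, not code from the talk):

```python
import numpy as np

def knn_search(query, vectors, k=3):
    """Exact K nearest neighbors by cosine similarity: compare the
    query against every stored vector, then keep the top k."""
    # Normalize so that a dot product equals cosine similarity.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = v @ q                 # O(N * d): one comparison per vector
    top = np.argsort(-scores)[:k]  # indices of the k best matches
    return top, scores[top]

# Tiny demo: 5 random 2048-dimensional "embeddings".
rng = np.random.default_rng(42)
vectors = rng.random((5, 2048), dtype=np.float32)
indices, scores = knn_search(vectors[2], vectors, k=1)
# A stored vector's nearest neighbor is itself (similarity ~1.0).
```

The O(N * d) cost per query is what makes exact KNN untenable at scale and motivates the approximate methods below.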
The curse of dimensionality, KNN edition
6
The curse of dimensionality, KNN edition
7
The curse of dimensionality, KNN edition
8
3D: 0.1%
9
ANN (Approximate Nearest Neighbor)
Partitioned and graph indexes
10
Milestones in ANN
11
● 2015: FastScan (Compression)
● 2016: HNSW (Graph construction)
● 2017: Quick ADC (Compression)
● 2018: Quicker ADC (Compression)
● 2019: DiskANN (Graph construction + compression)
● 2020: SCANN and APQ (Compression)
● 2021: NGT QG (Compression)
● 2022: SPANN (Partitioning)
● 2023: LVQ (Compression)
● 2024: JVector LTM (Graph construction)
IVF Partitioning
12
Search
13
KMeans
14
IVF search
15
● Coarse: O(centroids)
● Accurate: O(M * N/centroids)
● Centroid count needs to be relatively high
○ FAISS recommends O(sqrt(N)) = 64K for N in 1M..10M
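The two passes above can be sketched end to end. This toy version (my own illustration, not FAISS code) builds KMeans partitions, then probes only the n_probe nearest ones at query time:

```python
import numpy as np

def build_ivf(vectors, n_centroids=4, iters=10):
    """Toy IVF index: plain Lloyd's KMeans, then inverted lists
    mapping each centroid to the vectors assigned to it."""
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), n_centroids, replace=False)]
    for _ in range(iters):
        assign = np.argmin(
            np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
        for c in range(n_centroids):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    lists = {c: np.flatnonzero(assign == c) for c in range(n_centroids)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, n_probe=1, k=1):
    # Coarse pass: O(centroids) distance computations.
    nearest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    # Accurate pass: exact search inside the probed partitions only.
    cand = np.concatenate([lists[c] for c in nearest])
    dists = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
vecs = rng.random((200, 8), dtype=np.float32)
centroids, lists = build_ivf(vecs, n_centroids=4)
hit = ivf_search(vecs[7], vecs, centroids, lists, n_probe=4, k=1)
```

Raising n_probe trades speed for recall: the true neighbor is missed whenever it lives in a partition the coarse pass didn't probe.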
SPANN (MSR, 2022)
16
● Better partitioning
● Dynamic pruning during search
● Hybrid architecture: centroids in memory, postings lists on disk
● Scales to 1B+ vectors
Vector database index choices
17
● Astra: Graph
● Lucene: Graph
● Milvus: Graph
● Pinecone: Graph
● Qdrant: Graph
● Weaviate: Graph
● pgVector: Partitioned, Graph
Partitioning downsides
18
● KMeans is O(t*k*n*d) (t iterations, k centroids, n vectors, d dimensions)
● Incremental construction is difficult and slow
● Difficult to handle deletes
● SOTA is relatively complex
Graph indexes
19
HNSW (Malkov + Yashunin, 2016)
20
● First modern graph index
● Still in use in e.g. Lucene
● Single-pass search, everything in memory
HNSW: diversity heuristic
21
Hierarchical NSW
22
Larger-than-memory HNSW
23
DiskANN (MSR, 2019)
24
● Single graph layer
● Coarse + Accurate passes
○ Coarse pass performed using compressed vectors in memory
○ Accurate pass reranks coarse results using full-resolution vectors from disk
● Scales to 1B+ vectors
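The coarse + accurate pattern can be sketched with a crude scalar quantizer standing in for PQ (a hypothetical illustration, not DiskANN's actual code):

```python
import numpy as np

def quantize8(vectors):
    """Crude stand-in for PQ: scalar-quantize every value to one byte."""
    lo, hi = float(vectors.min()), float(vectors.max())
    codes = np.round((vectors - lo) / (hi - lo) * 255).astype(np.uint8)
    return codes, lo, hi

def search_with_rerank(query, full_vectors, codes, lo, hi, k=3, overquery=3):
    # Coarse pass: rank everything using the lossy in-memory codes.
    approx = codes.astype(np.float32) / 255 * (hi - lo) + lo
    coarse = np.argsort(np.linalg.norm(approx - query, axis=1))[:k * overquery]
    # Accurate pass: rerank only k*overquery candidates at full
    # resolution (stored on disk in DiskANN; a plain array here).
    exact = np.linalg.norm(full_vectors[coarse] - query, axis=1)
    return coarse[np.argsort(exact)[:k]]

rng = np.random.default_rng(2)
vecs = rng.random((100, 16), dtype=np.float32)
codes, lo, hi = quantize8(vecs)
res = search_with_rerank(vecs[5], vecs, codes, lo, hi, k=1, overquery=10)
```

With reranking, the lossy coarse pass only has to get the true neighbors into the candidate set; the exact pass fixes the ordering.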
DiskANN single layer design
25
Non-blocking concurrency = linear scaling
26
Product Quantization (PQ)
27
Product Quantization (PQ)
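As a rough sketch of what the PQ figures show: split each vector into M subvectors, train a small KMeans codebook per subspace, and store only the centroid IDs (illustrative code, not from the talk):

```python
import numpy as np

def pq_train(vectors, m=4, n_centroids=16, iters=10):
    """Train one small KMeans codebook per subspace: a d-dim float
    vector becomes m small integer codes, a huge compression ratio."""
    sub = vectors.shape[1] // m
    rng = np.random.default_rng(0)
    books = []
    for j in range(m):
        block = vectors[:, j*sub:(j+1)*sub]
        cents = block[rng.choice(len(block), n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((block[:, None] - cents[None])**2).sum(2), axis=1)
            for c in range(n_centroids):
                if (assign == c).any():
                    cents[c] = block[assign == c].mean(0)
        books.append(cents)
    return books

def pq_encode(vec, books):
    """Replace each subvector with the ID of its nearest centroid."""
    sub = len(vec) // len(books)
    return np.array([int(np.argmin(((books[j] - vec[j*sub:(j+1)*sub])**2).sum(1)))
                     for j in range(len(books))])

def pq_decode(codes, books):
    """Lossy reconstruction: concatenate the chosen centroids."""
    return np.concatenate([books[j][c] for j, c in enumerate(codes)])

data = np.random.default_rng(1).random((100, 32), dtype=np.float32)
books = pq_train(data)              # 4 codebooks of 16 centroids each
codes = pq_encode(data[0], books)   # 32 floats -> 4 small codes
approx = pq_decode(codes, books)    # lossy reconstruction
```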
Without reranking
30
PQ with transparent reranking
31
Binary Quantization is very lossy
32
PQ is very, very hard to beat consistently
33
LVQ: better than PQ at small(er) ratios
34
DiskANN performance
35
● O(log N) coarse search
● O(topK) rerank
● Still O(N) memory use
Beyond DiskANN
36
● 2023: 10M is a big vector index
● 2024: 1B is a big vector index
(and customers are asking when they can have 10B)
Larger-than-memory index construction
37
● DiskANN (2019)
○ Split the dataset into 40 partitions using KMeans
○ Index each partition separately, adding each node to its closest 2 partitions
○ Take the union of edges across all partitioned indexes to make one big index
○ 5 days to build the Deep1B dataset (350GB)
Larger-than-memory index construction
39
● JVector (2024)
○ Build the index using two-phase search (PQ in memory, full resolution on disk)
○ 3h to build Cohere-v3-wikipedia (180GB)
Reducing memory footprint from O(N) to O(1)
40
● Fused ADC
● First implemented by NGT (Yahoo! Japan, 2021)
● Apply Quicker ADC to graph indexes
○ PQ lookup tables stored on disk, not in memory
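The ADC idea underneath this: per query, precompute a table of distances from each query subvector to every codebook centroid; the distance to any PQ-coded vector is then just table lookups and adds, with no decompression. A sketch under those assumptions (my own illustration, not NGT or JVector code):

```python
import numpy as np

def adc_tables(query, books):
    """Asymmetric Distance Computation: precompute, once per query, the
    squared distance from each query subvector to every centroid."""
    m, sub = len(books), len(query) // len(books)
    return [((books[j] - query[j*sub:(j+1)*sub])**2).sum(axis=1)
            for j in range(m)]

def adc_distance(codes, tables):
    # Distance to a PQ-coded vector is m table lookups plus adds;
    # the stored vector is never reconstructed.
    return sum(tables[j][c] for j, c in enumerate(codes))

# Toy setup: m=4 subspaces of 4 dims, 16 centroids each (hypothetical codebooks).
rng = np.random.default_rng(0)
books = [rng.random((16, 4), dtype=np.float32) for _ in range(4)]
query = rng.random(16, dtype=np.float32)
codes = [3, 0, 7, 12]            # PQ codes of some stored vector
tables = adc_tables(query, books)
dist = adc_distance(codes, tables)
```

Quick(er) ADC and Fused ADC are about making those lookups SIMD-friendly and moving the tables out of main memory; the arithmetic identity is the same.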
Memory for 10M openai-v3-small vectors
41
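The chart itself isn't reproduced in this transcript, but the scale is easy to recover, assuming openai-v3-small's 1536 dimensions stored as float32 and the 64x PQ ratio mentioned in the speaker notes:

```python
# Back-of-envelope memory for 10M openai-v3-small vectors.
n, d = 10_000_000, 1536
full_res = n * d * 4          # float32 vectors: ~61.4 GB
pq_64x = n * (d * 4 // 64)    # PQ at 64x -> 96 bytes/vector: ~0.96 GB
```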
42
Conclusion
Milestones in ANN
43
● 2015: FastScan
● 2016: HNSW
● 2017: Quick ADC
● 2018: Quicker ADC
● 2019: DiskANN
● 2020: SCANN
● 2021: NGT QG
● 2022: SPANN
● 2023: LVQ
● 2024: JVector LTM
Not like this
44
What actually matters
45
Basic:
● Support for PQ
● Support for reranking
Advanced:
● Larger-than-memory index construction
● O(1) memory footprint for queries
● Support for LVQ
Further reading
46
● DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
● Quicker ADC: Unlocking the Hidden Potential of Product Quantization with SIMD
● Locally-Adaptive Quantization for Streaming Vector Search


Editor's Notes

  • #11 Two families
  • #16 (Average case.) So you can see how it's important to pick the right centroid count for your dataset size. Centroid index: HNSW for FAISS, SPTAG for SPANN.
  • #31 It pisses me off that some projects are advising people to compress their vectors when they don’t support reranking
  • #32 Talk track: with 3x overquery we can beat uncompressed recall (and speed!) while compressing 64x. For full context, see https://thenewstack.io/why-vector-size-matters/
  • #33 Less useful because even with overquery you can't make up the recall loss (except for ada002). Openai-v3 works fine with BQ too, but I'd rather compress it 64x with PQ.
  • #34 Not quantitative!
  • #35 LVQ (Locally-adaptive Vector Quantization) is a new compression design that is accurate enough to be used in reranking. We can replace the full-resolution vectors on disk with LVQ-compressed, reducing index size by a factor of ~4 and speeding up queries about 20%
  • #44 My intro slide was titled “modern” vector search
  • #47 ADC is well known. Quick ADC and Quicker ADC are more obscure and only used in slower, partitioned index designs. Fused ADC is new in JVector and is the first application to graph indexes.