©2024 DataStax – All rights reserved.
Modern Vector Search
SW2Con 2024
magic → [0.3025549650192261, 0.1912980079650879, 0.04950578138232231, 0.13541743159294128, 0.22033651173114777, ...]
2048 Dimensions
K Nearest Neighbors search (KNN)
4
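Exact KNN is conceptually simple: compare the query against every stored vector and keep the k closest. A minimal sketch (assuming cosine similarity and NumPy; illustrative, not code from the talk):

```python
import numpy as np

def knn_search(query, vectors, k=3):
    """Exact K nearest neighbors by cosine similarity: compare the
    query against every stored vector, then keep the top k."""
    # Normalize so that a dot product equals cosine similarity.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = v @ q                 # O(N * d): one comparison per vector
    top = np.argsort(-scores)[:k]  # indices of the k best matches
    return top, scores[top]

# Tiny demo: 5 random 2048-dimensional "embeddings".
rng = np.random.default_rng(42)
vectors = rng.random((5, 2048), dtype=np.float32)
indices, scores = knn_search(vectors[2], vectors, k=1)
# A stored vector's nearest neighbor is itself (similarity ~1.0).
```

The O(N * d) cost per query is what makes exact KNN untenable at scale and motivates the approximate methods below.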
The curse of dimensionality, KNN edition
6
The curse of dimensionality, KNN edition
7
The curse of dimensionality, KNN edition
8
3D: 0.1%
9
ANN (Approximate Nearest Neighbor)
Partitioned and graph indexes
10
Milestones in ANN
11
● 2015: FastScan (Compression)
● 2016: HNSW (Graph construction)
● 2017: Quick ADC (Compression)
● 2018: Quicker ADC (Compression)
● 2019: DiskANN (Graph construction + compression)
● 2020: SCANN and APQ (Compression)
● 2021: NGT QG (Compression)
● 2022: SPANN (Partitioning)
● 2023: LVQ (Compression)
● 2024: JVector LTM (Graph construction)
IVF Partitioning
12
Search
13
KMeans
14
IVF search
15
● Coarse: O(centroids)
● Accurate: O(M * N/centroids)
● Centroid count needs to be relatively high
○ FAISS recommends O(sqrt(N)) = 64K for N in 1M..10M
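The two passes above can be sketched end to end. This toy version (my own illustration, not FAISS code) builds KMeans partitions, then probes only the n_probe nearest ones at query time:

```python
import numpy as np

def build_ivf(vectors, n_centroids=4, iters=10):
    """Toy IVF index: plain Lloyd's KMeans, then inverted lists
    mapping each centroid to the vectors assigned to it."""
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), n_centroids, replace=False)]
    for _ in range(iters):
        assign = np.argmin(
            np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
        for c in range(n_centroids):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    lists = {c: np.flatnonzero(assign == c) for c in range(n_centroids)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, n_probe=1, k=1):
    # Coarse pass: O(centroids) distance computations.
    nearest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    # Accurate pass: exact search inside the probed partitions only.
    cand = np.concatenate([lists[c] for c in nearest])
    dists = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
vecs = rng.random((200, 8), dtype=np.float32)
centroids, lists = build_ivf(vecs, n_centroids=4)
hit = ivf_search(vecs[7], vecs, centroids, lists, n_probe=4, k=1)
```

Raising n_probe trades speed for recall: the true neighbor is missed whenever it lives in a partition the coarse pass didn't probe.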
SPANN (MSR, 2022)
16
● Better partitioning
● Dynamic pruning during search
● Hybrid architecture: centroids in memory, postings lists on disk
● Scales to 1B+ vectors
Vector database index choices
17
● Astra: Graph
● Lucene: Graph
● Milvus: Graph
● Pinecone: Graph
● Qdrant: Graph
● Weaviate: Graph
● pgVector: Partitioned, Graph
Partitioning downsides
18
● KMeans is O(t*k*n*d) (t iterations, k centroids, n vectors, d dimensions)
● Incremental construction is difficult and slow
● Difficult to handle deletes
● SOTA is relatively complex
Graph indexes
19
HNSW (Malkov + Yashunin, 2016)
20
● First modern graph index
● Still in use in e.g. Lucene
● Single-pass search, everything in memory
HNSW: diversity heuristic
21
Hierarchical NSW
22
Larger-than-memory HNSW
23
DiskANN (MSR, 2019)
24
● Single graph layer
● Coarse + Accurate passes
○ Coarse pass performed using compressed vectors in memory
○ Accurate pass reranks coarse results using full-resolution vectors from disk
● Scales to 1B+ vectors
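The coarse + accurate pattern can be sketched with a crude scalar quantizer standing in for PQ (a hypothetical illustration, not DiskANN's actual code):

```python
import numpy as np

def quantize8(vectors):
    """Crude stand-in for PQ: scalar-quantize every value to one byte."""
    lo, hi = float(vectors.min()), float(vectors.max())
    codes = np.round((vectors - lo) / (hi - lo) * 255).astype(np.uint8)
    return codes, lo, hi

def search_with_rerank(query, full_vectors, codes, lo, hi, k=3, overquery=3):
    # Coarse pass: rank everything using the lossy in-memory codes.
    approx = codes.astype(np.float32) / 255 * (hi - lo) + lo
    coarse = np.argsort(np.linalg.norm(approx - query, axis=1))[:k * overquery]
    # Accurate pass: rerank only k*overquery candidates at full
    # resolution (stored on disk in DiskANN; a plain array here).
    exact = np.linalg.norm(full_vectors[coarse] - query, axis=1)
    return coarse[np.argsort(exact)[:k]]

rng = np.random.default_rng(2)
vecs = rng.random((100, 16), dtype=np.float32)
codes, lo, hi = quantize8(vecs)
res = search_with_rerank(vecs[5], vecs, codes, lo, hi, k=1, overquery=10)
```

With reranking, the lossy coarse pass only has to get the true neighbors into the candidate set; the exact pass fixes the ordering.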
DiskANN single layer design
25
Non-blocking concurrency = linear scaling
26
Product Quantization (PQ)
27
Product Quantization (PQ)
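As a rough sketch of what the PQ figures show: split each vector into M subvectors, train a small KMeans codebook per subspace, and store only the centroid IDs (illustrative code, not from the talk):

```python
import numpy as np

def pq_train(vectors, m=4, n_centroids=16, iters=10):
    """Train one small KMeans codebook per subspace: a d-dim float
    vector becomes m small integer codes, a huge compression ratio."""
    sub = vectors.shape[1] // m
    rng = np.random.default_rng(0)
    books = []
    for j in range(m):
        block = vectors[:, j*sub:(j+1)*sub]
        cents = block[rng.choice(len(block), n_centroids, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((block[:, None] - cents[None])**2).sum(2), axis=1)
            for c in range(n_centroids):
                if (assign == c).any():
                    cents[c] = block[assign == c].mean(0)
        books.append(cents)
    return books

def pq_encode(vec, books):
    """Replace each subvector with the ID of its nearest centroid."""
    sub = len(vec) // len(books)
    return np.array([int(np.argmin(((books[j] - vec[j*sub:(j+1)*sub])**2).sum(1)))
                     for j in range(len(books))])

def pq_decode(codes, books):
    """Lossy reconstruction: concatenate the chosen centroids."""
    return np.concatenate([books[j][c] for j, c in enumerate(codes)])

data = np.random.default_rng(1).random((100, 32), dtype=np.float32)
books = pq_train(data)              # 4 codebooks of 16 centroids each
codes = pq_encode(data[0], books)   # 32 floats -> 4 small codes
approx = pq_decode(codes, books)    # lossy reconstruction
```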
Without reranking
30
PQ with transparent reranking
31
Binary Quantization is very lossy
32
PQ is very, very hard to beat consistently
33
LVQ: better than PQ at small(er) ratios
34
DiskANN performance
35
● O(log N) coarse search
● O(topK) rerank
● Still O(N) memory use
Beyond DiskANN
36
● 2023: 10M is a big vector index
● 2024: 1B is a big vector index
(and customers are asking when they can have 10B)
Larger-than-memory index construction
37
● DiskANN (2019)
○ Split the dataset into 40 partitions using KMeans
○ Index each partition separately, adding each node to its closest 2 partitions
○ Take the union of edges across all partitioned indexes to make one big index
○ 5 days to build the Deep1B dataset (350GB)
Larger-than-memory index construction
39
● JVector (2024)
○ Build the index using two-phase search (PQ in memory, full resolution on disk)
○ 3h to build Cohere-v3-wikipedia (180GB)
Reducing memory footprint from O(N) to O(1)
40
● Fused ADC
● First implemented by NGT (Yahoo! Japan, 2021)
● Apply Quicker ADC to graph indexes
○ PQ lookup tables stored on disk, not in memory
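The ADC idea underneath this: per query, precompute a table of distances from each query subvector to every codebook centroid; the distance to any PQ-coded vector is then just table lookups and adds, with no decompression. A sketch under those assumptions (my own illustration, not NGT or JVector code):

```python
import numpy as np

def adc_tables(query, books):
    """Asymmetric Distance Computation: precompute, once per query, the
    squared distance from each query subvector to every centroid."""
    m, sub = len(books), len(query) // len(books)
    return [((books[j] - query[j*sub:(j+1)*sub])**2).sum(axis=1)
            for j in range(m)]

def adc_distance(codes, tables):
    # Distance to a PQ-coded vector is m table lookups plus adds;
    # the stored vector is never reconstructed.
    return sum(tables[j][c] for j, c in enumerate(codes))

# Toy setup: m=4 subspaces of 4 dims, 16 centroids each (hypothetical codebooks).
rng = np.random.default_rng(0)
books = [rng.random((16, 4), dtype=np.float32) for _ in range(4)]
query = rng.random(16, dtype=np.float32)
codes = [3, 0, 7, 12]            # PQ codes of some stored vector
tables = adc_tables(query, books)
dist = adc_distance(codes, tables)
```

Quick(er) ADC and Fused ADC are about making those lookups SIMD-friendly and moving the tables out of main memory; the arithmetic identity is the same.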
Memory for 10M openai-v3-small vectors
41
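The chart itself isn't reproduced in this transcript, but the scale is easy to recover, assuming openai-v3-small's 1536 dimensions stored as float32 and the 64x PQ ratio mentioned in the speaker notes:

```python
# Back-of-envelope memory for 10M openai-v3-small vectors.
n, d = 10_000_000, 1536
full_res = n * d * 4          # float32 vectors: ~61.4 GB
pq_64x = n * (d * 4 // 64)    # PQ at 64x -> 96 bytes/vector: ~0.96 GB
```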
42
Conclusion
Milestones in ANN
43
● 2015: FastScan
● 2016: HNSW
● 2017: Quick ADC
● 2018: Quicker ADC
● 2019: DiskANN
● 2020: SCANN
● 2021: NGT QG
● 2022: SPANN
● 2023: LVQ
● 2024: JVector LTM
Not like this
44
What actually matters
45
Basic:
● Support for PQ
● Support for reranking
Advanced:
● Larger-than-memory index construction
● O(1) memory footprint for queries
● Support for LVQ
Further reading
46
● DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
● Quicker ADC: Unlocking the Hidden Potential of Product Quantization with SIMD
● Locally-Adaptive Quantization for Streaming Vector Search


Editor's Notes

  • #11 Two families
  • #16 (Average case.) So you can see how it's important to pick the right centroid count for your dataset size. Centroid index: HNSW for FAISS, SPTAG for SPANN.
  • #31 It pisses me off that some projects are advising people to compress their vectors when they don’t support reranking
  • #32 Talk track: with 3x overquery we can beat uncompressed recall (and speed!) while compressing 64x. For full context, see https://thenewstack.io/why-vector-size-matters/
  • #33 Less useful because even with overquery you can't make up the recall loss (except for ada002). Openai-v3 works fine with BQ too, but I'd rather compress it 64x with PQ.
  • #34 Not quantitative!
  • #35 LVQ (Locally-adaptive Vector Quantization) is a new compression design that is accurate enough to be used in reranking. We can replace the full-resolution vectors on disk with LVQ-compressed, reducing index size by a factor of ~4 and speeding up queries about 20%
  • #44 My intro slide was titled “modern” vector search
  • #47 ADC is well known. Quick ADC and Quicker ADC are more obscure and only used in slower, partitioned index designs. Fused ADC is new in JVector and is the first application to graph indexes.