©2024 DataStax – All rights reserved.
Modern Vector Search
SW2Con 2024
magic
[0.3025549650192261,
0.1912980079650879,
0.04950578138232231,
0.13541743159294128,
0.22033651173114777,
0.3047471046447754,
0.03519149497151375,
0.41724318265914917,
0.46010446548461914,
0.13088607788085938,
0.11903445422649384,
0.30909594893455505,
0.2992345690727234,
0.17327798902988434,
0.02294405922293663,
0.20794396102428436,
0.46378788352012634,
0.16246692836284637,
0.7109631896018982,
0.20986509323120117,
0.1922052949666977,
...
2048 dimensions
K Nearest Neighbors search (KNN)
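As a point of reference for the rest of the deck, exact KNN can be sketched as a brute-force scan. This is a minimal illustration, not code from the talk: the function names and the toy data are assumptions, and Euclidean distance is assumed as the metric.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, vectors, k):
    """Exact KNN: compare the query against every stored vector.

    Returns (distance, index) pairs for the k closest vectors, nearest first.
    The cost is O(N * d) per query, which is what ANN indexes try to avoid.
    """
    scored = ((l2_distance(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy data (hypothetical): four 2-dimensional vectors.
vectors = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
nearest = knn_search([0.0, 0.1], vectors, k=2)
print(nearest)
```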
The curse of dimensionality, KNN edition
3D: 0.1%
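One plausible reading of the 0.1% figure, assuming (this is not stated on the slide) that a query's neighborhood spans 10% of the data range along each axis: the fraction of space covered is 0.1 raised to the number of dimensions. The helper name below is hypothetical.

```python
def covered_fraction(side, dims):
    """Fraction of a unit hypercube covered by a sub-cube of the given side length."""
    return side ** dims


# Assumed neighborhood size: 10% of the range on every axis.
for dims in (1, 2, 3, 10):
    print(f"{dims}D: {covered_fraction(0.1, dims):.8%}")
```

Under that assumption the covered fraction shrinks by a factor of ten with every added dimension, which is why spatial partitioning that works in 2D or 3D stops helping at hundreds or thousands of dimensions.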
ANN (Approximate Nearest Neighbor)
Partitioned and graph indexes
Milestones in ANN
● 2015: FastScan (Compression)
● 2016: HNSW (Graph construction)
● 2017: Quick ADC (Compression)
● 2018: Quicker ADC (Compression)
● 2019: DiskANN (Graph construction + compression)
● 2020: SCANN and APQ (Compression)
● 2021: NGT QG (Compression)
● 2022: SPANN (Partitioning)
● 2023: LVQ (Compression)
● 2024: JVector LTM (Graph construction)
IVF Partitioning
Search
KMeans
IVF search
● Coarse: O(centroids)
● Accurate: O(M * N/centroids)
● Centroid count needs to be relatively high
○ FAISS recommends O(sqrt(N)) centroids, e.g. 64K for N between 1M and 10M
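The coarse and accurate steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the centroids are given (in practice they come from k-means), M on the slide is read as the number of partitions probed (called nprobe here, following FAISS terminology), and all function names and data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign every vector id to the posting list of its nearest centroid."""
    postings = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        postings[nearest].append(i)
    return postings


def ivf_search(query, vectors, centroids, postings, k, nprobe):
    # Coarse step: rank all centroids against the query, O(centroids).
    ranked = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    # Accurate step: scan only the nprobe closest posting lists,
    # roughly O(nprobe * N / centroids) vectors on average.
    candidates = [i for c in ranked[:nprobe] for i in postings[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]


# Toy data (hypothetical): three obvious clusters in 2D.
vectors = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0], [9.1, 0.3]]
centroids = [[0.0, 0.1], [5.1, 5.0], [9.0, 0.1]]
postings = build_ivf(vectors, centroids)
result = ivf_search([4.8, 5.1], vectors, centroids, postings, k=1, nprobe=1)
print(postings, result)
```

With few centroids each posting list is long and the accurate step dominates; with many centroids the coarse step dominates. That trade-off is why the centroid count is tied to N.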
SPANN (MSR, 2022)
● Better partitioning
● Dynamic pruning during search
● Hybrid architecture: centroids in memory, posting lists on disk
● Scales to 1B+ vectors
Vector database index choices
● Astra: Graph
● Lucene: Graph
● Milvus: Graph
● Pinecone: Graph
● Qdrant: Graph
● Weaviate: Graph
● pgVector: Partitioned Graph
Partitioning downsides
● KMeans is O(t*k*n*d) (t iterations, k centroids, n vectors, d dimensions)
● Incremental construction is difficult and slow
● Deletes are difficult to handle
● State-of-the-art (SOTA) partitioning is relatively complex
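To make the O(t*k*n*d) cost concrete, here is a sketch of plain Lloyd's k-means with the four nested loops spelled out and an operation counter added. It assumes t is the iteration count, k the centroid count, n the vector count and d the dimensionality; the counter, seed and toy data are illustrative additions, not part of any library.

```python
import random


def kmeans(vectors, k, iterations, seed=0):
    """Plain Lloyd's k-means; returns (centroids, distance_component_ops)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    dims = len(vectors[0])
    ops = 0
    for _ in range(iterations):                      # t
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for v in vectors:                            # n
            best, best_dist = 0, float("inf")
            for c in range(k):                       # k
                dist = 0.0
                for j in range(dims):                # d
                    diff = v[j] - centroids[c][j]
                    dist += diff * diff
                    ops += 1
                if dist < best_dist:
                    best, best_dist = c, dist
            counts[best] += 1
            for j in range(dims):
                sums[best][j] += v[j]
        for c in range(k):
            if counts[c]:  # keep the old centroid if a cluster went empty
                centroids[c] = [s / counts[c] for s in sums[c]]
    return centroids, ops


# Toy data (hypothetical): 200 random 8-dimensional vectors.
data_rng = random.Random(1)
vectors = [[data_rng.random() for _ in range(8)] for _ in range(200)]
centroids, ops = kmeans(vectors, k=4, iterations=5)
print(ops)
```

Because every vector is compared with every centroid on every iteration, adding vectors later means either rerunning this loop or accepting centroids that no longer fit the data, which is one way to read the "incremental construction" bullet.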
Graph indexes
HNSW (Malkov + Yashunin, 2016)
● First modern graph index
● Still in use, e.g. in Lucene
● Single-pass search, everything in memory
HNSW: diversity heuristic
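The slide does not spell the heuristic out, so the following is a sketch of the neighbor-selection rule as commonly described for HNSW: walk the candidates from nearest to farthest and keep one only if it is closer to the base node than to every neighbor already kept. Function names and the toy points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_diverse_neighbors(base, candidates, max_degree):
    """Pick up to max_degree neighbors that point in different directions."""
    selected = []
    for cand in sorted(candidates, key=lambda c: sq_dist(base, c)):
        if len(selected) >= max_degree:
            break
        dist_to_base = sq_dist(base, cand)
        # Skip a candidate that sits closer to an already selected
        # neighbor than to the base node: it adds little new coverage.
        if all(dist_to_base < sq_dist(cand, s) for s in selected):
            selected.append(cand)
    return selected


base = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.1, 0.1), (0.0, 2.0), (-1.5, 0.0)]
neighbors = select_diverse_neighbors(base, candidates, max_degree=3)
print(neighbors)
```

In this toy case the second-nearest candidate is dropped because it is almost on top of the nearest one, and the freed edge goes to a farther point in a different direction.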
Hierarchical NSW
Larger-than-memory HNSW
DiskANN (MSR, 2019)
● Single graph layer
● Coarse + accurate passes
○ Coarse pass uses compressed vectors held in memory
○ Accurate pass reranks the coarse results using full-resolution vectors read from disk
● Scales to 1B+ vectors
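The two-pass pattern can be sketched in isolation. This is not DiskANN itself: the coarse pass below is a brute-force scan rather than a graph traversal, the compressor is a crude grid-rounding stand-in for PQ, and the "disk" is just a second list. The overquery factor is a hypothetical parameter name for fetching more coarse results than the final k.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compress(v, step=0.5):
    """Crude stand-in for PQ: snap each component to a coarse grid."""
    return [round(x / step) * step for x in v]


def two_pass_search(query, compressed, full_resolution, k, overquery=3):
    # Coarse pass: rank by approximate distance using the in-memory
    # compressed vectors, keeping k * overquery candidates.
    coarse = sorted(range(len(compressed)), key=lambda i: sq_dist(query, compressed[i]))
    shortlist = coarse[: k * overquery]
    # Accurate pass: fetch full-resolution vectors (from disk in DiskANN)
    # for the shortlist only, and rerank by exact distance.
    shortlist.sort(key=lambda i: sq_dist(query, full_resolution[i]))
    return shortlist[:k]


# Toy data (hypothetical): the first three vectors compress to the same point.
full = [[0.2, 0.2], [0.05, 0.0], [0.24, 0.1], [3.0, 3.0]]
compressed = [compress(v) for v in full]
query = [0.0, 0.0]
print(two_pass_search(query, compressed, full, k=1, overquery=1))
print(two_pass_search(query, compressed, full, k=1, overquery=3))
```

With no overquery the lossy coarse pass cannot tell the three nearby vectors apart and returns whichever comes first; with a larger shortlist the rerank recovers the true nearest neighbor while touching only a few full-resolution vectors.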
DiskANN single layer design
Non-blocking concurrency = linear scaling
Product Quantization (PQ)
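A minimal sketch of the idea, under stated assumptions: the vector is split into equal subvectors, each subvector is replaced by the index of its nearest codebook entry, and query distances are then approximated with per-subspace lookup tables (asymmetric distance computation, ADC). The codebooks here are hand-written and tiny; in practice they are trained, typically with k-means per subspace. All names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Split a vector into m equal-length subvectors."""
    size = len(v) // m
    return [v[i * size:(i + 1) * size] for i in range(m)]


def pq_encode(v, codebooks):
    """Replace each subvector with the index of its nearest codebook entry."""
    return [min(range(len(book)), key=lambda c: sq_dist(sub, book[c]))
            for sub, book in zip(split(v, len(codebooks)), codebooks)]


def pq_decode(code, codebooks):
    """Rebuild the (lossy) vector from its codes."""
    return [x for c, book in zip(code, codebooks) for x in book[c]]


def adc_table(query, codebooks):
    """Precompute query-to-entry distances once per query."""
    return [[sq_dist(sub, entry) for entry in book]
            for sub, book in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(code, table):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(table[m][c] for m, c in enumerate(code))


# Hypothetical codebooks: 2 subspaces, 4 entries each (2 bits per subspace).
book = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
codebooks = [book, book]
code = pq_encode([0.1, 0.9, 0.8, 0.2], codebooks)
table = adc_table([0.0, 1.0, 1.0, 1.0], codebooks)
print(code, adc_distance(code, table))
```

The stored vector shrinks from four floats to two small integers, and the lookup-table distance equals the exact distance between the query and the decoded vector, so all of the error comes from the encoding step.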
Without reranking
PQ with transparent reranking
Binary Quantization is very lossy
PQ is very, very hard to beat consistently
LVQ: better than PQ at small(er) ratios
DiskANN performance
● O(log N) coarse search
● O(topK) rerank
● Still O(N) memory use
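A back-of-the-envelope sketch of why compressed vectors still cost O(N) memory. The inputs are assumptions chosen for illustration, not figures from the talk: float32 components, 1536 dimensions, and a 64x compression ratio.

```python
def index_memory_bytes(num_vectors, dims, bytes_per_component=4, compression_ratio=1):
    """Memory needed to hold every vector, optionally compressed."""
    return num_vectors * dims * bytes_per_component / compression_ratio


# Assumed workload: 1536-dimensional float32 vectors, 64x compression.
for n in (10_000_000, 1_000_000_000):
    raw = index_memory_bytes(n, 1536)
    compressed = index_memory_bytes(n, 1536, compression_ratio=64)
    print(f"N={n:,}: raw {raw / 1e9:.2f} GB, compressed {compressed / 1e9:.2f} GB")
```

Compression divides the footprint by a constant, but the footprint still grows linearly with N, so a large enough index outgrows memory regardless of the ratio.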
Beyond DiskANN
● 2023: 10M is a big vector index
● 2024: 1B is a big vector index (and customers are asking when they can have 10B)
Larger-than-memory index construction
● DiskANN (2019)
○ Split the dataset into 40 partitions using k-means
○ Index each partition separately, adding each node to its closest 2 partitions
○ Take the union of edges across all partition indexes to make one big index
○ 5 days to build the index for the Deep1B dataset (350GB)
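The partition, index and merge steps above can be sketched on toy data. Assumptions are flagged in the comments: the centroids are given rather than computed by k-means, and the per-partition index is a brute-force nearest-neighbor graph standing in for the real graph construction. All names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_partitions(v, centroids, overlap=2):
    """Ids of the `overlap` centroids closest to v."""
    ranked = sorted(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
    return ranked[:overlap]


def build_partition_graph(ids, vectors, degree):
    """Stand-in for a per-partition graph build: link each node to its
    nearest neighbors within the same partition."""
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: sq_dist(vectors[i], vectors[j]))
        graph[i] = set(others[:degree])
    return graph


def build_merged_index(vectors, centroids, degree=2, overlap=2):
    # Step 1: add each node to its closest `overlap` partitions.
    partitions = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        for c in closest_partitions(v, centroids, overlap):
            partitions[c].append(i)
    # Steps 2 and 3: index each partition on its own, then union the edges.
    merged = {i: set() for i in range(len(vectors))}
    for ids in partitions:
        for node, edges in build_partition_graph(ids, vectors, degree).items():
            merged[node] |= edges
    return partitions, merged


# Toy data (hypothetical): six points on a line, three given centroids.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
centroids = [[0.5], [2.5], [4.5]]
partitions, merged = build_merged_index(vectors, centroids)
print(partitions, merged)
```

Only one partition has to fit in memory at a time, and because every node lives in two partitions, the per-partition graphs share nodes and the union stays connected across partition boundaries.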
● JVector (2024)
○ Build the index using two-phase search (PQ in memory, full resolution on disk)
○ 3h to build Cohere-v3-wikipedia (180GB)
Reducing memory footprint from O(N) to O(1)
● Fused ADC
● First implemented by NGT (Yahoo! Japan, 2021)
● Apply Quicker ADC to graph indexes
○ PQ lookup tables stored on disk, not in memory
Memory for 10M openai-v3-small vectors
Conclusion
Milestones in ANN
● 2015: FastScan
● 2016: HNSW
● 2017: Quick ADC
● 2018: Quicker ADC
● 2019: DiskANN
● 2020: SCANN
● 2021: NGT QG
● 2022: SPANN
● 2023: LVQ
● 2024: JVector LTM
Not like this
What actually matters
Basic:
● Support for PQ
● Support for reranking
Advanced:
● Larger-than-memory index construction
● O(1) memory footprint for queries
● Support for LVQ
Further reading
● DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
● Quicker ADC: Unlocking the hidden potential of Product Quantization with SIMD
● Locally-Adaptive Quantization for Streaming Vector Search

Editor's Notes

  1. Two families
  2. (Average case.) So you can see how important it is to pick the right centroid count for your dataset size. HNSW for FAISS, SPTAG for SPANN.
  3. It pisses me off that some projects are advising people to compress their vectors when they don’t support reranking
  4. Talk track: with 3x overquery we can beat uncompressed recall (and speed!) while compressing 64x. For full context, see https://thenewstack.io/why-vector-size-matters/
  5. Less useful because even with overquery you can’t make up the recall loss (except for ada002). OpenAI-v3 works fine with BQ too, but I’d rather compress it 64x with PQ.
  6. Not quantitative!
  7. LVQ (Locally-adaptive Vector Quantization) is a new compression design that is accurate enough to be used in reranking. We can replace the full-resolution vectors on disk with LVQ-compressed ones, reducing index size by a factor of ~4 and speeding up queries by about 20%.
  8. My intro slide was titled “modern” vector search
  9. ADC is well known. Quick ADC and Quicker ADC are more obscure and only used in slower, partitioned index designs. Fused ADC is new in JVector and is the first application to graph indexes.