©2024 DataStax – All rights reserved.
Modern Vector Search
SW2Con 2024
magic
[0.3025549650192261, 0.1912980079650879, 0.04950578138232231,
0.13541743159294128, 0.22033651173114777, 0.3047471046447754,
0.03519149497151375, 0.41724318265914917, 0.46010446548461914,
0.13088607788085938, 0.11903445422649384, 0.30909594893455505,
0.2992345690727234, 0.17327798902988434, 0.02294405922293663,
0.20794396102428436, 0.46378788352012634, 0.16246692836284637,
0.7109631896018982, 0.20986509323120117, 0.1922052949666977,
...] (2048 dimensions)
K Nearest Neighbors search (KNN)
The curse of dimensionality, KNN edition
3D: 0.1%
ANN (Approximate Nearest Neighbor)
Partitioned and graph indexes
Milestones in ANN
● 2015: FastScan (Compression)
● 2016: HNSW (Graph construction)
● 2017: Quick ADC (Compression)
● 2018: Quicker ADC (Compression)
● 2019: DiskANN (Graph construction + compression)
● 2020: SCANN and APQ (Compression)
● 2021: NGT QG (Compression)
● 2022: SPANN (Partitioning)
● 2023: LVQ (Compression)
● 2024: JVector LTM (Graph construction)
IVF Partitioning
Search
KMeans
IVF search
● Coarse: O(centroids)
● Accurate: O(M * N/centroids)
● Centroid count needs to be relatively high
○ FAISS recommends on the order of sqrt(N) centroids, which works out to 64K (65,536) for N between 1M and 10M
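The two IVF passes above can be sketched in a few lines of plain Python. This is a minimal illustration and not FAISS's implementation: `kmeans`, `build_ivf`, `ivf_search` and the `nprobe` parameter are hypothetical names, and `nprobe` is assumed to play the role of M in the cost estimate (the number of partitions scanned in the accurate pass).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))

def kmeans(data, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for vec in data:
            buckets[nearest(vec, centroids)].append(vec)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(data, k):
    """Partition vector ids into k posting lists, one per centroid."""
    centroids = kmeans(data, k)
    lists = [[] for _ in range(k)]
    for i, vec in enumerate(data):
        lists[nearest(vec, centroids)].append(i)
    return centroids, lists

def ivf_search(query, data, centroids, lists, top_k=5, nprobe=3):
    # Coarse pass, O(centroids): rank partitions by centroid distance.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    # Accurate pass, O(nprobe * N / centroids): scan only the probed lists.
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, data[i]))
    return candidates[:top_k]

rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(1000)]  # toy dataset
centroids, lists = build_ivf(data, k=16)
print(ivf_search(data[42], data, centroids, lists))
```

Raising `nprobe` trades speed for recall; probing every partition degenerates into an exact brute-force scan.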
SPANN (MSR, 2022)
● Better partitioning
● Dynamic pruning during search
● Hybrid architecture: centroids in memory, postings lists on disk
● Scales to 1B+ vectors
Vector database index choices
● Astra: Graph
● Lucene: Graph
● Milvus: Graph
● Pinecone: Graph
● Qdrant: Graph
● Weaviate: Graph
● pgVector: Partitioned, Graph
Partitioning downsides
● KMeans is O(t*k*n*d): t iterations, k centroids, n vectors, d dimensions
● Incremental construction is difficult and slow
● Difficult to handle deletes
● SOTA is relatively complex
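To get a feel for the O(t*k*n*d) term, the arithmetic can be worked through for one configuration. The numbers below are assumptions chosen for illustration (20 iterations, 64K centroids, 10M vectors, 1536 dimensions); they are not taken from the slides, and `kmeans_distance_ops` is a hypothetical helper.

```python
def kmeans_distance_ops(t, k, n, d):
    """Multiply-adds spent on distance computations by Lloyd's k-means:
    t iterations, each comparing n vectors of d dimensions to k centroids."""
    return t * k * n * d

# Assumed example configuration, not a measurement.
ops = kmeans_distance_ops(t=20, k=65_536, n=10_000_000, d=1536)
print(f"{ops:.2e} multiply-adds")
```

Because the cost is linear in every factor, doubling the centroid count or the dataset size doubles the partitioning cost.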
Graph indexes
HNSW (Malkov + Yashunin, 2016)
● First modern graph index
● Still in use in e.g. Lucene
● Single-pass search, everything in memory
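As a rough sketch of what a single-pass, in-memory graph search looks like, the code below runs a best-first search over one graph layer. It is a simplification under stated assumptions: the graph is a brute-force k-nearest-neighbor graph rather than a real HNSW build (no hierarchy, no diversity heuristic), and `build_knn_graph`, `graph_search` and `beam` are hypothetical names, with `beam` playing a role similar to HNSW's `ef` search parameter.

```python
import heapq
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(data, degree):
    """Brute-force k-nearest-neighbor graph, a stand-in for a real index build."""
    graph = []
    for i, vec in enumerate(data):
        others = sorted((j for j in range(len(data)) if j != i),
                        key=lambda j: sq_dist(vec, data[j]))
        graph.append(others[:degree])
    return graph

def graph_search(query, data, graph, entry, top_k=5, beam=20):
    """Best-first search that keeps the `beam` closest nodes seen so far."""
    visited = {entry}
    d0 = sq_dist(query, data[entry])
    frontier = [(d0, entry)]   # min-heap of nodes still to expand
    best = [(-d0, entry)]      # max-heap (negated distances) of current results
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= beam and d > -best[0][0]:
            break  # the closest unexpanded node cannot improve the results
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, data[nb])
            if len(best) < beam or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > beam:
                    heapq.heappop(best)  # drop the farthest result
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked][:top_k]

rng = random.Random(3)
data = [[rng.random() for _ in range(8)] for _ in range(300)]  # toy dataset
graph = build_knn_graph(data, degree=8)
query = [rng.random() for _ in range(8)]
print(graph_search(query, data, graph, entry=0))
```

Every distance here is computed against a full vector held in memory, which is the property the larger-than-memory designs later in the deck set out to relax.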
HNSW: diversity heuristic
Hierarchical NSW
Larger-than-memory HNSW
DiskANN (MSR, 2019)
● Single graph layer
● Coarse + Accurate passes
○ Coarse pass uses compressed vectors held in memory
○ Accurate pass reranks the coarse results using full-resolution vectors from disk
● Scales to 1B+ vectors
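The coarse and accurate passes can be sketched as follows. This is an illustration under loud assumptions, not DiskANN's code: a linear scan stands in for the graph traversal, a crude bucket quantizer stands in for the real compression, "disk" is simulated by a function that counts reads, and `compress`, `two_pass_search` and `overquery` are hypothetical names.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compress(vec, levels=4):
    """Stand-in lossy compression: snap each component in [0, 1) to a bucket id."""
    return [min(int(x * levels), levels - 1) for x in vec]

def decompress(code, levels=4):
    """Approximate each component by the midpoint of its bucket."""
    return [(c + 0.5) / levels for c in code]

def two_pass_search(query, codes, read_full_vector, top_k=5, overquery=4):
    # Coarse pass: approximate distances from in-memory compressed vectors only.
    coarse = sorted(range(len(codes)),
                    key=lambda i: sq_dist(query, decompress(codes[i])))
    shortlist = coarse[: top_k * overquery]
    # Accurate pass: fetch full-resolution vectors for the shortlist and rerank.
    shortlist.sort(key=lambda i: sq_dist(query, read_full_vector(i)))
    return shortlist[:top_k]

rng = random.Random(7)
full_vectors = [[rng.random() for _ in range(16)] for _ in range(500)]  # "on disk"
codes = [compress(v) for v in full_vectors]                              # in memory
reads = []

def read_full_vector(i):
    reads.append(i)  # count simulated disk reads
    return full_vectors[i]

result = two_pass_search(full_vectors[3], codes, read_full_vector)
print(result, "disk reads:", len(reads))
```

The number of full-resolution reads depends only on `top_k * overquery`, not on the dataset size, which is why the rerank step stays cheap as the index grows.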
DiskANN single layer design
Non-blocking concurrency = linear scaling
Product Quantization (PQ)
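The slide only names the technique, so here is a minimal sketch of how product quantization is commonly described: split each vector into subvectors, learn a small k-means codebook per subspace, store one centroid id per subvector, and estimate distances with a per-query lookup table (asymmetric distance computation, ADC). All function names and the toy parameters (`m=4` subspaces, `k=16` centroids) are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=8, seed=0):
    """Plain Lloyd's k-means over a list of vectors; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            best = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[best].append(p)
        for c, members in enumerate(buckets):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(vec, m):
    """Cut vec into m equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_pq(data, m, k):
    """One codebook of k centroids per subspace."""
    return [kmeans([split(v, m)[s] for v in data], k, seed=s) for s in range(m)]

def encode(vec, codebooks):
    """Replace each subvector with the id of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def adc_table(query, codebooks):
    """Per-query table: distance from each query subvector to each centroid."""
    return [[sq_dist(sub, cent) for cent in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(code, table):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(table[s][c] for s, c in enumerate(code))

rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(300)]  # toy dataset
codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]
table = adc_table(data[0], codebooks)
print(codes[0], adc_distance(codes[0], table))
```

In this toy setup each 8-float vector shrinks to four ids that fit in 4 bits each; the query itself stays uncompressed, which is what makes the distance computation asymmetric.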
Without reranking
PQ with transparent reranking
Binary Quantization is very lossy
PQ is very, very hard to beat consistently
LVQ: better than PQ at small(er) ratios
DiskANN performance
● O(log N) coarse search
● O(topK) rerank
● Still O(N) memory use
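The O(N) memory point can be made concrete with some back-of-the-envelope arithmetic. The figures are assumptions for illustration only: 1536-dimensional float32 vectors and 96 bytes of compressed code per vector (a 64x reduction); `index_memory_gb` is a hypothetical helper, and graph edges are ignored.

```python
def index_memory_gb(n, dims, code_bytes_per_vector):
    """Return (full-resolution size, compressed size) in GB for n vectors."""
    full = n * dims * 4 / 1e9                  # float32 vectors, kept on disk
    compressed = n * code_bytes_per_vector / 1e9  # compressed codes, kept in memory
    return full, compressed

for n in (10_000_000, 1_000_000_000):
    full, compressed = index_memory_gb(n, dims=1536, code_bytes_per_vector=96)
    print(f"N={n:,}: full={full:,.1f} GB on disk, compressed={compressed:,.2f} GB in memory")
```

Compression shrinks the constant, but the in-memory part still grows linearly with N.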
Beyond DiskANN
● 2023: 10M is a big vector index
● 2024: 1B is a big vector index
(and customers are asking when they can have 10B)
Larger-than-memory index construction
● DiskANN (2019)
○ Split the dataset into 40 partitions using kmeans
○ Index each partition separately, adding each node to its closest 2 partitions
○ Take the union of edges across all partitioned indexes to make one big index
○ 5 days to build the index for the Deep1B dataset (350GB)
● JVector (2024)
○ Build the index using two-phase search (PQ in memory, full resolution on disk)
○ 3h to build Cohere-v3-wikipedia (180GB)
Reducing memory footprint from O(N) to O(1)
● Fused ADC
● First implemented by NGT (Yahoo! Japan, 2021)
● Apply Quicker ADC to graph indexes
○ PQ lookup tables stored on disk, not in memory
Memory for 10M openai-v3-small vectors
Conclusion
Milestones in ANN
● 2015: FastScan
● 2016: HNSW
● 2017: Quick ADC
● 2018: Quicker ADC
● 2019: DiskANN
● 2020: SCANN
● 2021: NGT QG
● 2022: SPANN
● 2023: LVQ
● 2024: JVector LTM
Not like this
What actually matters
Basic:
● Support for PQ
● Support for reranking
Advanced:
● Larger-than-memory index construction
● O(1) memory footprint for queries
● Support for LVQ
Further reading
● DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
● Quicker ADC: Unlocking the hidden potential of Product Quantization with SIMD
● Locally-Adaptive Quantization for Streaming Vector Search

Vector Search @ sw2con for slideshare.pptx

  • 1. ©2024 DataStax – All rights reserved. Modern Vector Search SW2Con 2024
  • 2. ©2024 DataStax – All rights reserved.
  • 3. ©2024 DataStax – All rights reserved. magic [0.3025549650192261, 0.1912980079650879, 0.04950578138232231, 0.13541743159294128, 0.22033651173114777, 0.3047471046447754, 0.03519149497151375, 0.41724318265914917, 0.46010446548461914, 0.13088607788085938, 0.11903445422649384, 0.30909594893455505, 0.2992345690727234, 0.17327798902988434, 0.02294405922293663, 0.20794396102428436, 0.46378788352012634, 0.16246692836284637, 0.7109631896018982, 0.20986509323120117, 0.1922052949666977, ... 2048 Dimensions
  • 4. ©2024 DataStax – All rights reserved. K Nearest Neighbors search (KNN) 4
  • 5. ©2024 DataStax – All rights reserved.
  • 6. The curse of dimensionality, KNN edition
  • 7. The curse of dimensionality, KNN edition
  • 8. The curse of dimensionality, KNN edition. 3D: 0.1%
  • 9. ANN (Approximate Nearest Neighbor)
  • 10. Partitioned and graph indexes
  • 11. Milestones in ANN
      ● 2015: FastScan (Compression)
      ● 2016: HNSW (Graph construction)
      ● 2017: Quick ADC (Compression)
      ● 2018: Quicker ADC (Compression)
      ● 2019: DiskANN (Graph construction + compression)
      ● 2020: SCANN and APQ (Compression)
      ● 2021: NGT QG (Compression)
      ● 2022: SPANN (Partitioning)
      ● 2023: LVQ (Compression)
      ● 2024: JVector LTM (Graph construction)
  • 12. IVF Partitioning
  • 13. Search
  • 14. KMeans
  • 15. IVF search
      ● Coarse: O(centroids)
      ● Accurate: O(M * N/centroids)
      ● Centroid count needs to be relatively high
          ○ FAISS recommends O(sqrt(N)) = 64K for N in 1..10M
  • 16. SPANN (MSR, 2022)
      ● Better partitioning
      ● Dynamic pruning during search
      ● Hybrid architecture: centroids in memory, postings lists on disk
      ● Scales to 1B+ vectors
  • 17. Vector database index choices
      ● Astra: Graph
      ● Lucene: Graph
      ● Milvus: Graph
      ● Pinecone: Graph
      ● Qdrant: Graph
      ● Weaviate: Graph
      ● pgVector: Partitioned, Graph
  • 18. Partitioning downsides
      ● KMeans is O(t*k*n*d)
      ● Incremental construction is difficult and slow
      ● Difficult to handle deletes
      ● SOTA is relatively complex
  • 19. Graph indexes
  • 20. HNSW (Malkov + Yashunin, 2016)
      ● First modern graph index
      ● Still in use in e.g. Lucene
      ● Single-pass search, everything in memory
  • 21. HNSW: diversity heuristic
  • 22. Hierarchical NSW
  • 23. Larger-than-memory HNSW
  • 24. DiskANN (MSR, 2019)
      ● Single graph layer
      ● Coarse + Accurate passes
          ○ Coarse performed using compressed vectors in memory
          ○ Accurate reranks coarse results using full resolution vectors from disk
      ● Scales to 1B+ vectors
  • 25. DiskANN single layer design
  • 26. Non-blocking concurrency = linear scaling
  • 27. Product Quantization (PQ)
  • 29. Product Quantization (PQ)
  • 30. Without reranking
  • 31. PQ with transparent reranking
  • 32. Binary Quantization is very lossy
  • 33. PQ is very, very hard to beat consistently
  • 34. LVQ: better than PQ at small(er) ratios
  • 35. DiskANN performance
      ● O(log N) coarse search
      ● O(topK) rerank
      ● Still O(N) memory use
  • 36. Beyond DiskANN
      ● 2023: 10M is a big vector index
      ● 2024: 1B is a big vector index (and customers are asking when they can have 10B)
  • 37. Larger-than-memory index construction
      ● DiskANN (2019)
          ○ Split dataset into 40 partitions using kmeans
          ○ Index each partition separately, adding each node to closest 2 partitions
          ○ Take the union of edges across all partitioned indexes to make one big index
          ○ 5 days to build Deep1B dataset (350GB)
  • 39. Larger-than-memory index construction
      ● JVector (2024)
          ○ Build the index using two-phase search (PQ in memory, full resolution on disk)
          ○ 3h to build Cohere-v3-wikipedia (180GB)
  • 40. Reducing memory footprint from O(N) to O(1)
      ● Fused ADC
      ● First implemented by NGT (Yahoo! Japan, 2021)
      ● Apply Quicker ADC to graph indexes
          ○ PQ lookup tables stored on disk, not in memory
  • 41. Memory for 10M openai-v3-small vectors
  • 42. Conclusion
  • 43. Milestones in ANN
      ● 2015: FastScan
      ● 2016: HNSW
      ● 2017: Quick ADC
      ● 2018: Quicker ADC
      ● 2019: DiskANN
      ● 2020: SCANN
      ● 2021: NGT QG
      ● 2022: SPANN
      ● 2023: LVQ
      ● 2024: JVector LTM
  • 44. Not like this
  • 45. What actually matters
      Basic:
      ● Support for PQ
      ● Support for reranking
      Advanced:
      ● Larger-than-memory index construction
      ● O(1) memory footprint for queries
      ● Support for LVQ
  • 46. Further reading
      ● DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
      ● Quicker ADC: Unlocking the hidden potential of Product Quantization with SIMD
      ● Locally-Adaptive Quantization for Streaming Vector Search
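
The two-pass cost on slide 15 (coarse over the centroids, accurate over the probed partitions) can be made concrete with a small sketch. This is a toy pure-Python illustration, not FAISS or any library's implementation. The function names (`build_ivf`, `ivf_search`), the `nprobe` parameter, and the reading of the slide's M as the number of partitions probed are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(v, centroids):
    """Index of the centroid nearest to v."""
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))


def build_ivf(vectors, n_centroids, iters=10, seed=0):
    """Cluster with a few Lloyd (k-means) iterations, then build postings lists."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_centroids)]
    for _ in range(iters):
        groups = [[] for _ in range(n_centroids)]
        for v in vectors:
            groups[closest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Each postings list holds the ids of the vectors assigned to that centroid.
    postings = [[] for _ in range(n_centroids)]
    for i, v in enumerate(vectors):
        postings[closest(v, centroids)].append(i)
    return centroids, postings


def ivf_search(query, vectors, centroids, postings, nprobe, k):
    # Coarse pass: compare the query with every centroid, O(centroids).
    order = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    # Accurate pass: scan only the nprobe closest postings lists,
    # about nprobe * N / centroids vectors on average.
    candidates = [i for c in order[:nprobe] for i in postings[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]
```

With `nprobe` equal to the number of centroids the search degenerates to an exact scan; smaller values trade recall for speed, which is why the centroid count has to grow with the dataset.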

Editor's Notes

  1. Two families
  2. (Average case.) So you can see how important it is to pick the right centroid count for your dataset size. HNSW for FAISS, SPTAG for SPANN.
  3. It pisses me off that some projects are advising people to compress their vectors when they don’t support reranking
  4. Talk track: with 3x overquery we can beat uncompressed recall (and speed!) while compressing 64x. For full context, see https://thenewstack.io/why-vector-size-matters/
  5. Less useful because even with overquery you can’t make up the recall loss (except for ada002). OpenAI-v3 works fine with BQ too, but I’d rather compress it 64x with PQ.
  6. Not quantitative!
  7. LVQ (Locally-adaptive Vector Quantization) is a new compression design that is accurate enough to be used in reranking. We can replace the full-resolution vectors on disk with LVQ-compressed ones, reducing index size by a factor of ~4 and speeding up queries by about 20%.
  8. My intro slide was titled “modern” vector search
  9. ADC is well known. Quick ADC and Quicker ADC are more obscure and only used in slower, partitioned index designs. Fused ADC is new in JVector and the first application to graph indexes.
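
The overquery-and-rerank idea from notes 3 and 4, together with the ADC lookup tables mentioned in note 9, can be sketched as follows. This is a toy pure-Python illustration under stated assumptions, not JVector's implementation: `train_pq`, `encode`, `search`, and their parameter names are hypothetical, the codebooks are tiny, and the vector dimension is assumed to be divisible by the number of subspaces `m`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Cut a vector into m equal-length subvectors."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]


def kmeans(points, k, iters, rng):
    """A few Lloyd iterations; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        for j, members in enumerate(groups):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_pq(vectors, m, k=16, iters=8, seed=0):
    """Train one codebook of k codewords per subspace."""
    rng = random.Random(seed)
    subs = [split(v, m) for v in vectors]
    return [kmeans([s[i] for s in subs], k, iters, rng) for i in range(m)]


def encode(v, codebooks):
    """Replace each subvector with the id of its nearest codeword."""
    return [min(range(len(cb)), key=lambda j: sq_dist(s, cb[j]))
            for s, cb in zip(split(v, len(codebooks)), codebooks)]


def search(query, vectors, codes, codebooks, top_k, overquery=3):
    # ADC: precompute query-to-codeword distances once per subspace.
    tables = [[sq_dist(s, c) for c in cb]
              for s, cb in zip(split(query, len(codebooks)), codebooks)]

    def approx(i):
        # Approximate distance is a sum of table lookups, one per subspace.
        return sum(t[c] for t, c in zip(tables, codes[i]))

    # Coarse pass on compressed codes, fetching overquery times more than needed.
    coarse = sorted(range(len(codes)), key=approx)[:top_k * overquery]
    # Rerank the survivors with the full-resolution vectors.
    return sorted(coarse, key=lambda i: sq_dist(query, vectors[i]))[:top_k]
```

Without the final rerank line the caller would receive results ordered by approximate distance only, which is the situation note 3 objects to. Raising `overquery` trades extra full-resolution reads for recall.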