Performance Optimize 101
James Luan
2024.3.19
Key factors for Scalable and Performant Vector Search
Speaker Bio
James
VP of Engineering @ Zilliz
Milvus Maintainer, chief architect for Milvus 2.0.
Former employee at Oracle and Alibaba, seasoned
open-source developer.
People say vector search is easy
Finish in 10 lines of code
No need to learn anything about machine learning and AI
No ETL needed
No knob tuning…
As easy as a numpy knn?
But it never works in production!
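For reference, the "numpy kNN" this slide alludes to might look like the sketch below. It is an exact brute-force search over an in-memory array, not Milvus code; the function name `knn` and the random test data are assumptions made for illustration. It shows why the approach feels easy, and also what it leaves out: there is no filtering, persistence, sharding, or index.

```python
import numpy as np

def knn(queries: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Exact top-k by inner product; returns indices of shape (n_queries, k)."""
    scores = queries @ corpus.T                              # (n_queries, n_corpus)
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]     # unordered top-k per query
    order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)            # top-k sorted best-first

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 64)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)      # unit vectors: IP equals cosine
queries = corpus[:3]                                         # each query is its own nearest neighbour
neighbours = knn(queries, corpus, k=5)
print(neighbours[:, 0])
```

Every query compares against every stored vector, so cost grows linearly with the corpus size, which is the part that stops working at production scale.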
But it is actually not
Things you need to consider to build a real-world application
• Search quality - Hybrid Search? Filtering?
• Scalability - Handling Billions of Vectors
• Multi tenancy - Isolating Multi-tenant data
• Cost - Memory, disk or S3?
• Security - Data safety and Privacy
• A real-world example is far more complicated.
Lesson 101: Put VectorDB into production
01 Design your schema
02 Think about how to scale
03 Pick your right index and tune
Design your schema - Dynamic or Fixed?
Flexible, no need to align between each row
Dynamic schema
Save memory with compacted format
Fixed schema
Performant filtering on columnar format
No need to align between entities
Hybrid schema
Have both fixed and dynamic schema fields
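As a rough illustration of the hybrid idea, the sketch below splits an incoming entity into typed fixed columns and a free-form dynamic part. It is plain Python, not the Milvus API; the field names (`chunk_id`, `tenant_id`, `embedding`) and the `Row` and `to_row` helpers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical fixed columns for a RAG chunk table; names are illustrative only.
FIXED_FIELDS = {"chunk_id": int, "tenant_id": str, "embedding": list}

@dataclass
class Row:
    fixed: dict[str, Any]                                   # validated, column-friendly values
    dynamic: dict[str, Any] = field(default_factory=dict)   # anything else, kept as-is

def to_row(entity: dict[str, Any]) -> Row:
    """Split an entity into typed fixed columns and a free-form dynamic part."""
    fixed = {}
    for name, expected in FIXED_FIELDS.items():
        if name not in entity:
            raise KeyError(f"missing fixed field: {name}")
        if not isinstance(entity[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
        fixed[name] = entity[name]
    dynamic = {k: v for k, v in entity.items() if k not in FIXED_FIELDS}
    return Row(fixed, dynamic)

row = to_row({"chunk_id": 7, "tenant_id": "acme", "embedding": [0.1, 0.2], "source": "faq.md"})
print(row.fixed["chunk_id"], row.dynamic)
```

Fields you filter on often are the natural candidates for the fixed part; rarely used or unpredictable metadata can stay dynamic.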
Pick primary and partition key
Primary key is unique, like a chunkID or imageID
Primary Key
Partition key is extremely useful for multi-tenant use cases
Partition key
Partition key is used to map data into partition
Primary key can be auto-generated or user-defined
Clustering Key (New)
Cluster data in the background, search partial clusters
Can be any scalar field for OSS; vector field is available on Zilliz Cloud
Data is mapped to shard by hash(Primary key)
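The hash-based routing on this slide can be sketched in a few lines. The slides do not say which hash function Milvus uses, so CRC32 here is only a stand-in, and `shard_for` is a hypothetical helper.

```python
import zlib

def shard_for(primary_key: str, num_shards: int) -> int:
    """Map a primary key to a shard with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_shards

keys = [f"chunk-{i}" for i in range(10_000)]

# The same key always lands on the same shard, and every key lands on exactly one.
two_shards = [shard_for(k, 2) for k in keys]

# With plain modulo routing, changing the shard count re-routes many existing keys,
# which illustrates why resharding a live collection is costly.
moved = sum(1 for k in keys if shard_for(k, 2) != shard_for(k, 3))
print(f"{moved} of {len(keys)} keys would move when going from 2 to 3 shards")
```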
Pick embedding types
Sparse embeddings
Distance: IP
Models: Splade, BGE-M3
Index: Wand, Graph
Dense embeddings
Distance: IP, L2, Cosine
Models: OpenAI, BGE, Cohere
Index: Faiss, HNSW
Binary embeddings
Distance: Hamming, Superstructure, Jaccard, Tanimoto
Models: Cohere, Meta ESM-2
Index: Faiss
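To make the distance names concrete, here is a minimal sketch of the textbook definitions for float vectors, bit vectors, and sparse vectors. These are not Milvus internals: the function names are assumptions, binary vectors are modeled as Python integers used as bit sets, and Superstructure and Tanimoto are left out.

```python
import math

def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def l2(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a: list[float], b: list[float]) -> float:
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))

def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two bit vectors differ."""
    return bin(a ^ b).count("1")

def jaccard_distance(a: int, b: int) -> float:
    """1 - |intersection| / |union| over the set bits."""
    union = bin(a | b).count("1")
    return 1.0 - bin(a & b).count("1") / union if union else 0.0

def sparse_inner_product(a: dict[int, float], b: dict[int, float]) -> float:
    """Inner product over {dimension: weight} maps; only shared dimensions contribute."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(w * large[d] for d, w in small.items() if d in large)

print(l2([0.0, 0.0], [3.0, 4.0]))                                  # 5.0
print(hamming(0b1011, 0b0010))                                     # 2
print(sparse_inner_product({1: 0.5, 7: 2.0}, {7: 3.0, 9: 1.0}))    # 6.0
```

If vectors are normalized to unit length, inner product and cosine give the same ranking, which is why either may be used with normalized embeddings.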
Schema Demo for a typical RAG application
Think about how to scale
Collection: Similar to a table in a traditional database. Each collection is
contained within a single database. Databases can be isolated by resource group.
Shard: Collections are sharded based on a hash of the primary key (PK). The
shard number cannot be dynamically changed for now.
Partition: Refers to a field that you frequently filter on, such as departmentID,
date, or goods type. Milvus currently supports up to 1024 partitions.
Segment: The minimal unit for balancing and building indexes. There are two
types of segments: growing and sealed.
• Growing Segment: A segment that is actively receiving new data inserts.
• Sealed Segment: A read-only segment that has completed indexing and is
ready for query operations.
Replica: Similar to database replication. Creating more replicas can improve
failure recovery speed and read throughput.
So many concepts, any best practice?
Data size
A Milvus collection can host over 10 billion data entries. Each shard hosts 100-500 million data entries,
and in most cases, 1-2 shards are more than enough. If you have an intensive write workload, increasing
the number of shards can also help to improve the write throughput.
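The per-shard figures above translate into a quick back-of-the-envelope check. The sketch below only restates the 100-500 million range as arithmetic; `shard_range` is a hypothetical helper, and real sizing also depends on write throughput, as the slide notes.

```python
ENTRIES_PER_SHARD_LOW = 100_000_000    # lower bound of the per-shard range on the slide
ENTRIES_PER_SHARD_HIGH = 500_000_000   # upper bound of the per-shard range on the slide

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def shard_range(total_entries: int) -> tuple[int, int]:
    """Return (fewest, most) shards implied by the per-shard capacity range."""
    fewest = max(1, ceil_div(total_entries, ENTRIES_PER_SHARD_HIGH))
    most = max(1, ceil_div(total_entries, ENTRIES_PER_SHARD_LOW))
    return fewest, most

print(shard_range(300_000_000))      # (1, 3)
print(shard_range(10_000_000_000))   # (20, 100)
```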
Tenant number
Use collections to isolate tenants if you have fewer than 10,000 tenants. For more tenants, use
partition keys. Partition keys combine logical and physical partitioning, so they can support
an unlimited number of tenants.
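One way to picture the "logical plus physical" idea is a toy store in which many tenants share a fixed number of physical partitions, and reads touch one partition and then filter by tenant. This is a conceptual sketch, not Milvus code: `PartitionKeyStore`, `isolation_strategy`, the partition count of 16, and the CRC32 hash are all assumptions.

```python
import zlib
from collections import defaultdict

def isolation_strategy(num_tenants: int, collection_limit: int = 10_000) -> str:
    """Rule of thumb from the slide: collections below the limit, partition keys above."""
    return "collection-per-tenant" if num_tenants < collection_limit else "partition-key"

class PartitionKeyStore:
    """Toy store: many tenants share a fixed number of physical partitions."""

    def __init__(self, num_partitions: int = 16) -> None:
        self.num_partitions = num_partitions
        self.partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)

    def _partition_of(self, tenant_id: str) -> int:
        # Physical placement: hash of the partition key (CRC32 is a stand-in).
        return zlib.crc32(tenant_id.encode("utf-8")) % self.num_partitions

    def insert(self, tenant_id: str, doc: str) -> None:
        self.partitions[self._partition_of(tenant_id)].append((tenant_id, doc))

    def scan(self, tenant_id: str) -> list[str]:
        # Logical isolation: read one partition, then filter by tenant.
        bucket = self.partitions[self._partition_of(tenant_id)]
        return [doc for owner, doc in bucket if owner == tenant_id]

store = PartitionKeyStore(num_partitions=16)
for i in range(1_000):                       # far more tenants than partitions
    store.insert(f"tenant-{i}", f"doc-of-{i}")

print(isolation_strategy(500), isolation_strategy(50_000))
print(store.scan("tenant-42"))
```

The number of physical partitions stays bounded no matter how many tenants are added, which is what lets the tenant count grow without limit.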
What about QPS?
Milvus is distributed, so adding more query nodes usually boosts performance. For small datasets, increasing
the number of in-memory replicas can help distribute query load evenly across more query nodes.
How to pick index?
GPU index: FAISS GPU, Nvidia CAGRA
Memory index: FAISS, HNSW, ZILLIZ Cardinal
Disk index: DiskANN, ZILLIZ Cardinal
Swap Index: ZILLIZ Serverless - Est. April
Tune your index
How to ensure both?
How to evaluate indexes?
1. Pick the right index type
2. Tune the index parameters
3. Benchmark it with VectorDBBench
4. Tune the search parameters
https://github.com/zilliztech/VectorDBBench
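When evaluating an index, accuracy is usually measured as recall against exact results. The sketch below shows one common definition of recall@k; it is not part of VectorDBBench, and the function name and the tiny example lists are made up for illustration.

```python
def recall_at_k(approx: list[list[int]], exact: list[list[int]], k: int) -> float:
    """Fraction of true top-k neighbours that the index also returned, averaged over queries."""
    if not exact or len(approx) != len(exact):
        raise ValueError("approx and exact must be non-empty and the same length")
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx, exact))
    return hits / (k * len(exact))

exact = [[1, 2, 4], [4, 5, 6]]     # ground truth from an exhaustive search
approx = [[1, 2, 3], [4, 5, 6]]    # ids returned by the approximate index
print(round(recall_at_k(approx, exact, k=3), 3))   # 0.833
```

Tuning then becomes a loop: change an index or search parameter, re-measure recall and latency, and keep the cheapest setting that still meets the recall target.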
Index Cheat sheet
Index               Accuracy  Latency  Throughput  Index Time  Cost
CAGRA (GPU)         High      Low      Very High   Fast        Very High
HNSW                High      Low      High        Slow        High
ScaNN               Mid       Mid      High        Mid         Mid
IVF_FLAT            Mid       Mid      Low         Fast        Mid
IVF + Quantization  Low       Mid      Mid         Mid         Low
DiskANN             High      High     Mid         Very Slow   Low
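The cheat sheet can also be used as a small lookup table when shortlisting indexes. The sketch below encodes the ratings exactly as given on the slide; the `candidates` helper and the lowercase column keys are assumptions, and the ratings are coarse guidance rather than measurements.

```python
CHEAT_SHEET = {
    "CAGRA (GPU)":        {"accuracy": "High", "latency": "Low",  "throughput": "Very High", "index_time": "Fast",      "cost": "Very High"},
    "HNSW":               {"accuracy": "High", "latency": "Low",  "throughput": "High",      "index_time": "Slow",      "cost": "High"},
    "ScaNN":              {"accuracy": "Mid",  "latency": "Mid",  "throughput": "High",      "index_time": "Mid",       "cost": "Mid"},
    "IVF_FLAT":           {"accuracy": "Mid",  "latency": "Mid",  "throughput": "Low",       "index_time": "Fast",      "cost": "Mid"},
    "IVF + Quantization": {"accuracy": "Low",  "latency": "Mid",  "throughput": "Mid",       "index_time": "Mid",       "cost": "Low"},
    "DiskANN":            {"accuracy": "High", "latency": "High", "throughput": "Mid",       "index_time": "Very Slow", "cost": "Low"},
}

def candidates(**required: str) -> list[str]:
    """Return index names whose ratings match every requested column value."""
    return [name for name, row in CHEAT_SHEET.items()
            if all(row[col] == want for col, want in required.items())]

print(candidates(accuracy="High", cost="Low"))      # ['DiskANN']
print(candidates(accuracy="High", latency="Low"))   # ['CAGRA (GPU)', 'HNSW']
```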
THANK YOU FOR WATCHING
https://github.com/milvus-io/milvus
https://zilliz.com

VectorDB Schema Design 101 - Considerations for Building a Scalable and Performant Vector Search
