SlideShare a Scribd company logo
1 of 16
Download to read offline
Performance Optimize 101
James Luan
2024.3.19
Key factors for Scalable and Performant Vector Search
Speaker Bio
James
VP of Engineering @ Zilliz
Milvus Maintainer, chief architect for Milvus 2.0.
Former employee at Oracle and Alibaba, seasoned
open-source developer.
People say vector search is easy
Finish in 10 lines of code
No need to learn any knowledge
About machine learning and AI
No ETL needed
No Knob tuning…
As easy as a numpy knn?
But it never works in production!
But they are actually not
Things you need to consider
To build real world application
• Search quality - Hybrid Search? Filtering?
• Scalability - Handling Billions of Vectors
• Multi tenancy - Isolating Multi-tenant data
• Cost - Memory, disk or S3?
• Security - Data safety and Privacy
• A real world example is way complicated..
Lesson 101: Put VectorDB into production
01 Design your schema
02 Think about how to scale
03 Pick your right index and tune
Design your schema - Dynamic or Fixed?
Flexible, no need to align between each row
Dynamic schema
Save memory with compacted format
Fix schema
Performant
fi
ltering on columnar format
No need to align between entities
Hybrid schema
Have both
fi
x and dynamic schema
fi
eld
Pick primary and partition key
Primary key is unique, like a chunkID or imageID
Primary Key
Partition key is extremely useful on multi tenant case
Partition key
Partition key is used to map data into partition
Primary key can be auto generated or user de
fi
ne
Clustering Key (New)
Cluster data in the back ground, search partial clusters
Can be any scalar
fi
eld for OSS, and vector
fi
eld is
available on Zilliz cloud
Data is mapped to shard by hash(Primary key)
Pick embedding types
Distance : IP
Models: Splade, BGE-M3
Index: Wand, Graph
Distance : IP,L2, Cosine
Models: OpenAI, BGE, Cohere
Index: Faiss, HNSW
Distance : Hamming,
Superstructure, Jaccard, Tanimoto
Models: Cohere, Meta ESM-2
Index: Faiss
Schema Demo for a typical RAG application
Think about how to scale
Collection: Similar to a table in a traditional database. Each collection is
contained within a single database. Database can be isolated by resource group
Shard: Collections are sharded based on a hash of the primary key (PK). The
shard number cannot be dynamically changed for now.
Partition: Refers to a field that you frequently filter on, such as departmentID,
date, or goods type. Milvus currently supports up to 1024 partitions.
Segment: The minimal unit for balancing and building indexes. There are two
types of segments: growing and sealed.
• Growing Segment: A segment that is actively receiving new data inserts.
• Sealed Segment: A read-only segment that has completed indexing and is
ready for query operations.
Replica: Similar to database replication. Creating more replicas can improve
failure recovery speed and read throughput.
So many concepts, any best practice?
Data size
A Milvus collection can host over 10 billion data entries. Each shard hosts 100-500 million data entries,
and in most cases, 1-2 shards are more than enough. If you have an intensive write workload, increasing
the number of shards can also help to improve the write throughput.
Tenant number
Use collections to isolate tenants if your number of tenants is less than 10,000. For many tenants, use
partition keys. Partition keys use a hybrid approach of logical and physical partitioning thus can support
an unlimited number of tenants.
What about QPS?
Milvus is distributed, usually add more query nodes boost performance. For small datasets, increase
memory replicas can help on distributing query loads evenly to more query nodes.
How to pick index?
GPU index: FAISS GPU, Nvidia CAGRA
Memory index: FAISS, HNSW, ZILLIZ Cardinal
Disk index: DiskANN, ZILLIZ Cardinal
Swap Index: ZILLIZ Serverless - Est. April
Tune your index
How to ensure both?
How to evaluate indexes?
1. Pick the right index type
2. Tune the index parameter
3. Benchmarking it with VectorDB bench
4. Tune the the search parameter
https://github.com/zilliztech/VectorDBBench
Index Cheat sheet
index Accuracy Latency Throuput Index Time Cost
Cagra(GPU) High Low Very High Fast Very High
HNSW High Low High Slow High
ScaNN Mid Mid High Mid Mid
IVF_FLAT Mid Mid Low Fast Mid
IVF +
Quantization
Low Mid Mid Mid Low
DiskANN High High Mid Very Slow Low
THANK YOU FOR WATCHING
https://github.com/milvus-io/milvus
https://zilliz.com

More Related Content

Similar to VectorDB Schema Design 101 - Considerations for Building a Scalable and Performant Vector Search

MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018Dave Stokes
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIDataWorks Summit
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014Dylan Tong
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Michael Hiskey
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Andrew Underwood
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Trivadis
 
Real-time Analytics with Redis
Real-time Analytics with RedisReal-time Analytics with Redis
Real-time Analytics with RedisCihan Biyikoglu
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...Niraj Tolia
 
Unlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NECUnlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NECPrincipled Technologies
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy snehal parikh
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementClusterpoint
 
Red Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed_Hat_Storage
 
Processing Drone data @Scale
Processing Drone data @ScaleProcessing Drone data @Scale
Processing Drone data @ScaleDr Hajji Hicham
 
Scalable POSIX File Systems in the Cloud
Scalable POSIX File Systems in the CloudScalable POSIX File Systems in the Cloud
Scalable POSIX File Systems in the CloudRed_Hat_Storage
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningModusOptimum
 
DX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workDX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workPrincipled Technologies
 
Hadoop - A big data initiative
Hadoop - A big data initiativeHadoop - A big data initiative
Hadoop - A big data initiativeMansi Mehra
 

Similar to VectorDB Schema Design 101 - Considerations for Building a Scalable and Performant Vector Search (20)

MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AI
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
 
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss)
 
Real-time Analytics with Redis
Real-time Analytics with RedisReal-time Analytics with Redis
Real-time Analytics with Redis
 
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
(Speaker Notes Version) Architecting An Enterprise Storage Platform Using Obj...
 
Unlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NECUnlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NEC
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data Management
 
Red Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use CasesRed Hat Storage: Emerging Use Cases
Red Hat Storage: Emerging Use Cases
 
Processing Drone data @Scale
Processing Drone data @ScaleProcessing Drone data @Scale
Processing Drone data @Scale
 
Scalable POSIX File Systems in the Cloud
Scalable POSIX File Systems in the CloudScalable POSIX File Systems in the Cloud
Scalable POSIX File Systems in the Cloud
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
DX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workDX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to work
 
Hadoop - A big data initiative
Hadoop - A big data initiativeHadoop - A big data initiative
Hadoop - A big data initiative
 

More from Zilliz

Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...Zilliz
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Zilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in MLZilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in MLZilliz
 
Integrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with FloomIntegrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with FloomZilliz
 
Build streaming LLM with Timeplus and Zilliz
Build streaming LLM with Timeplus and ZillizBuild streaming LLM with Timeplus and Zilliz
Build streaming LLM with Timeplus and ZillizZilliz
 
Beyond Retrieval Augmented Generation (RAG): Vector Databases
Beyond Retrieval Augmented Generation (RAG): Vector DatabasesBeyond Retrieval Augmented Generation (RAG): Vector Databases
Beyond Retrieval Augmented Generation (RAG): Vector DatabasesZilliz
 
Chunking, Embeddings, and Vector Databases
Chunking, Embeddings, and Vector DatabasesChunking, Embeddings, and Vector Databases
Chunking, Embeddings, and Vector DatabasesZilliz
 
Introduction to Large Language Model Customization.pdf
Introduction to Large Language Model Customization.pdfIntroduction to Large Language Model Customization.pdf
Introduction to Large Language Model Customization.pdfZilliz
 
Voyage AI: cutting-edge embeddings and rerankers for search and RAG
Voyage AI: cutting-edge embeddings and rerankers for search and RAGVoyage AI: cutting-edge embeddings and rerankers for search and RAG
Voyage AI: cutting-edge embeddings and rerankers for search and RAGZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Voyage AI Embedding Models for Retrieval Augmented Generation
Voyage AI Embedding Models for Retrieval Augmented GenerationVoyage AI Embedding Models for Retrieval Augmented Generation
Voyage AI Embedding Models for Retrieval Augmented GenerationZilliz
 
Chat with your data, privately and locally
Chat with your data, privately and locallyChat with your data, privately and locally
Chat with your data, privately and locallyZilliz
 
Introducing Milvus and new features in 2.4 release
Introducing Milvus and new features in 2.4 releaseIntroducing Milvus and new features in 2.4 release
Introducing Milvus and new features in 2.4 releaseZilliz
 

More from Zilliz (19)

Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
Emergent Methods: Multilingual narrative tracking in the news - real-time exp...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Zilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in MLZilliz - Overview of Generative models in ML
Zilliz - Overview of Generative models in ML
 
Integrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with FloomIntegrating Multimodal AI in Your Apps with Floom
Integrating Multimodal AI in Your Apps with Floom
 
Build streaming LLM with Timeplus and Zilliz
Build streaming LLM with Timeplus and ZillizBuild streaming LLM with Timeplus and Zilliz
Build streaming LLM with Timeplus and Zilliz
 
Beyond Retrieval Augmented Generation (RAG): Vector Databases
Beyond Retrieval Augmented Generation (RAG): Vector DatabasesBeyond Retrieval Augmented Generation (RAG): Vector Databases
Beyond Retrieval Augmented Generation (RAG): Vector Databases
 
Chunking, Embeddings, and Vector Databases
Chunking, Embeddings, and Vector DatabasesChunking, Embeddings, and Vector Databases
Chunking, Embeddings, and Vector Databases
 
Introduction to Large Language Model Customization.pdf
Introduction to Large Language Model Customization.pdfIntroduction to Large Language Model Customization.pdf
Introduction to Large Language Model Customization.pdf
 
Voyage AI: cutting-edge embeddings and rerankers for search and RAG
Voyage AI: cutting-edge embeddings and rerankers for search and RAGVoyage AI: cutting-edge embeddings and rerankers for search and RAG
Voyage AI: cutting-edge embeddings and rerankers for search and RAG
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Voyage AI Embedding Models for Retrieval Augmented Generation
Voyage AI Embedding Models for Retrieval Augmented GenerationVoyage AI Embedding Models for Retrieval Augmented Generation
Voyage AI Embedding Models for Retrieval Augmented Generation
 
Chat with your data, privately and locally
Chat with your data, privately and locallyChat with your data, privately and locally
Chat with your data, privately and locally
 
Introducing Milvus and new features in 2.4 release
Introducing Milvus and new features in 2.4 releaseIntroducing Milvus and new features in 2.4 release
Introducing Milvus and new features in 2.4 release
 

Recently uploaded

JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfAnubhavMangla3
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxjbellis
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...SOFTTECHHUB
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfalexjohnson7307
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهMohamed Sweelam
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 

Recently uploaded (20)

JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهله
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 

VectorDB Schema Design 101 - Considerations for Building a Scalable and Performant Vector Search

  • 1. Performance Optimize 101 James Luan 2024.3.19 Key factors for Scalable and Performant Vector Search
  • 2. Speaker Bio James VP of Engineering @ Zilliz Milvus Maintainer, chief architect for Milvus 2.0. Former employee at Oracle and Alibaba, seasoned open-source developer.
  • 3. People say vector search is easy Finish in 10 lines of code No need to learn any knowledge About machine learning and AI No ETL needed No Knob tuning…
  • 4. As easy as a numpy knn? But it never works in production!
  • 5. But they are actually not Things you need to consider To build real world application • Search quality - Hybrid Search? Filtering? • Scalability - Handling Billions of Vectors • Multi tenancy - Isolating Multi-tenant data • Cost - Memory, disk or S3? • Security - Data safety and Privacy • A real world example is way complicated..
  • 6. Lesson 101: Put VectorDB into production 01 Design your schema 02 Think about how to scale 03 Pick your right index and tune
  • 7. Design your schema - Dynamic or Fixed? Flexible, no need to align between each row Dynamic schema Save memory with compacted format Fix schema Performant fi ltering on columnar format No need to align between entities Hybrid schema Have both fi x and dynamic schema fi eld
  • 8. Pick primary and partition key Primary key is unique, like a chunkID or imageID Primary Key Partition key is extremely useful on multi tenant case Partition key Partition key is used to map data into partition Primary key can be auto generated or user de fi ne Clustering Key (New) Cluster data in the back ground, search partial clusters Can be any scalar fi eld for OSS, and vector fi eld is available on Zilliz cloud Data is mapped to shard by hash(Primary key)
  • 9. Pick embedding types Distance : IP Models: Splade, BGE-M3 Index: Wand, Graph Distance : IP,L2, Cosine Models: OpenAI, BGE, Cohere Index: Faiss, HNSW Distance : Hamming, Superstructure, Jaccard, Tanimoto Models: Cohere, Meta ESM-2 Index: Faiss
  • 10. Schema Demo for a typical RAG application
  • 11. Think about how to scale Collection: Similar to a table in a traditional database. Each collection is contained within a single database. Database can be isolated by resource group Shard: Collections are sharded based on a hash of the primary key (PK). The shard number cannot be dynamically changed for now. Partition: Refers to a field that you frequently filter on, such as departmentID, date, or goods type. Milvus currently supports up to 1024 partitions. Segment: The minimal unit for balancing and building indexes. There are two types of segments: growing and sealed. • Growing Segment: A segment that is actively receiving new data inserts. • Sealed Segment: A read-only segment that has completed indexing and is ready for query operations. Replica: Similar to database replication. Creating more replicas can improve failure recovery speed and read throughput.
  • 12. So many concepts, any best practice? Data size A Milvus collection can host over 10 billion data entries. Each shard hosts 100-500 million data entries, and in most cases, 1-2 shards are more than enough. If you have an intensive write workload, increasing the number of shards can also help to improve the write throughput. Tenant number Use collections to isolate tenants if your number of tenants is less than 10,000. For many tenants, use partition keys. Partition keys use a hybrid approach of logical and physical partitioning thus can support an unlimited number of tenants. What about QPS? Milvus is distributed, usually add more query nodes boost performance. For small datasets, increase memory replicas can help on distributing query loads evenly to more query nodes.
  • 13. How to pick index? GPU index: FAISS GPU, Nvidia CAGRA Memory index: FAISS, HNSW, ZILLIZ Cardinal Disk index: DiskANN, ZILLIZ Cardinal Swap Index: ZILLIZ Serverless - Est. April
  • 14. Tune your index How to ensure both? How to evaluate indexes? 1. Pick the right index type 2. Tune the index parameter 3. Benchmarking it with VectorDB bench 4. Tune the the search parameter https://github.com/zilliztech/VectorDBBench
  • 15. Index Cheat sheet index Accuracy Latency Throuput Index Time Cost Cagra(GPU) High Low Very High Fast Very High HNSW High Low High Slow High ScaNN Mid Mid High Mid Mid IVF_FLAT Mid Mid Low Fast Mid IVF + Quantization Low Mid Mid Mid Low DiskANN High High Mid Very Slow Low
  • 16. THANK YOU FOR WATCHING https://github.com/milvus-io/milvus https://zilliz.com