SlideShare a Scribd company logo
How Kafka Powers the World’s Most Popular
Vector Database
Frank Liu
Speaker
Frank Liu
Director of Operations & ML Architect
frank@zilliz.com
https://linkedin.com/in/fzliu
https://twitter.com/frankzliu
01 Unstructured Data and Embeddings
CONTENTS
02 Vector Database Overview
03 Milvus Architecture
04 Kafka as a Messaging Backbone
01
Unstructured Data and Embeddings
Vector Database Overview
What is Unstructured Data?
Any data that does not conform to a pre-de
fi
ned data model.
The Evolution of Data
Using Vectors to Represent Data
Embeddings!
02
Vector Database Overview
Vector Database Overview
A database purpose-built to store, index, and query large quantities of
embeddings.
Vector Databases in Production
Milvus Features
• Supports hardware accelerators
• SIMD support on CPUs
• GPU support for faster querying & indexing
• Supports key database functions
• Data partitioning and data sharing
• Filtered queries and searches
• Multiple options for indices and similarity metrics
• FAISS (HNSW, Flat, PQ), ANNOY, DiskANN, ScANN, etc…
• Euclidean, dot product (cosine), boolean
• A number of SDKs
• Python
• Go
• Node
• Java
Cloud Nativity
• Kubernetes native
• Deployment through Helm
• Native S3 support
• MinIO-based design
• Azure Blob and GCS support
• Easy on-prem to cloud conversion
• Fully distributed
• Highly elastic and horizontally scalable
• Disaggregated storage and compute (shared storage)
• Separate read, write, and background (indexing) services
Vector Database Ecosystem
03
Milvus Architecture
Milvus Architecture
Access Layer
• Data-related languages (SQL context)
• Data de
fi
nition language: modify/de
fi
ne database schema
• Data management language: store, modify, and retrieve data
• Data control language: de
fi
ne user rights and permissions
• Access layer = multiple proxy nodes
• Proxy node functions
• Manage message ingestion and routing
• Points DDL and DCL instructions to coordinators
• Point DML to log for for worker consumption
Coordinator Layer
• Root coordinator node
• Handles DDL and DCL requests
• Data coordinator node
• Triggers background data operations (
fl
ush, compact, etc)
• Manages data node cluster
• Maintains metadata of inserted data
• Query coordinator node
• Manages query node cluster
• Index coordinator node
• Manages index node cluster
• Determines when indexes are built
• Maintains index metadata
Worker Layer
• Worker overview
• All workers are stateless
• All DML requests are handled by workers
• Data node
• Retrieves incremental log data from log
• Packs and stores log data into log snapshots
• Processes mutation requests
• Query node
• Loads indexes and data from object storage
• Runs searches and queries
• Index node
• Builds indexes on inserted data
Storage Layer
• Log broker - Kafka
• Streaming data persistence
• Execution of reliable asynchronous queries
• Event noti
fi
cation
• Metadata storage - etcd
• Service registration and health checks
• Message consumption checkpoints
• Object storage - S3/MinIO
• Stores snapshot
fi
les of logs
• Stores index
fi
les for scalar and vector data
• Stores intermediate query results
Key Takeaways
• Single coordinator instance per service type
• Coordinators manage corresponding worker node cluster
• Data is stored in Collections
• Akin to collections in MongoDB or tables in relational databases
• Disaggregation of query, indexing, and data
• Signi
fi
cant horizontal scalability
• Support for a wide range of application requirements
• Message streams are core to Milvus
• All data passes through message queue
• Kafka innate cloud nativity allows Milvus to easily scale
04
Kafka as a Messasging Backbone
Milvus’ Messaging Backbone
Milvus’ Messaging Backbone
• Log as data
• Operations are centralized around the log broker
• CRUD operations by subscribing to and consuming logs
• Pub/sub scheme allows for stream & batch processing
• Decoupling of read and write components
• Coordinators manage corresponding worker node cluster
• Support for both streaming and batched execution
• Data nodes read from streams and write to binlog
• Streaming uses WAL, batching uses binlog
• All requests that change system state go through WAL
• Create collection, delete collection
• Insert, update, delete vector
Example: Vector Insert
Example: Vector Insert
• Loggers are organized into a hash ring
• Time Stamp Oracle (TSO) ensures logger consistency
• Different channels for different requests
• Prevents request type interference
• Data nodes subscribe to speci
fi
c channels
• Inserts hashed across multiple channels (+ef
fi
ciency)
• Data nodes can be freely expanded to increase throughput
• Convert row-based WAL to column-based binlogs
• Kafka’s cloud nativity powers Milvus’ scalability
Kafka Powers Milvus
THANK YOU FOR LISTENING
https://github.com/milvus-io/milvus
https://zilliz.com

More Related Content

What's hot

Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
Neo4j
 
GPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphGPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge Graph
Neo4j
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
Sergio Zenatti Filho
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
Kujambu Murugesan
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
Wasm1953
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
Dmitry Kan
 
Choosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectChoosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your Project
Ontotext
 
DBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxDBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptx
Hong Ong
 
Growing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RSGrowing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RS
Databricks
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
Cloudera, Inc.
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
Neo4j
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
C4Media
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?
DATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
Moumie Soulemane
 

What's hot (20)

Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
GPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphGPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge Graph
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Choosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectChoosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your Project
 
DBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxDBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptx
 
Growing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RSGrowing the Delta Ecosystem to Rust and Python with Delta-RS
Growing the Delta Ecosystem to Rust and Python with Delta-RS
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?Agile & Data Modeling – How Can They Work Together?
Agile & Data Modeling – How Can They Work Together?
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 

Similar to How Kafka Powers the World's Most Popular Vector Database System with Charles Xie and Frank Liu | Current 2022

Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
Ovais Tariq
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
Abhijit Kumar
 
Closer Look at Cloud Centric Architectures
Closer Look at Cloud Centric ArchitecturesCloser Look at Cloud Centric Architectures
Closer Look at Cloud Centric Architectures
Todd Kaplinger
 
The Microsoft Cloud Partner
The Microsoft Cloud PartnerThe Microsoft Cloud Partner
The Microsoft Cloud Partner
Neethu Kuruvilla
 
Azure SQL Database
Azure SQL Database Azure SQL Database
Azure SQL Database
nj-azure
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
Shy Engelberg
 
Nosql query processing system for wireless sensor networks
Nosql query processing system for wireless sensor networksNosql query processing system for wireless sensor networks
Nosql query processing system for wireless sensor networks
Nikhil Bhaware
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
Torsten Steinbach
 
Lighthouse 20100120
Lighthouse 20100120Lighthouse 20100120
Lighthouse 20100120
Infrastructure 2.0
 
Lighthouse20100120
Lighthouse20100120Lighthouse20100120
Lighthouse20100120sureddy
 
Microservices in Azure
Microservices in AzureMicroservices in Azure
Microservices in Azure
Doug Vanderweide
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
datastack
 
Microservices using .Net core
Microservices using .Net coreMicroservices using .Net core
Microservices using .Net core
girish goudar
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
Bigstep
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
Bernd Ocklin
 
Ankus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration frameworkAnkus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration framework
Ashrith Mekala
 
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
AFAS Software
 
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service FabricTokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
giventocode
 

Similar to How Kafka Powers the World's Most Popular Vector Database System with Charles Xie and Frank Liu | Current 2022 (20)

Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
 
Closer Look at Cloud Centric Architectures
Closer Look at Cloud Centric ArchitecturesCloser Look at Cloud Centric Architectures
Closer Look at Cloud Centric Architectures
 
The Microsoft Cloud Partner
The Microsoft Cloud PartnerThe Microsoft Cloud Partner
The Microsoft Cloud Partner
 
Azure SQL Database
Azure SQL Database Azure SQL Database
Azure SQL Database
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
 
Nosql query processing system for wireless sensor networks
Nosql query processing system for wireless sensor networksNosql query processing system for wireless sensor networks
Nosql query processing system for wireless sensor networks
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
Lighthouse 20100120
Lighthouse 20100120Lighthouse 20100120
Lighthouse 20100120
 
Lighthouse20100120
Lighthouse20100120Lighthouse20100120
Lighthouse20100120
 
Microservices in Azure
Microservices in AzureMicroservices in Azure
Microservices in Azure
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Microservices using .Net core
Microservices using .Net coreMicroservices using .Net core
Microservices using .Net core
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
 
Ankus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration frameworkAnkus, bigdata deployment and orchestration framework
Ankus, bigdata deployment and orchestration framework
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
Pieter de Bruin (Microsoft) - Welke technologie gebruiken bij implementatie M...
 
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service FabricTokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

How Kafka Powers the World's Most Popular Vector Database System with Charles Xie and Frank Liu | Current 2022

  • 1. How Kafka Powers the World’s Most Popular Vector Database Frank Liu
  • 2. Speaker Frank Liu Director of Operations & ML Architect frank@zilliz.com https://linkedin.com/in/fzliu https://twitter.com/frankzliu
  • 3. 01 Unstructured Data and Embeddings CONTENTS 02 Vector Database Overview 03 Milvus Architecture 04 Kafka as a Messaging Backbone
  • 6. What is Unstructured Data? Any data that does not conform to a pre-de fi ned data model.
  • 8. Using Vectors to Represent Data Embeddings!
  • 10. Vector Database Overview A database purpose-built to store, index, and query large quantities of embeddings.
  • 11. Vector Databases in Production
  • 12. Milvus Features • Supports hardware accelerators • SIMD support on CPUs • GPU support for faster querying & indexing • Supports key database functions • Data partitioning and data sharing • Filtered queries and searches • Multiple options for indices and similarity metrics • FAISS (HNSW, Flat, PQ), ANNOY, DiskANN, ScANN, etc… • Euclidean, dot product (cosine), boolean • A number of SDKs • Python • Go • Node • Java
  • 13. Cloud Nativity • Kubernetes native • Deployment through Helm • Native S3 support • MinIO-based design • Azure Blob and GCS support • Easy on-prem to cloud conversion • Fully distributed • Highly elastic and horizontally scalable • Disaggregated storage and compute (shared storage) • Separate read, write, and background (indexing) services
  • 17. Access Layer • Data-related languages (SQL context) • Data de fi nition language: modify/de fi ne database schema • Data management language: store, modify, and retrieve data • Data control language: de fi ne user rights and permissions • Access layer = multiple proxy nodes • Proxy node functions • Manage message ingestion and routing • Points DDL and DCL instructions to coordinators • Point DML to log for for worker consumption
  • 18. Coordinator Layer • Root coordinator node • Handles DDL and DCL requests • Data coordinator node • Triggers background data operations ( fl ush, compact, etc) • Manages data node cluster • Maintains metadata of inserted data • Query coordinator node • Manages query node cluster • Index coordinator node • Manages index node cluster • Determines when indexes are built • Maintains index metadata
  • 19. Worker Layer • Worker overview • All workers are stateless • All DML requests are handled by workers • Data node • Retrieves incremental log data from log • Packs and stores log data into log snapshots • Processes mutation requests • Query node • Loads indexes and data from object storage • Runs searches and queries • Index node • Builds indexes on inserted data
  • 20. Storage Layer • Log broker - Kafka • Streaming data persistence • Execution of reliable asynchronous queries • Event noti fi cation • Metadata storage - etcd • Service registration and health checks • Message consumption checkpoints • Object storage - S3/MinIO • Stores snapshot fi les of logs • Stores index fi les for scalar and vector data • Stores intermediate query results
  • 21. Key Takeaways • Single coordinator instance per service type • Coordinators manage corresponding worker node cluster • Data is stored in Collections • Akin to collections in MongoDB or tables in relational databases • Disaggregation of query, indexing, and data • Signi fi cant horizontal scalability • Support for a wide range of application requirements • Message streams are core to Milvus • All data passes through message queue • Kafka innate cloud nativity allows Milvus to easily scale
  • 22. 04 Kafka as a Messasging Backbone
  • 24. Milvus’ Messaging Backbone • Log as data • Operations are centralized around the log broker • CRUD operations by subscribing to and consuming logs • Pub/sub scheme allows for stream & batch processing • Decoupling of read and write components • Coordinators manage corresponding worker node cluster • Support for both streaming and batched execution • Data nodes read from streams and write to binlog • Streaming uses WAL, batching uses binlog • All requests that change system state go through WAL • Create collection, delete collection • Insert, update, delete vector
  • 26. Example: Vector Insert • Loggers are organized into a hash ring • Time Stamp Oracle (TSO) ensures logger consistency • Different channels for different requests • Prevents request type interference • Data nodes subscribe to speci fi c channels • Inserts hashed across multiple channels (+ef fi ciency) • Data nodes can be freely expanded to increase throughput • Convert row-based WAL to column-based binlogs • Kafka’s cloud nativity powers Milvus’ scalability
  • 28. THANK YOU FOR LISTENING https://github.com/milvus-io/milvus https://zilliz.com