A ScyllaDB Community
Vector Search with ScyllaDB
Prof. Szymon Wąsik
Director of Engineering
Szymon Wąsik
■ 2007-2018: Research work on discrete optimization and modeling in bioinformatics
■ 2018-2024: Software Engineering at Google, working on auto-scaling and analytical infrastructure
■ Currently:
■ Engineering Director at ScyllaDB
■ Professor at Merito University Poznań, Poland
Presentation Agenda
■ Vector search usage scenarios
■ Internal ScyllaDB architecture
■ Preliminary benchmark results
■ Roadmap
Vector search usage scenarios
Vector search applications
■ Searching objects that can be represented as a vector:
■ Image search and recognition
■ Music and video search
■ Text and document search, including semantic analysis
■ Genetic sequences
■ Analyzing data, including:
■ Facial recognition
■ Medical imaging
■ Sentiment analysis
■ Code similarity detection
Example workflow: RAG
■ Retrieval-augmented generation
■ Method for providing new knowledge to the model
■ Quick to integrate, cheap and small
■ Explainable, with always up-to-date information
RAG: High-level workflow
[Diagram: Prompt → Augmenting (with Knowledge) → Answering (LLM) → Answer]
RAG: Augmenting prompt
System Message:
You are a helpful AI assistant. Read
documents, summarize, and answer user
message.
User Question:
[Prompt]
Context:
1) Document Title: [Title 1]
Excerpt: [Text 1]
…
Instructions to the Assistant:
1. Use only the Context above.
…
Now, please provide the best possible
answer to the user’s question.
[Diagram: Knowledge and Prompt feed the Augmenting step …]
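The template on this slide can be filled in mechanically. The sketch below is one possible way to do it, not the speaker's implementation: the `Document` class and the `build_augmented_prompt` function are hypothetical names introduced for illustration, and the further instructions that the slide elides with "…" are left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A retrieved piece of knowledge (hypothetical helper type)."""
    title: str
    excerpt: str


SYSTEM_MESSAGE = (
    "You are a helpful AI assistant. Read documents, summarize, "
    "and answer user message."
)


def build_augmented_prompt(question, documents):
    """Combine the user question with retrieved documents, following the slide's layout."""
    context_lines = []
    for number, doc in enumerate(documents, start=1):
        context_lines.append(f"{number}) Document Title: {doc.title}")
        context_lines.append(f"   Excerpt: {doc.excerpt}")
    return "\n".join([
        "System Message:",
        SYSTEM_MESSAGE,
        "",
        "User Question:",
        question,
        "",
        "Context:",
        *context_lines,
        "",
        "Instructions to the Assistant:",
        "1. Use only the Context above.",
        "",
        "Now, please provide the best possible answer to the user's question.",
    ])


docs = [Document("Title A", "Text A"), Document("Title B", "Text B")]
prompt = build_augmented_prompt("What is X?", docs)
print(prompt)
```

The resulting string is what would be sent to the LLM in the "Answering" step; the documents passed in are the top-K results of the vector search.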
RAG: Encoding the knowledge
Storing knowledge: Documents → Tokenize to chunks → Encode (LLM encoder) → Embeddings → Scylla Vector DB
Retrieving knowledge: Prompt → Tokenize → Encode (LLM encoder) → Embedding → Search top K (Scylla Vector DB) → Knowledge
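The two paths of this slide (storing and retrieving) can be sketched end to end in a few lines. Everything here is a toy stand-in chosen for illustration: `encode` is a hashed bag-of-words function rather than a real LLM encoder, `InMemoryVectorStore` is a Python list rather than ScyllaDB, and the top-K search is exact brute force rather than an ANN index.

```python
import math
import zlib

DIM = 64  # toy embedding size, an arbitrary choice for this sketch


def chunk(text, words_per_chunk=8):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def encode(text):
    """Toy stand-in for an LLM encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Stand-in for the vector database: keeps (chunk, embedding) pairs in a list."""

    def __init__(self):
        self.rows = []

    def store_document(self, text):
        # Storing path: document -> chunks -> embeddings -> store
        for piece in chunk(text):
            self.rows.append((piece, encode(piece)))

    def search_top_k(self, prompt, k):
        # Retrieving path: prompt -> embedding -> k most similar chunks
        query = encode(prompt)
        scored = [(sum(a * b for a, b in zip(query, emb)), piece)
                  for piece, emb in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [piece for _, piece in scored[:k]]


document = ("ScyllaDB stores embeddings next to the original rows "
            "Vector search returns the most similar stored chunks")
store = InMemoryVectorStore()
store.store_document(document)
first_chunk = chunk(document)[0]
hits = store.search_top_k(first_chunk, 2)
print(hits)
```

Both paths must use the same encoder, otherwise the prompt embedding and the stored embeddings would not live in the same vector space.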
Internal ScyllaDB architecture
Requirements
■ Compatibility with Cassandra CQL syntax
■ Vector type:
ALTER TABLE cycling.comments_vs ADD comment_vector VECTOR <FLOAT, 5>;
■ Vector index:
CREATE INDEX IF NOT EXISTS ann_index ON vsearch.com(item_vector)
USING 'usearch'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT' };
■ Vector queries:
SELECT * FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 3;
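To make the semantics of the query concrete: `ORDER BY ... ANN OF ... LIMIT 3` asks for the three rows most similar to the given vector, and an ANN index returns an approximation of that answer. The sketch below computes the exact answer by brute force with the `DOT_PRODUCT` similarity, using the query vector from the slide. The rows and the function names are hypothetical and exist only for illustration; this is not how ScyllaDB evaluates the query.

```python
DIMENSIONS = 5  # matches VECTOR <FLOAT, 5> on the slide


def dot_product(a, b):
    """Dot-product similarity; rejects vectors of the wrong dimension."""
    if len(a) != DIMENSIONS or len(b) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS}-dimensional vectors")
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(rows, query, k):
    """Return the k rows whose vectors have the largest dot product with query."""
    return sorted(rows, key=lambda row: dot_product(row[1], query), reverse=True)[:k]


# Hypothetical table contents: (id, comment_vector)
rows = [
    ("a", [0.1, 0.1, 0.1, 0.3, 0.6]),
    ("b", [0.9, 0.0, 0.0, 0.1, 0.0]),
    ("c", [0.0, 0.0, 0.0, 0.5, 0.5]),
    ("d", [0.5, 0.5, 0.0, 0.0, 0.0]),
    ("e", [0.2, 0.2, 0.2, 0.2, 0.2]),
]
query = [0.15, 0.1, 0.1, 0.35, 0.55]
top3 = [row_id for row_id, _ in exact_top_k(rows, query, 3)]
print(top3)
```

Brute force costs one similarity computation per row, which is why an index is needed once the table holds millions of vectors.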
Utilizing USearch library
■ Open source library for vector similarity search
■ Embeddings are stored in an HNSW index
■ Written in C++ for speed and safety
■ Leverages SIMD to speed up distance computations
■ Reported by its authors to be 10x faster than the FAISS HNSW implementation
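The core idea behind an HNSW index is a best-first walk over a proximity graph. The sketch below shows that search step on a single layer only, as a simplified illustration and not USearch code: the graph is a brute-force k-nearest-neighbour graph, there is no layer hierarchy, and all names (`build_knn_graph`, `graph_search`, `ef`, `m`) are chosen here for illustration.

```python
import heapq
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(vectors, m):
    """Link every vector to its m exact nearest neighbours (brute force, toy sizes only)."""
    graph = {}
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: distance(v, vectors[j]))
        graph[i] = others[:m]
    return graph


def graph_search(vectors, graph, query, entry, k, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    d_entry = distance(query, vectors[entry])
    visited = {entry}
    candidates = [(d_entry, entry)]  # min-heap: nearest unexplored node first
    best = [(-d_entry, entry)]       # max-heap via negation: ef nearest so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nearest candidate is farther than everything we keep
        for neighbour in graph[node]:
            if neighbour in visited:
                continue
            visited.add(neighbour)
            dn = distance(query, vectors[neighbour])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, neighbour))
                heapq.heappush(best, (-dn, neighbour))
                if len(best) > ef:
                    heapq.heappop(best)
    ranked = sorted((-neg_d, node) for neg_d, node in best)
    return [node for _, node in ranked[:k]]


random.seed(7)
points = [[random.random(), random.random()] for _ in range(200)]
knn_graph = build_knn_graph(points, m=8)
result = graph_search(points, knn_graph, [0.5, 0.5], entry=0, k=5, ef=32)
print(result)
```

The search visits only a fraction of the points, which is where the speed comes from; the price is that the result is approximate, and a larger `ef` trades speed for precision.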
Architecture
[Diagram: ScyllaDB (table with objects, features and embeddings; vector index metadata) ⇄ RPC ⇄ USearch @ Rust (HNSW index, vectors)]
Pros
■ Synergy of USearch speed and Scylla’s strengths:
■ Replication
■ Cloud deployment
■ Backups
■ Makes it easy to replace the indexing technology
■ Allows adding hardware acceleration
Cons
■ Makes deployment more difficult
■ Use Scylla Cloud!
■ Creates reliability challenges
■ Use Scylla Cloud!
■ Duplicates data
■ But increases performance
■ Adds networking overhead
■ But we still win on latency
Preliminary benchmark results
Test environment
■ Framework: Qdrant vector search benchmark
■ Single test case - glove-100-angular:
■ 1.2M vectors
■ 100 dimensions
■ Single precision baseline: 78% (Cassandra’s out-of-the-box precision)
■ Azure:
■ D2s v3 VM for client
■ D8s v3 VM for Scylla + usearch
■ Single node deployment
■ Splitting vCPUs between Scylla and usearch
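ANN benchmark frameworks usually report precision as the fraction of the true nearest neighbours that the index returned. Assuming that is what the 78% baseline above refers to, the metric can be computed as in this sketch; `recall_at_k` and `mean_recall` are names chosen here for illustration, not functions of the benchmark framework.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the k true nearest neighbours that the ANN result contains."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Two hypothetical queries: the first misses one of four true neighbours.
approx = [[1, 2, 3, 9], [5, 6, 7, 8]]
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(mean_recall(approx, exact, k=4))
```

Latency and RPS numbers are only comparable between systems when they are measured at the same precision, which is why the test fixes a single baseline.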
Preliminary Results - Latency [ms]
Preliminary Results - RPS
Preliminary Results - Index Construction [min]
Roadmap
Roadmap
■ master (now): Vector type support. Storing and getting vector type data is already merged, to be included in 2025.2.
■ Q2: Drivers-side support and extensive benchmarks. Support in most popular drivers. Performance benchmarks and fixes for different levels of expected precision and cluster deployments.
■ 2025.3 (Q3): Vector search with USearch. Searching top K most similar vectors with USearch fully integrated with ScyllaDB.
■ Q3/Q4: Cloud integration. Possibility to create the vector search infrastructure managed automatically by Scylla Cloud.
Stay in Touch
Szymon Wąsik
szymon.wasik@scylladb.com
github.com/swasik
www.linkedin.com/in/szymon-wasik/
