Building Efficient Multi-Threaded Filters for Faster SQL Queries

Vlad Ilyushchenko
Co-Founder and CTO of QuestDB

Vlad Ilyushchenko
Co-founder, CTO, QuestDB
■ Turned a side hustle into a company
■ I am interested in human psychology and high performance
computing in equal measure
■ Away from work I mostly play hide and seek with my kids

QuestDB is a time series database

What is the problem?
Most queries for time series include filters to:
■ Find one time series in a table of multiple time series
where symbol = 'value'
■ Find outlier records in a time series
where value > 10.5
■ Find tagged records in a time series
where tag = 'value' and rate < 0.4

Fast column scan
Some benefits:
■ Single data structure - multiple queries
■ Reduces disk space usage
■ No impact on ingestion
■ Full scan algorithm scales well with “sparse” indexes

Software components
■ JIT compilation
■ Linear memory access
■ SIMD
■ Efficient multi-core execution
■ Efficient memory management
■ SQL execution order optimization

Filter function
… where lat > 10 and lon < 100:
… where rowid in fn(lat, lon):
uint64_t[] fn(double[] lat, double[] lon, uint64_t size)
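In scalar form, the contract of such a filter function can be sketched as below. This is only an illustration of the signature above, assuming row ids are plain array indexes; the class and method names are hypothetical, and the real function is generated native code rather than Java.

```java
import java.util.Arrays;

final class ScalarFilterSketch {
    // Returns the ids (array indexes) of rows where lat > 10 and lon < 100.
    static long[] filter(double[] lat, double[] lon, int size) {
        long[] rowIds = new long[size];
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (lat[i] > 10 && lon[i] < 100) {
                rowIds[count++] = i;
            }
        }
        return Arrays.copyOf(rowIds, count);
    }

    public static void main(String[] args) {
        double[] lat = {5.0, 12.5, 40.0, 11.0};
        double[] lon = {50.0, 99.0, 150.0, 20.0};
        System.out.println(Arrays.toString(filter(lat, lon, lat.length))); // [1, 3]
    }
}
```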

Assembling filter function
Function is assembled using AsmJit.
■ AsmJit vs LLVM
■ IR - the Intermediate Representation
● Expression is parsed to IR by Java
● IR is bytecode, lives in native memory
● IR is passed to JIT
■ AVX2 assembly
● JIT processes the IR and emits AVX2 assembly instructions
● JIT uses AsmJit for cross-platform function abstraction and register allocation
■ Combining predicates
● JIT generated code has to combine results of a > 10 and b < 3
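One way to picture predicate combination is as a bitwise "and" of per-lane comparison masks. The sketch below emulates that idea in scalar Java, four rows at a time, to mirror the four doubles that fit in a 256-bit AVX2 register. It is an assumption-laden illustration with hypothetical names, not the code the JIT emits.

```java
final class PredicateMaskSketch {
    static final int LANES = 4; // a 256-bit AVX2 register holds four 64-bit doubles

    // Writes ids of rows where a > 10 and b < 3 into out; returns how many were written.
    static int filter(double[] a, double[] b, int size, long[] out) {
        int count = 0;
        for (int base = 0; base < size; base += LANES) {
            int lanes = Math.min(LANES, size - base);
            int maskA = 0; // bit i is set when a[base + i] > 10
            int maskB = 0; // bit i is set when b[base + i] < 3
            for (int i = 0; i < lanes; i++) {
                if (a[base + i] > 10) maskA |= 1 << i;
                if (b[base + i] < 3) maskB |= 1 << i;
            }
            int mask = maskA & maskB; // combine the two predicate results
            while (mask != 0) {
                int lane = Integer.numberOfTrailingZeros(mask);
                out[count++] = base + lane;
                mask &= mask - 1; // clear the lowest set bit
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] out = new long[5];
        int n = filter(new double[]{11, 5, 20, 30, 12}, new double[]{1, 1, 5, 2, 0}, 5, out);
        System.out.println(n); // 3
    }
}
```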

Calling search function
Call search function from Java:
■ Prepare arguments, memory mapped columns of data
■ Call search function via JNI
■ Filter rows in batches, benefiting from a tight loop
■ Expose data via “row” API
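The batching steps above can be sketched as follows. The SearchFunction interface is a hypothetical stand-in for the native function reached via JNI, and the row sink stands in for the "row" API; neither is the actual QuestDB interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

final class BatchFilterSketch {
    // Hypothetical stand-in for the native search function.
    interface SearchFunction {
        // Scans rows [lo, hi), writes matching row ids to out, returns the match count.
        int search(double[] column, int lo, int hi, long[] out);
    }

    static void forEachMatch(double[] column, int batchSize, SearchFunction fn, LongConsumer rowSink) {
        long[] rowIds = new long[batchSize]; // reused for every batch
        for (int lo = 0; lo < column.length; lo += batchSize) {
            int hi = Math.min(lo + batchSize, column.length);
            int n = fn.search(column, lo, hi, rowIds); // one call per batch
            for (int i = 0; i < n; i++) {
                rowSink.accept(rowIds[i]); // hand rows to the caller one at a time
            }
        }
    }

    // Collects ids of rows where the value is greater than 10.5.
    static List<Long> collect(double[] column, int batchSize) {
        SearchFunction greaterThan = (col, lo, hi, out) -> {
            int n = 0;
            for (int i = lo; i < hi; i++) {
                if (col[i] > 10.5) out[n++] = i;
            }
            return n;
        };
        List<Long> rows = new ArrayList<>();
        forEachMatch(column, batchSize, greaterThan, rows::add);
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(collect(new double[]{1, 11, 12, 3, 20}, 2)); // [1, 2, 4]
    }
}
```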

Concurrency
Chained concurrency: queue based, share nothing. Two goals:
■ Perform searches on multiple data chunks concurrently
■ Begin offloading search results before the search fully completes

Concurrency
The filter function is stateless*: it uses only the stack, so it can be called
from multiple threads.
■ Fork - synchronous
● Chunk the data - prepare Data Frames
● Queue the execution tasks
■ Reduce - performed on thread pool
● Call search function concurrently for multiple Data Frames
● Store row ids in reusable “arena”
■ Join - performed by caller
● Translate row ids into data
● Submit for further processing
* non-JIT filter functions can be stateful
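A minimal sketch of the fork, reduce, and join phases, using the standard java.util.concurrent thread pool instead of QuestDB's own queues. All names are hypothetical, and each frame allocates its own result array here rather than using a reusable arena.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ForkReduceJoinSketch {
    // Stateless filter: touches only its arguments and locals, so concurrent calls are safe.
    static long[] filterFrame(double[] column, int lo, int hi) {
        long[] rowIds = new long[hi - lo];
        int n = 0;
        for (int i = lo; i < hi; i++) {
            if (column[i] > 10.5) rowIds[n++] = i;
        }
        return Arrays.copyOf(rowIds, n);
    }

    static List<Long> query(double[] column, int frameSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Fork: chunk the data into frames and queue one task per frame.
            List<Future<long[]>> tasks = new ArrayList<>();
            for (int lo = 0; lo < column.length; lo += frameSize) {
                final int from = lo;
                final int to = Math.min(lo + frameSize, column.length);
                // Reduce: the filter runs on the pool, one frame per task.
                tasks.add(pool.submit(() -> filterFrame(column, from, to)));
            }
            // Join: the caller consumes frames in order while later frames may still be running.
            List<Long> result = new ArrayList<>();
            for (Future<long[]> task : tasks) {
                for (long rowId : task.get()) result.add(rowId);
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(query(new double[]{1, 11, 12, 3, 20, 10.5, 30}, 2, 3)); // [1, 2, 4, 6]
    }
}
```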

Single Writer Principle
Single writer per table, Multi-Version Concurrency Control.
■ Append-only table versions
■ Transaction commit bumps tx watermark
■ Order by time before commit
■ Late data triggers new version creation
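The watermark idea can be illustrated with a small sketch: a single writer appends rows and then publishes a committed row count, and readers scan only up to the count they observed. This is a simplified assumption of how such a scheme might look; it leaves out ordering by time and new version creation for late data, and the names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SingleWriterSketch {
    private final double[] values;
    private final AtomicLong txWatermark = new AtomicLong(); // number of committed rows
    private int appended; // touched by the writer thread only

    SingleWriterSketch(int capacity) {
        values = new double[capacity];
    }

    // Writer thread only: rows land past the watermark and stay invisible to readers.
    void append(double v) {
        values[appended++] = v;
    }

    // Writer thread only: bumping the watermark makes the appended rows visible.
    void commit() {
        txWatermark.set(appended);
    }

    // Any thread: scan only the rows below the watermark observed at the start.
    int countGreaterThan(double threshold) {
        int committed = (int) txWatermark.get();
        int n = 0;
        for (int i = 0; i < committed; i++) {
            if (values[i] > threshold) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        SingleWriterSketch table = new SingleWriterSketch(8);
        table.append(11);
        System.out.println(table.countGreaterThan(10)); // 0, not committed yet
        table.commit();
        System.out.println(table.countGreaterThan(10)); // 1
    }
}
```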

Inter-thread messaging framework
MPSequence pubSeq = new MPSequence();
MCSequence subSeq = new MCSequence();
pubSeq.then(subSeq).then(pubSeq);
// publisher thread
pubSeq.next();
// consumer thread
subSeq.next();
[Diagram: Pub Sequence, Sub Sequence]
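To show how two sequence counters can coordinate a bounded queue without locks, here is a simplified ring buffer for one publisher and one consumer. It is not QuestDB's MPSequence/MCSequence implementation, which supports multiple publishers and consumers; the class and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RingQueueSketch {
    private final long[] slots;
    private final int mask;
    private final AtomicLong published = new AtomicLong(); // slots made available by the publisher
    private final AtomicLong consumed = new AtomicLong();  // slots released by the consumer

    RingQueueSketch(int capacityPowerOfTwo) {
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Publisher thread: returns false instead of blocking when the queue is full.
    boolean offer(long value) {
        long cursor = published.get();
        if (cursor - consumed.get() == slots.length) return false;
        slots[(int) (cursor & mask)] = value;
        published.set(cursor + 1); // volatile write makes the slot visible to the consumer
        return true;
    }

    // Consumer thread: returns -1 when the queue is empty (values are assumed non-negative).
    long poll() {
        long cursor = consumed.get();
        if (cursor == published.get()) return -1;
        long value = slots[(int) (cursor & mask)];
        consumed.set(cursor + 1); // frees the slot for the publisher
        return value;
    }

    public static void main(String[] args) {
        RingQueueSketch queue = new RingQueueSketch(2);
        queue.offer(7);
        System.out.println(queue.poll()); // 7
    }
}
```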

Work stealing
Publisher becomes consumer when queue is
full. Consuming behaviour can be adjusted to
prioritise publishing.
■ Publisher does not waste time
■ Pub/Sub system can work on 1 thread
■ Consumer thread does other things when
queue is empty
long cur = pubSeq.next();
if (cur > -1) {
    // publish
} else {
    cur = subSeq.next();
    if (cur > -1) {
        // consume
    }
}
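The same pattern can be tried out with a standard bounded queue. In the sketch below, which uses java.util.concurrent.ArrayBlockingQueue rather than QuestDB's sequences, a single thread publishes tasks and runs a queued task itself whenever the queue is full. The names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;

final class WorkStealingSketch {
    // Publishes taskCount tasks through a bounded queue and returns how many were executed.
    static int publishAll(int taskCount, int queueCapacity) {
        ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(queueCapacity);
        int[] executed = {0};
        for (int i = 0; i < taskCount; i++) {
            Runnable task = () -> executed[0]++;
            while (!queue.offer(task)) {        // queue full: cannot publish
                Runnable stolen = queue.poll(); // so become a consumer instead
                if (stolen != null) stolen.run();
            }
        }
        // No worker threads in this sketch, so the publisher drains what is left.
        for (Runnable r; (r = queue.poll()) != null; ) {
            r.run();
        }
        return executed[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAll(10, 4)); // 10
    }
}
```

Because the publisher falls back to consuming, the loop completes on one thread even though the queue holds fewer slots than there are tasks.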

Circuit breaker
■ We can interrupt long running execution
● On timeout
● On connection drop
● If we detect we have done enough work (LIMIT N / LIMIT -N)
■ The circuit breaker is a piece of code, executed atomically, that is
injected into every queue slot
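A rough sketch of the three interruption conditions, assuming the breaker is checked before each queued task. How the check is attached to queue slots in QuestDB is not shown here, and all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class CircuitBreakerSketch {
    private final AtomicBoolean cancelled = new AtomicBoolean(); // set on connection drop
    private final long deadlineNanos;

    CircuitBreakerSketch(long timeoutNanos) {
        deadlineNanos = System.nanoTime() + timeoutNanos;
    }

    void cancel() {
        cancelled.set(true);
    }

    boolean tripped() {
        return cancelled.get() || System.nanoTime() - deadlineNanos >= 0;
    }

    // Runs tasks in order and checks the breaker before each one.
    // Stops early once `limit` tasks have run (enough work done). Returns the number of tasks run.
    int run(Runnable[] tasks, int limit) {
        int done = 0;
        for (Runnable task : tasks) {
            if (tripped() || done == limit) break;
            task.run();
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(3_600_000_000_000L); // one hour
        Runnable[] tasks = {() -> {}, () -> {}, () -> {}};
        System.out.println(breaker.run(tasks, 2)); // 2
    }
}
```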

Other nuances
We employ a number of techniques to reduce wait, contention and memory
consumption:
■ Fixed worker pool
■ Tagged fork-join sequences
■ Sharded bounded queues
■ Reusable row id “arena”

Conclusion
■ Pros:
● Nice performance gains
● Faster than an index when the index is larger than the column being scanned
● Full scan works well with sparse indexes
■ Cons:
● Implementation is quite complex and indirect
● It was hard to find all of the many race conditions
● Lots of “fuzz” tests

Further Resources
■ Live demo: https://demo.questdb.io/
■ Blog: How we built a SIMD JIT compiler for SQL in QuestDB
■ Blog: 4Bn rows/sec query benchmark: Clickhouse vs QuestDB vs Timescale
■ Blog: How we built inter-thread messaging from scratch
■ Blog: My journey making QuestDB
We 💕 All Contributions
github.com/questdb/questdb

Vlad Ilyushchenko
vlad@questdb.io
@ilyushvl
