OSA CON | December 12-14 2023
The Building Blocks of a
Time-Series Database
Javier Ramirez
Database advocate
@supercoco9
Timestamp problems are hard
https://twitter.com/rshidashrf/status/975566803052134400
https://stackoverflow.com/questions/6841333/why-is-subtracting-these-two-epoch-milli-times-in-year-1927-giving-a-strange-r
1553983200
* This is a date. Which one?
When is this?
1/4/19
* Is this April 19th? January 4th? April 1st?
Working with timestamped data in a
database is tricky*
* especially when doing analytics on data that changes over time or arrives at a high rate
If you can use only one
database for everything, go
with PostgreSQL*
* Or any other major and well supported RDBMS
Some things an RDBMS is not designed for
● Writing data faster than it is read (several million inserts per day, or faster)
● Aggregations scoped to different time units (per year/minute/microsecond)
● Identifying gaps or missing data for a given interval
● Joining tables by approximate timestamp
● Sparse data (tables with hundreds or thousands of columns)
● Aggregates over billions of records
● a factory floor with 500 machines, or
● a fleet with 500 vehicles, or
● 50 trains, with 10 cars each, or
● 500 users with a mobile phone
Sending data every second
How I made my first billion
86,400
* Seconds in one day
604,800
* Seconds in one week
2,628,288
* Seconds in one month. Well, in the average month of 30.42 days anyway
43,200,000 rows a day…
302,400,000 rows a week…
1,314,144,000 rows a month
How I made my first billion
* See? With streaming data, it is kind of easy to get your first billion data points
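The arithmetic behind those figures, as a quick sanity check (500 sources reporting once per second; any SQL engine will evaluate this):
SELECT
  500 * 86400 AS rows_per_day, -- 43,200,000
  500 * 604800 AS rows_per_week, -- 302,400,000
  500 * 2628288 AS rows_per_month; -- 1,314,144,000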
Not all data
problems are
the same
Time-series database basics
● Optimised for fast append-only ingestion
● Data lifecycle policies
● Analytics over chunks of time
● Time-based aggregations
● Often power real-time dashboards
QuestDB would like to be known for:
● Performance
○ Also with smaller machines
● Developer Experience
● Proudly Open Source (Apache 2.0)
Fast streaming
ingestion
* You can try ingesting streaming data using https://github.com/javier/questdb-quickstart
QuestDB ingestion and storage layer
● Data always stored by incremental timestamp.
● Data partitioned by time units and stored in columnar format.
● No indexes needed. Data is immediately available after writing.
● Predictable ingestion rate, even under demanding workloads (millions/second).
● Built-in event deduplication.
● Optimized data types (symbol, geohash, IPv4, UUID).
● Row updates and upserts supported.
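An illustrative sketch of a table declaration exercising several of these features (hypothetical table and column names, not from the talk):
CREATE TABLE sensors (
  device_id SYMBOL, -- interned string: cheap to filter and group by
  tempF DOUBLE,
  ts TIMESTAMP
) TIMESTAMP(ts) -- designated timestamp: rows stored in ts order
PARTITION BY DAY WAL -- one partition per day of data
DEDUP UPSERT KEYS(ts, device_id); -- built-in event deduplication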
Lifecycle policies
ALTER TABLE my_table DROP PARTITION LIST '2021-01-01', '2021-01-02';
-- Delete days before 2021-01-03
ALTER TABLE my_table DROP PARTITION WHERE timestamp < to_timestamp('2021-01-03', 'yyyy-MM-dd');
ALTER TABLE x DETACH PARTITION LIST '2019-02-01', '2019-02-02';
-- It is also possible to use WHERE clause to define the partition list
ALTER TABLE sensors DETACH PARTITION WHERE timestamp < '2019-02-03T00';
CREATE TABLE my_table (i symbol, ts timestamp) IN VOLUME SECONDARY_VOLUME;
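A detached partition can later be brought back with ATTACH (a sketch; the detached files must first be made available to the table again):
ALTER TABLE sensors ATTACH PARTITION LIST '2019-02-01';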
Connectivity, protocols, and interfaces
● REST API and web console: Query execution, CSV imports/exports. Basic charts.
● Pgwire: perfect for querying, DDL, and DML. Ingestion supported, up to moderate
throughput. Compatible with any low-level PostgreSQL client or library.
● Influx Line Protocol (ILP): socket-based, ingestion only, very high throughput (sample line
after this list). Official clients available for C/C++, Java, Python, Rust, Go, Node.js, and .NET.
● Health/Metrics: HTTP endpoint with Prometheus format
● Integrations with: Apache Kafka, Apache Flink, Apache Spark, Python Pandas, Grafana,
Superset, Telegraf, Redpanda, qStudio, SQLAlchemy, Cube…
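For reference, each ILP message is a single line of text: table name, comma-separated symbol columns, a space, the remaining columns, a space, and a nanosecond timestamp. A sketch with hypothetical names:
sensors,device_id=dev1 tempF=68.5 1701388800000000000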
The query engine
QuestDB Query engine internals
● Our Java codebase has zero dependencies. No garbage collection on
the hot path. As close to the hardware as possible.
● We research the latest trends. Our code takes advantage of the
state-of-the-art in CPU, storage design, and data structures.
● We implement our own Just-In-Time (JIT) compiler to make query execution
as parallel and fast as possible.
● We spend weeks of development to save microseconds or
nanoseconds in many operations.
The query language: SQL with
time-series extensions
LATEST ON … PARTITION BY …
Retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple
time series are stored in the same table.
SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol;
Try it live on
https://demo.questdb.io
LATEST ON … PARTITION BY …
Retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple
time series are stored in the same table.
SELECT * FROM trades
WHERE symbol in ('BTC-USD', 'ETH-USD')
LATEST ON timestamp PARTITION BY symbol, side;
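-- With the two filtered symbols and two sides (buy/sell) in the demo trades
-- table, this returns at most four rows: the latest trade per (symbol, side).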
Try it live on
https://demo.questdb.io
SAMPLE BY
Aggregates data in homogeneous time chunks
SELECT
timestamp,
sum(price * amount) / sum(amount) AS vwap_price,
sum(amount) AS volume
FROM trades
WHERE symbol = 'BTC-USD' AND timestamp > dateadd('d', -1, now())
SAMPLE BY 15m ALIGN TO CALENDAR;
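-- ALIGN TO CALENDAR snaps the 15m buckets to clock boundaries (:00, :15,
-- :30, :45) instead of aligning them to the first observed timestamp.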
SELECT timestamp, min(tempF),
max(tempF), avg(tempF)
FROM weather SAMPLE BY 1M;
Try it live on
https://demo.questdb.io
How do you ask your database to
return which data is not stored?
I am sending data every second or
so. Tell me which devices had a gap
of more than 1.5 seconds between
data points.
SAMPLE BY … FILL
Can fill missing time chunks using different strategies (NULL, constant, LINEAR, PREVious value)
SELECT
timestamp,
sum(price * amount) / sum(amount) AS vwap_price,
sum(amount) AS volume
FROM trades
WHERE symbol = 'BTC-USD' AND timestamp > dateadd('d', -1, now())
SAMPLE BY 1s FILL(NULL) ALIGN TO CALENDAR;
Try it live on
https://demo.questdb.io
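One hedged way to answer the gap question above, at one-second resolution, assuming a hypothetical readings table with designated timestamp ts: sample per second and fill empty buckets with a zero count; every zero bucket is a second with no data from that device.
SELECT ts, count() AS readings
FROM readings
WHERE device_id = 'dev1' -- hypothetical table and column names
SAMPLE BY 1s FILL(0);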
WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';
Try it live on
https://demo.questdb.io
WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';
SELECT * from trips WHERE pickup_datetime in '2018;2M' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;10s' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;-3d' LIMIT -10;
Try it live on
https://demo.questdb.io
WHERE … TIME RANGE
SELECT * from trips WHERE pickup_datetime in '2018';
SELECT * from trips WHERE pickup_datetime in '2018-06';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59';
SELECT * from trips WHERE pickup_datetime in '2018;2M' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;10s' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018;-3d' LIMIT -10;
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59:58;4s;1d;7';
SELECT * from trips WHERE pickup_datetime in '2018-06-21T23:59:58;4s;-1d;7';
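-- Reading the four-part modifier (a hedged gloss of QuestDB interval syntax):
-- 'start;duration;period;count' — a 4s interval beginning at the timestamp,
-- repeated every 1 day (backwards in time with -1d), for 7 occurrences.
-- The negative LIMIT above returns the last 10 rows rather than the first 10.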
Try it live on
https://demo.questdb.io
What if I have two tables, where
data is (obviously) not sent at the
same exact timestamps and I want
to join by closest matching
timestamp?
ASOF JOIN (LT JOIN and SPLICE JOIN variations)
ASOF JOIN joins two different time series. For each row in the first time series, the ASOF JOIN takes from
the second time series the row whose timestamp meets both of the following criteria:
● The timestamp is the closest to the first timestamp.
● The timestamp is prior to or equal to the first timestamp.
WITH trips2016 AS (
SELECT * from trips WHERE pickup_datetime in '2016'
)
SELECT pickup_datetime, timestamp, fare_amount, tempF, windDir
FROM trips2016
ASOF JOIN weather;
Try it live on
https://demo.questdb.io
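The slide names LT JOIN as a variation; as a hedged sketch, it has the same shape as ASOF JOIN but matches the closest row strictly before each timestamp, never an equal one:
SELECT pickup_datetime, timestamp, fare_amount, tempF
FROM trips
LT JOIN weather;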
Some things we are trying out next for performance
● Compression, and exploring data formats like Arrow / Parquet
● Own ingestion protocol
● Second level partitioning
● Improved vectorization of some operations (group by multiple columns or by expressions)
● Specific join optimizations (index nested loop joins, for example)
QuestDB OSS
Open Source. Self-managed. Suitable for
production workloads.
https://github.com/questdb/questdb
QuestDB Enterprise
Licensed. Self-managed. Enterprise features like
RBAC, compression, replication, TLS on all
protocols, cold storage, K8s operator…
https://questdb.io/enterprise/
QuestDB Cloud
Fully managed, pay-per-usage environment,
with enterprise-grade features.
https://questdb.io/cloud/
OSA CON | December 12-14 2023
Q&A
● github.com/questdb/questdb
● https://questdb.io
● https://demo.questdb.io
● https://github.com/javier/questdb-quickstart
● https://slack.questdb.io/
Javier Ramirez
@supercoco9
We 💕 contributions
and GitHub ⭐ stars
