SlideShare a Scribd company logo
1 of 43
Download to read offline
© 2022 Altinity, Inc.
Size Matters
Best Practices for Trillion Row
Datasets in ClickHouse
Robert Hodges and Altinity Engineering
10 August 2022
1
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Let’s make some introductions
ClickHouse support and services including Altinity.Cloud
Authors of Altinity Kubernetes Operator for ClickHouse
and other open source projects
Robert Hodges
Database geek with 30+ years
on DBMS systems. Day job:
Altinity CEO
Altinity Engineering
Database geeks with centuries
of experience in DBMS and
applications
2
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Foundations
3
© 2022 Altinity, Inc.
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
ClickHouse is a SQL Data Warehouse
It’s a popular engine for
real-time analytics
ClickHouse
Event
Streams
ELT
Object
Storage
Interactive
Graphics
Dashboards
APIs
4
© 2022 Altinity, Inc.
Seeing is believing
5
Demo Time!
© 2022 Altinity, Inc.
Some definitions to guide discussion
Enabling Fast, Cost-Efficent End User
Access to Trillion-Row Datasets
6
Consistent, sub-second response
that scales linearly with resources
Deliver query results at costs that
are low and predictable
Market TICK data, DNS queries, weblogs, network flow
logs, service logs, CDN telemetry, real-time ad bids, …
© 2022 Altinity, Inc.
Quick,
Iterative
Exploration
Why do we need fast access to source data?
7
Question Answer
Why do temperature
sensors fail
intermittently?
Slice and dice
queries on detailed
sensor data
Bad firmware
upgrade
© 2022 Altinity, Inc.
Principles for large datasets in ClickHouse
8
Reduce queries to a single scan
Reduce I/O
Parallelize query
Lean on aggregation (instead of joins)
Index information with materialized views
© 2022 Altinity, Inc.
The key: One table* to rule them all
9
Unaggregated
table data
Keep intermediate
results in RAM
Minimize I/O Overhead
Maximize applied CPU
Parallelize everything
Results
Query
And make the scans really fast
* Joins are OK! Just not
big ones.
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Basic design for 1
trillion row tables
10
© 2022 Altinity, Inc.
ClickHouse Server Architecture
11
Query Parser Query Interpreter Query Pipeline
Query
Table Primary
Key Indexes
Scanned
column
blocks from
storage
Joined
data (hash
tables)
Intermediate
Query Results
(hash tables)
Columnar data in block storage
Columnar data in object
storage
OS Page Cache
© 2022 Altinity, Inc.
Round up the usual performance suspects
12
Data
Partitioning
Codecs
Compression Skip
Indexes
Projections
Sharding
Distributed Query
Data
Types
Read
Replicas
Tiered Storage
Primary key index
In-RAM dictionaries
© 2022 Altinity, Inc.
Name: 201905_510_815_3
MergeTree table organization in ClickHouse
13
Name: 201901_208_474_4
Name: 201901_1_207_3
Parts
Sparse
index
Sorted, compressed,
indexed column
Skip indexes
Minmax
Bloom
© 2022 Altinity, Inc.
CREATE TABLE IF NOT EXISTS readings_unopt (
sensor_id Int64,
sensor_type Int32,
location String,
time DateTime,
date Date DEFAULT toDate(time),
reading Float32
) Engine = MergeTree
PARTITION BY tuple()
ORDER BY tuple();
Let’s start by making an experimental table!
14
Sub-optimal
datatypes!
No codecs!
No partitioning
or ordering!
© 2022 Altinity, Inc.
Here is a better experimental table with lower I/O cost
15
CREATE TABLE IF NOT EXISTS readings_zstd (
sensor_id Int32 Codec(DoubleDelta, ZSTD(1)),
sensor_type UInt16 Codec(ZSTD(1)),
location LowCardinality(String) Codec(ZSTD(1)),
time DateTime Codec(DoubleDelta, ZSTD(1)),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, ZSTD(10))
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (location, sensor_id, time);
Optimized data
types
Codecs + ZSTD
compression
ALIAS column
Sorting by key
columns + time
Time-based
partitioning
© 2022 Altinity, Inc.
On-disk table size for different schemas
16
© 2022 Altinity, Inc.
ClickHouse single node query model
17
Query
Result
ClickHouse Server
Parse/Plan
Merge/Sort
Vectorized
Scan
In-RAM
Hash
Tables
Parts in Storage
© 2022 Altinity, Inc.
Exploring linear local CPU scaling
18
–Query over 1.01 Billion rows
set max_threads = 16;
SELECT
toYYYYMM(time) AS month,
countIf(msg_type = 'reading') AS
readings,
countIf(msg_type = 'restart') AS
restarts,
min(temperature) AS min,
round(avg(temperature)) AS avg,
max(temperature) AS max
FROM test.readings_multi
WHERE sensor_id BETWEEN 0 and 10000
GROUP BY month ORDER BY month ASC;
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Ingesting data
into large tables
19
© 2022 Altinity, Inc.
Pattern: multiple entities in a single table
Large table joins are an
anti-pattern in low-latency apps
20
Restart
● msg_type=’restart’
● sensor_id
● time
Reading
● msg_type=’reading’
● sensor_id
● time
● temperature Error
● msg_type=’err’
● sensor_id
● time
● message
© 2022 Altinity, Inc.
Many apps keep entity sources for future flexibility
21
msg_type
temperature
time
date
{
"sensor_id": "0",
"time": "2019-01-01 00:00:00",
"msg_type": "reading",
"temperature": "46.31",
"message": "",
"device_type": "0",
"firmware": "frx23.0.22"
}
message
sensor_type
sensor_id
Materialized
columns
Source
data
1.34 bytes/row 4.14 bytes/row, ~96% compression with ZSTD(1)
© 2022 Altinity, Inc.
Schema for a table based on multi-entity JSON
22
CREATE TABLE IF NOT EXISTS readings_multi_json (
sensor_id Int32 Codec(DoubleDelta, LZ4),
sensor_type UInt8,
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
msg_type enum('reading'=1, 'restart'=2, 'err'=3),
temperature Decimal(5,2) Codec(T64, LZ4),
message String DEFAULT '',
json String DEFAULT ''
) Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (msg_type, sensor_id, time);
Codecs + LZ4
compression
Sort by msg_type,
sensor, time
Discriminator
column
String column
for JSON data
© 2022 Altinity, Inc.
Loading raw data into large systems
Nginx Logs
Sensor data readings_etl
(Null Engine)
readings_multi
(Mergetree
Engine)
MV
INSERT
INTO
Enrich data with
materialized view
© 2022 Altinity, Inc.
ClickHouse makes it easy to materialize columns
24
ALTER TABLE readings_multi_json
ADD COLUMN IF NOT EXISTS firmware String
DEFAULT JSONExtractString(json, 'firmware')
;
ALTER TABLE readings_multi_json
UPDATE firmware = firmware WHERE 1=1
;
© 2022 Altinity, Inc.
…But you can also index and query JSON directly
25
ALTER TABLE readings_multi_json
ADD INDEX jsonbf json TYPE tokenbf_v1(8192, 3, 0)
GRANULARITY 1;
ALTER TABLE readings_multi_json
MATERIALIZE INDEX jsonbf;
-- Count matches on column.
SELECT count()
FROM readings_multi_json
WHERE
firmware = 'frx23ID0000x2532'
-- Count token matches in JSON.
SELECT count()
FROM readings_multi_json
WHERE
hasToken(json,'frx23ID0000x2532')
Bloom filter tuning
is complicated!
© 2022 Altinity, Inc.
Results are good if you have high cardinality values
26
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Unique
ClickHouse tricks
for large datasets
27
© 2022 Altinity, Inc.
How can we make queries fast on large data sets?
28
Create queries that work
in a single scan without
large-table joins
© 2022 Altinity, Inc.
Hint: Aggregation runs in a single pass
= 2
Sum = 6
Count = 3
1 2 3 1 3 5 0 5 0 0
Sum = 9
Count = 3
Sum = 5
Count = 4
6 + 9 + 5
3 + 3 + 4
29
No need to
move data
Parallelizes!
Intermediate
results are
reusable
© 2022 Altinity, Inc.
What about queries over all entities?
30
SELECT toYYYYMM(time) AS month,
countIf(msg_type = 'reading') AS readings,
countIf(msg_type = 'restart') AS restarts,
min(temperature) AS min,
round(avg(temperature)) AS avg, max(temperature) AS max
FROM test.readings_multi WHERE sensor_id = 3
GROUP BY month ORDER BY month ASC
┌──month─┬─readings─┬─restarts─┬───min─┬─avg─┬────max─┐
│ 201901 │ 44640 │ 1 │ 0 │ 75 │ 118.33 │
│ 201902 │ 40320 │ 0 │ 68.09 │ 81 │ 93.98 │
│ 201903 │ 15840 │ 0 │ 73.19 │ 84 │ 95.3 │
└────────┴──────────┴──────────┴───────┴─────┴────────┘
Use conditional
aggregation!
© 2022 Altinity, Inc.
msg_type sensor_id time temperature
sensor_id restart_time
time temperature
sensor_id restart_time
time temperature
What about joins on distributed data?
Use case: join restarts with temperature readings
31
sensor_id uptime
time temperature
Restart times
msg_type
‘restart’
sensor_id time
Temperature readings
Temperatures after restart
msg_type sensor_id time temperature
JOIN key
msg_type
‘reading’
sensor_id time temperature
© 2022 Altinity, Inc.
msg_type sensor_id time temperature
msg_type sensor_id time temperature
Aggregation can implement joins!
32
sensor_id uptime
time temperature
Restart and temperature records
msg_type sensor_id time
msg_type sensor_id time temperature
Temperatures after restart
sensor_id
restart_time: t1
reading_time: [t1, t2, t3, t4, …]
temp: [76.44, 90.39, 82.08, 48.12, ..]
236
236
236
236
…
t1
t2
t3
t4
...
76.44
90.39
82.08
48.12
...
30
90
150
210
...
GROUP BY
key
Grouped array values
ARRAY JOIN to pivot
on arrays
© 2022 Altinity, Inc.
And here’s the code…
33
SELECT sensor_id, reading_time, temp, reading_time,
reading_time - restart_time AS uptime
FROM (
WITH toDateTime('2019-04-17 11:00:00') as start_of_range
SELECT sensor_id, groupArrayIf(time, msg_type = 'reading') AS
reading_time,
groupArrayIf(temperature, msg_type = 'reading') AS temp,
anyIf(time, msg_type = 'restart') AS restart_time
FROM test.readings_multi rm
WHERE (sensor_id = 2555)
AND time BETWEEN start_of_range AND start_of_range + 600
GROUP BY sensor_id)
ARRAY JOIN reading_time, temp Not everyone’s cup of tea,
but it works!!!
© 2022 Altinity, Inc.
How about locating key events in tables?
34
When was the
last restart on
sensor 236?
SELECT message
FROM readings_multi
WHERE (msg_type, sensor_id, time) IN
(SELECT msg_type, sensor_id, max(time)
FROM readings_multi
WHERE msg_type = 'restart'
AND sensor_id = 236
GROUP BY msg_type, sensor_id)
Expensive on large
datasets!
© 2022 Altinity, Inc.
Finding the last restart is an aggregation task!
35
236 2019-01-10 20:00:13 restart
sensor_id time msg_type 236 2019-01-10 21:07:56 restart
sensor_id time msg_type
236 2019-01-10 21:07:56 restart
sensor_id time msg_type
Merge
GROUP BY key
Max value
Matching
row value
© 2022 Altinity, Inc.
Use materialized views to “index” data
Finding the last restart on a sensor
36
Block lands in
source table
Block(s) land
in materialized
view target
table
SELECT
sensor_id,
max(time) AS time
FROM readings_multi
WHERE msg_type = 'restart'
GROUP BY sensor_id
“Last point query”
MergeTree Table
AggregatingMergeTree
Table
© 2022 Altinity, Inc.
And here’s code for the materialized view…
37
CREATE TABLE sensor_last_restart_agg (
sensor_id Int32,
time SimpleAggregateFunction(max, DateTime),
msg_type AggregateFunction(argMax, String, DateTime)
)
ENGINE = AggregatingMergeTree()
PARTITION BY tuple() ORDER BY sensor_id
CREATE MATERIALIZED VIEW sensor_last_restart
TO sensor_last_restart_agg AS SELECT
sensor_id, max(time) AS time,
argMaxState(msg_type, time) AS msg_type
FROM readings_multi
WHERE msg_type = 'restart' GROUP BY sensor_id
SimpleAggregateFunction
simplifies insert and query
tuple() is a dubious choice!
© 2022 Altinity, Inc.
Comparison of source table to typical materialized view
38
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Wrap-up
39
© 2022 Altinity, Inc.
Learnings from large ClickHouse installations
Use a single large table to hold all entities
Make sound implementation choices to get baseline performance
Include source data for future flexibility without moving data
Aggregation is a secret ClickHouse power: use it to scan, join, index data
Your reward: Linear scaling, high cost-efficiency, and happy users
40
© 2022 Altinity, Inc.
Other important techniques for big data
Sharding and replication
Tiered storage
Object storage
Approximate queries using sampling and approximate uniqs (lighter aggregation)
41
© 2022 Altinity, Inc.
Where is the documentation?
ClickHouse official docs – https://clickhouse.com/docs/
Altinity Blog – https://altinity.com/blog/
Altinity Youtube Channel –
https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA
Altinity Knowledge Base – https://kb.altinity.com/
Meetups, other blogs, and external resources. Use your powers of Search!
42
© 2022 Altinity, Inc.
Thank you!
Questions?
https://altinity.com
43
Altinity.Cloud
Altinity Support
Altinity Stable
Builds
We’re hiring!

More Related Content

What's hot

High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howAltinity Ltd
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Altinity Ltd
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...Altinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Fun with click house window functions webinar slides 2021-08-19
Fun with click house window functions webinar slides  2021-08-19Fun with click house window functions webinar slides  2021-08-19
Fun with click house window functions webinar slides 2021-08-19Altinity Ltd
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Ltd
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Ltd
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIAltinity Ltd
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
Materialize: a platform for changing data
Materialize: a platform for changing dataMaterialize: a platform for changing data
Materialize: a platform for changing dataAltinity Ltd
 

What's hot (20)

High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
ClickHouse Keeper
ClickHouse KeeperClickHouse Keeper
ClickHouse Keeper
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Fun with click house window functions webinar slides 2021-08-19
Fun with click house window functions webinar slides  2021-08-19Fun with click house window functions webinar slides  2021-08-19
Fun with click house window functions webinar slides 2021-08-19
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdf
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Materialize: a platform for changing data
Materialize: a platform for changing dataMaterialize: a platform for changing data
Materialize: a platform for changing data
 

More from Altinity Ltd

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxAltinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceAltinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfAltinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfAltinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfAltinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsAltinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfAltinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfAltinity Ltd
 

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 

Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-10.pdf

  • 1. © 2022 Altinity, Inc. Size Matters Best Practices for Trillion Row Datasets in ClickHouse Robert Hodges and Altinity Engineering 10 August 2022 1 © 2022 Altinity, Inc.
  • 2. © 2022 Altinity, Inc. Let’s make some introductions ClickHouse support and services including Altinity.Cloud Authors of Altinity Kubernetes Operator for ClickHouse and other open source projects Robert Hodges Database geek with 30+ years on DBMS systems. Day job: Altinity CEO Altinity Engineering Database geeks with centuries of experience in DBMS and applications 2
  • 3. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Foundations 3
  • 4. © 2022 Altinity, Inc. Understands SQL Runs on bare metal to cloud Shared nothing architecture Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) ClickHouse is a SQL Data Warehouse It’s a popular engine for real-time analytics ClickHouse Event Streams ELT Object Storage Interactive Graphics Dashboards APIs 4
  • 5. © 2022 Altinity, Inc. Seeing is believing 5 Demo Time!
  • 6. © 2022 Altinity, Inc. Some definitions to guide discussion Enabling Fast, Cost-Efficent End User Access to Trillion-Row Datasets 6 Consistent, sub-second response that scales linearly with resources Deliver query results at costs that are low and predictable Market TICK data, DNS queries, weblogs, network flow logs, service logs, CDN telemetry, real-time ad bids, …
  • 7. © 2022 Altinity, Inc. Quick, Iterative Exploration Why do we need fast access to source data? 7 Question Answer Why do temperature sensors fail intermittently? Slice and dice queries on detailed sensor data Bad firmware upgrade
  • 8. © 2022 Altinity, Inc. Principles for large datasets in ClickHouse 8 Reduce queries to a single scan Reduce I/O Parallelize query Lean on aggregation (instead of joins) Index information with materialized views
  • 9. © 2022 Altinity, Inc. The key: One table* to rule them all 9 Unaggregated table data Keep intermediate results in RAM Minimize I/O Overhead Maximize applied CPU Parallelize everything Results Query And make the scans really fast * Joins are OK! Just not big ones.
  • 10. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Basic design for 1 trillion row tables 10
  • 11. © 2022 Altinity, Inc. ClickHouse Server Architecture 11 Query Parser Query Interpreter Query Pipeline Query Table Primary Key Indexes Scanned column blocks from storage Joined data (hash tables) Intermediate Query Results (hash tables) Columnar data in block storage Columnar data in object storage OS Page Cache
  • 12. © 2022 Altinity, Inc. Round up the usual performance suspects 12 Data Partitioning Codecs Compression Skip Indexes Projections Sharding Distributed Query Data Types Read Replicas Tiered Storage Primary key index In-RAM dictionaries
  • 13. © 2022 Altinity, Inc. Name: 201905_510_815_3 MergeTree table organization in ClickHouse 13 Name: 201901_208_474_4 Name: 201901_1_207_3 Parts Sparse index Sorted, compressed, indexed column Skip indexes Minmax Bloom
  • 14. © 2022 Altinity, Inc. CREATE TABLE IF NOT EXISTS readings_unopt ( sensor_id Int64, sensor_type Int32, location String, time DateTime, date Date DEFAULT toDate(time), reading Float32 ) Engine = MergeTree PARTITION BY tuple() ORDER BY tuple(); Let’s start by making an experimental table! 14 Sub-optimal datatypes! No codecs! No partitioning or ordering!
  • 15. © 2022 Altinity, Inc. Here is a better experimental table with lower I/O cost 15 CREATE TABLE IF NOT EXISTS readings_zstd ( sensor_id Int32 Codec(DoubleDelta, ZSTD(1)), sensor_type UInt16 Codec(ZSTD(1)), location LowCardinality(String) Codec(ZSTD(1)), time DateTime Codec(DoubleDelta, ZSTD(1)), date ALIAS toDate(time), temperature Decimal(5,2) Codec(T64, ZSTD(10)) ) Engine = MergeTree PARTITION BY toYYYYMM(time) ORDER BY (location, sensor_id, time); Optimized data types Codecs + ZSTD compression ALIAS column Sorting by key columns + time Time-based partitioning
  • 16. © 2022 Altinity, Inc. On-disk table size for different schemas 16
  • 17. © 2022 Altinity, Inc. ClickHouse single node query model 17 Query Result ClickHouse Server Parse/Plan Merge/Sort Vectorized Scan In-RAM Hash Tables Parts in Storage
  • 18. © 2022 Altinity, Inc. Exploring linear local CPU scaling 18 –Query over 1.01 Billion rows set max_threads = 16; SELECT toYYYYMM(time) AS month, countIf(msg_type = 'reading') AS readings, countIf(msg_type = 'restart') AS restarts, min(temperature) AS min, round(avg(temperature)) AS avg, max(temperature) AS max FROM test.readings_multi WHERE sensor_id BETWEEN 0 and 10000 GROUP BY month ORDER BY month ASC;
  • 19. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Ingesting data into large tables 19
  • 20. © 2022 Altinity, Inc. Pattern: multiple entities in a single table Large table joins are an anti-pattern in low-latency apps 20 Restart ● msg_type=’restart’ ● sensor_id ● time Reading ● msg_type=’reading’ ● sensor_id ● time ● temperature Error ● msg_type=’err’ ● sensor_id ● time ● message
  • 21. © 2022 Altinity, Inc. Many apps keep entity sources for future flexibility 21 msg_type temperature time date { "sensor_id": "0", "time": "2019-01-01 00:00:00", "msg_type": "reading", "temperature": "46.31", "message": "", "device_type": "0", "firmware": "frx23.0.22" } message sensor_type sensor_id Materialized columns Source data 1.34 bytes/row 4.14 bytes/row, ~96% compression with ZSTD(1)
  • 22. © 2022 Altinity, Inc. Schema for a table based on multi-entity JSON 22 CREATE TABLE IF NOT EXISTS readings_multi_json ( sensor_id Int32 Codec(DoubleDelta, LZ4), sensor_type UInt8, time DateTime Codec(DoubleDelta, LZ4), date ALIAS toDate(time), msg_type enum('reading'=1, 'restart'=2, 'err'=3), temperature Decimal(5,2) Codec(T64, LZ4), message String DEFAULT '', json String DEFAULT '' ) Engine = MergeTree PARTITION BY toYYYYMM(time) ORDER BY (msg_type, sensor_id, time); Codecs + LZ4 compression Sort by msg_type, sensor, time Discriminator column String column for JSON data
  • 23. © 2022 Altinity, Inc. Loading raw data into large systems Nginx Logs Sensor data readings_etl (Null Engine) readings_multi (Mergetree Engine) MV INSERT INTO Enrich data with materialized view
  • 24. © 2022 Altinity, Inc. ClickHouse makes it easy to materialize columns 24 ALTER TABLE readings_multi_json ADD COLUMN IF NOT EXISTS firmware String DEFAULT JSONExtractString(json, 'firmware') ; ALTER TABLE readings_multi_json UPDATE firmware = firmware WHERE 1=1 ;
  • 25. © 2022 Altinity, Inc. …But you can also index and query JSON directly 25 ALTER TABLE readings_multi_json ADD INDEX jsonbf json TYPE tokenbf_v1(8192, 3, 0) GRANULARITY 1; ALTER TABLE readings_multi_json MATERIALIZE INDEX jsonbf; -- Count matches on column. SELECT count() FROM readings_multi_json WHERE firmware = 'frx23ID0000x2532' -- Count token matches in JSON. SELECT count() FROM readings_multi_json WHERE hasToken(json,'frx23ID0000x2532') Bloom filter tuning is complicated!
  • 26. © 2022 Altinity, Inc. Results are good if you have high cardinality values 26
  • 27. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Unique ClickHouse tricks for large datasets 27
  • 28. © 2022 Altinity, Inc. How can we make queries fast on large data sets? 28 Create queries that work in a single scan without large-table joins
  • 29. © 2022 Altinity, Inc. Hint: Aggregation runs in a single pass = 2 Sum = 6 Count = 3 1 2 3 1 3 5 0 5 0 0 Sum = 9 Count = 3 Sum = 5 Count = 4 6 + 9 + 5 3 + 3 + 4 29 No need to move data Parallelizes! Intermediate results are reusable
  • 30. © 2022 Altinity, Inc. What about queries over all entities? 30 SELECT toYYYYMM(time) AS month, countIf(msg_type = 'reading') AS readings, countIf(msg_type = 'restart') AS restarts, min(temperature) AS min, round(avg(temperature)) AS avg, max(temperature) AS max FROM test.readings_multi WHERE sensor_id = 3 GROUP BY month ORDER BY month ASC ┌──month─┬─readings─┬─restarts─┬───min─┬─avg─┬────max─┐ │ 201901 │ 44640 │ 1 │ 0 │ 75 │ 118.33 │ │ 201902 │ 40320 │ 0 │ 68.09 │ 81 │ 93.98 │ │ 201903 │ 15840 │ 0 │ 73.19 │ 84 │ 95.3 │ └────────┴──────────┴──────────┴───────┴─────┴────────┘ Use conditional aggregation!
  • 31. © 2022 Altinity, Inc. msg_type sensor_id time temperature sensor_id restart_time time temperature sensor_id restart_time time temperature What about joins on distributed data? Use case: join restarts with temperature readings 31 sensor_id uptime time temperature Restart times msg_type ‘restart’ sensor_id time Temperature readings Temperatures after restart msg_type sensor_id time temperature JOIN key msg_type ‘reading’ sensor_id time temperature
  • 32. © 2022 Altinity, Inc. msg_type sensor_id time temperature msg_type sensor_id time temperature Aggregation can implement joins! 32 sensor_id uptime time temperature Restart and temperature records msg_type sensor_id time msg_type sensor_id time temperature Temperatures after restart sensor_id restart_time: t1 reading_time: [t1, t2, t3, t4, …] temp: [76.44, 90.39, 82.08, 48.12, ..] 236 236 236 236 … t1 t2 t3 t4 ... 76.44 90.39 82.08 48.12 ... 30 90 150 210 ... GROUP BY key Grouped array values ARRAY JOIN to pivot on arrays
  • 33. © 2022 Altinity, Inc. And here’s the code… 33 SELECT sensor_id, reading_time, temp, reading_time, reading_time - restart_time AS uptime FROM ( WITH toDateTime('2019-04-17 11:00:00') as start_of_range SELECT sensor_id, groupArrayIf(time, msg_type = 'reading') AS reading_time, groupArrayIf(temperature, msg_type = 'reading') AS temp, anyIf(time, msg_type = 'restart') AS restart_time FROM test.readings_multi rm WHERE (sensor_id = 2555) AND time BETWEEN start_of_range AND start_of_range + 600 GROUP BY sensor_id) ARRAY JOIN reading_time, temp Not everyone’s cup of tea, but it works!!!
  • 34. © 2022 Altinity, Inc. How about locating key events in tables? 34 When was the last restart on sensor 236? SELECT message FROM readings_multi WHERE (msg_type, sensor_id, time) IN (SELECT msg_type, sensor_id, max(time) FROM readings_multi WHERE msg_type = 'restart' AND sensor_id = 236 GROUP BY msg_type, sensor_id) Expensive on large datasets!
  • 35. © 2022 Altinity, Inc. Finding the last restart is an aggregation task! 35 236 2019-01-10 20:00:13 restart sensor_id time msg_type 236 2019-01-10 21:07:56 restart sensor_id time msg_type 236 2019-01-10 21:07:56 restart sensor_id time msg_type Merge GROUP BY key Max value Matching row value
  • 36. © 2022 Altinity, Inc. Use materialized views to “index” data Finding the last restart on a sensor 36 Block lands in source table Block(s) land in materialized view target table SELECT sensor_id, max(time) AS time FROM readings_multi WHERE msg_type = 'restart' GROUP BY sensor_id “Last point query” MergeTree Table AggregatingMergeTree Table
  • 37. © 2022 Altinity, Inc. And here’s code for the materialized view… 37 CREATE TABLE sensor_last_restart_agg ( sensor_id Int32, time SimpleAggregateFunction(max, DateTime), msg_type AggregateFunction(argMax, String, DateTime) ) ENGINE = AggregatingMergeTree() PARTITION BY tuple() ORDER BY sensor_id CREATE MATERIALIZED VIEW sensor_last_restart TO sensor_last_restart_agg AS SELECT sensor_id, max(time) AS time, argMaxState(msg_type, time) AS msg_type FROM readings_multi WHERE msg_type = 'restart' GROUP BY sensor_id SimpleAggregateFunction simplifies insert and query tuple() is a dubious choice!
  • 38. © 2022 Altinity, Inc. Comparison of source table to typical materialized view 38
  • 39. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Wrap-up 39
  • 40. © 2022 Altinity, Inc. Learnings from large ClickHouse installations Use a single large table to hold all entities Make sound implementation choices to get baseline performance Include source data for future flexibility without moving data Aggregation is a secret ClickHouse power: use it to scan, join, index data Your reward: Linear scaling, high cost-efficiency, and happy users 40
  • 41. © 2022 Altinity, Inc. Other important techniques for big data Sharding and replication Tiered storage Object storage Approximate queries using sampling and approximate uniqs (lighter aggregation) 41
  • 42. © 2022 Altinity, Inc. Where is the documentation? ClickHouse official docs – https://clickhouse.com/docs/ Altinity Blog – https://altinity.com/blog/ Altinity Youtube Channel – https://www.youtube.com/channel/UCE3Y2lDKl_ZfjaCrh62onYA Altinity Knowledge Base – https://kb.altinity.com/ Meetups, other blogs, and external resources. Use your powers of Search! 42
  • 43. © 2022 Altinity, Inc. Thank you! Questions? https://altinity.com 43 Altinity.Cloud Altinity Support Altinity Stable Builds We’re hiring!