SlideShare a Scribd company logo
1 of 87
Dive into ClickHouse
Storage System
Alexander Sapin, Software Engineer
ClickHouse Storages
External Table Engines:
› File on FS
› S3
› HDFS
› MySQL
› ...
Internal Table Engines:
› Memory
› Log
› StripeLog
› MergeTree family
0 / 84
ClickHouse Storages
External Table Engines:
› File on FS
› S3
› HDFS
› MySQL
› ...
Internal Table Engines:
› Memory
› Log
› StripeLog
› MergeTree family
1 / 84
MergeTree Engines Family
Advantages:
› Inserts are atomic
› Selects are blazing fast
› Primary and secondary indexes
› Data modification!
› Inserts and selects don’t block each other
2 / 84
MergeTree Engines Family
Advantages:
› Inserts are atomic
› Selects are blazing fast
› Primary and secondary indexes
› Data modification!
› Inserts and selects don’t block each other
Features:
› Infrequent INSERTs required
3 / 84
MergeTree Engines Family
Advantages:
› Inserts are atomic
› Selects are blazing fast
› Primary and secondary indexes
› Data modification!
› Inserts and selects don’t block each other
Features:
› Infrequent INSERTs required (work in progress)
4 / 84
MergeTree Engines Family
Advantages:
› Inserts are atomic
› Selects are blazing fast
› Primary and secondary indexes
› Data modification!
› Inserts and selects don’t block each other
Features:
› Infrequent INSERTs required (work in progress)
› Background operations on records with the same keys
› Primary key is NOT unique
5 / 84
Write
Create Table and Fill Some Data
Create Table:
CREATE TABLE mt (
EventDate Date,
OrderID Int32,
BannerID UInt64,
GoalNum Int8
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate) ORDER BY (OrderID, BannerID)
7 / 84
Create Table and Fill Some Data
Create Table:
CREATE TABLE mt (
EventDate Date,
OrderID Int32,
BannerID UInt64,
GoalNum Int8
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate) ORDER BY (OrderID, BannerID)
Fill Data (twice):
INSERT INTO mt SELECT toDate('2018-09-26'),
number, number + 10000, number % 128 from numbers(1000000);
INSERT INTO mt SELECT toDate('2018-10-15'),
number, number + 10000, number % 128 from numbers(1000000, 1000000);
8 / 84
Table on Disk
metadata:
$ ls /var/lib/clickhouse/metadata/default/
mt.sql
9 / 84
Table on Disk
metadata:
$ ls /var/lib/clickhouse/metadata/default/
mt.sql
data:
$ ls /var/lib/clickhouse/data/default/mt
201809_2_2_0 201809_3_3_0 201810_1_1_0 201810_4_4_0 201810_1_4_1
detached format_version.txt
10 / 84
Table on Disk
metadata:
$ ls /var/lib/clickhouse/metadata/default/
mt.sql
data:
$ ls /var/lib/clickhouse/data/default/mt
201809_2_2_0 201809_3_3_0 201810_1_1_0 201810_4_4_0 201810_1_4_1
detached format_version.txt
Contents:
› Format file format_version.txt
› Directories with parts
› Directory for detached parts
11 / 84
Parts: Details
› Part of the data in PK order
› Contain interval of insert numbers
› Created for each INSERT
› Cannot be changed (immutable)
12 / 84
Parts: Why?
PK ordering is needed for efficient OLAP queries
› In our case (OrderID, BannerID)
13 / 84
Parts: Why?
PK ordering is needed for efficient OLAP queries
› In our case (OrderID, BannerID)
But the data comes in order of time
› By EventDate
14 / 84
Parts: Why?
PK ordering is needed for efficient OLAP queries
› In our case (OrderID, BannerID)
But the data comes in order of time
› By EventDate
Re-sorting all the data is expensive
› ClickHouse handle hundreds of terabytes
15 / 84
Parts: Why?
PK ordering is needed for efficient OLAP queries
› In our case (OrderID, BannerID)
But the data comes in order of time
› By EventDate
Re-sorting all the data is expensive
› ClickHouse handle hundreds of terabytes
Solution: Store data in a set of ordered parts!
16 / 84
Parts: Main Idea
M N
Part on
disk
Primary
key
Insert number
17 / 84
Parts: Main Idea
Part on
disk
M N N+1
New
batch
Primary
key
Insert number
18 / 84
Parts: Main Idea
M N N+1
Part on
disk
Primary
key
Part on
disk
Insert number
19 / 84
Parts: Atomic insert
M N N+1
Part on
disk
Primary
key
Part on
disk
Insert number
INSERT
20 / 84
Parts: Atomic insert
M N N+1
Part on
disk
Primary
key
Part on
disk
Insert number
INSERT
21 / 84
Parts: Atomic insert
M N N+1
Part on
disk
Primary
key
Part on
disk
Insert number
22 / 84
Read
Parts: Data
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1
GoalNum.bin GoalNum.mrk BannerID.bin ...
primary.idx checksums.txt count.txt columns.txt
partition.dat minmax_EventDate.idx
24 / 84
Parts: Data
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1
GoalNum.bin GoalNum.mrk BannerID.bin ...
primary.idx checksums.txt count.txt columns.txt
partition.dat minmax_EventDate.idx
Contents:
› primary.idx – primary key on disk
25 / 84
Parts: Data
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1
GoalNum.bin GoalNum.mrk BannerID.bin ...
primary.idx checksums.txt count.txt columns.txt
partition.dat minmax_EventDate.idx
Contents:
› primary.idx – primary key on disk
› GoalNum.bin – compressed column
› GoalNum.mrk – marks for column
26 / 84
Parts: Data
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1
GoalNum.bin GoalNum.mrk BannerID.bin ...
primary.idx checksums.txt count.txt columns.txt
partition.dat minmax_EventDate.idx
Contents:
› primary.idx – primary key on disk
› GoalNum.bin – compressed column
› GoalNum.mrk – marks for column
› partition.dat – partition id
27 / 84
Parts: Data
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1
GoalNum.bin GoalNum.mrk BannerID.bin ...
primary.idx checksums.txt count.txt columns.txt
partition.dat minmax_EventDate.idx
Contents:
› primary.idx – primary key on disk
› GoalNum.bin – compressed column
› GoalNum.mrk – marks for column
› partition.dat – partition id
› ... – a lot of other useful files
28 / 84
Index
› Row-oriented
› Sparse (each 8192 row)
› Stored in memory
› Uncompressed
0 10000
8192
16384
18192
1998848 2008848
primary.idx
OrderID BannerID
26384
0.
1.
2.
N.
29 / 84
Columns
› Each column in separate file
› Compressed by blocks
› Checksums for each block
OrderID.bin
0
65795
131614
197433
Uncompressed
block
14324
17214
17215
17216
17217
Checksums
30 / 84
How to Use Index?
Problem:
› Index is sparse and contain rows
› Columns contain compressed blocks
How to match index with columns?
31 / 84
Solution: Marks
› Mark – offset in compressed file and uncompressed block
› Stored in column_name.mrk files
› One for each index row
32 / 84
Solution: Marks
› Mark – offset in compressed file and uncompressed block
› Stored in column_name.mrk files
› One for each index row
65795 0
OrderID.mrk
OrderID.bin
0
65795
131614
197433
0
32768
Uncompressed
block
8192
65795 32768
33 / 84
Put it all together
Algorithm:
› Determine required index rows
› Found corresponding marks
› Distribute granules (stripe of marks) among threads
› Read required granules
Properties:
› Read granules concurrently
› Threads can steal tasks
34 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
35 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
36 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
37 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
38 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
39 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
40 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
41 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
42 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
43 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
44 / 84
Read Data from Disk
0 10000
8192
16384
18192
26384
1998848 2008848
EventDate OrderID GoalNum
2b 4b 1b
primary.idx
KeyCondition
.mrk .bin .mrk .bin .mrk .bin
thread	1 thread	2
OrderID BannerID
SELECT	any(EventDate),
max(GoalNum)	FROM	mt
WHERE	OrderID	BETWEEN
6123	AND	17345
45 / 84
Compaction
Problem: Amount Files in Parts
Test example: Almost OK
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 | wc -l
14
47 / 84
Problem: Amount Files in Parts
Test example: Almost OK
$ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 | wc -l
14
Production: Too much
$ ssh -A root@mtgiga075-2.metrika.yandex.net
# ls /var/lib/clickhouse/data/merge/visits_v2/202002_4462_4462_0 | wc -l
1556
48 / 84
Solution: Merges
M N N+1
Primary
key
Insert number
Part
[M,N]
Part
[N+1]
49 / 84
Solution: Merges
M N N+1
Background merge
[M,N] [N+1]
Primary
key
Part Part
Insert number
50 / 84
Solution: Merges
M N N+1
[M,N+1]
Primary
key
Insert number
Part
51 / 84
Properties of Merge
› Each part participate in a single successful merge
› Source parts became inactive
› Addional logic during merge
52 / 84
Things to do while merging
Replace/update records
› ReplacingMergeTree – replace
› SummingMergeTree – sum
› CollapsingMergeTree – fold
› VersionedCollapsingMergeTree – fold rows + versioning
Pre-aggregate data
› AggregatingMergeTree – merge aggregate function states
Metrics rollup
› GraphiteMergeTree – rollup in graphite fashion
53 / 84
Modify
Partitioning
ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ...
55 / 84
Partitioning
ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ...
› Logical entities (doesn’t stored on disk)
› Table can be partitioned by any expression (default: by month)
› Parts from different partitions never merged
› MinMax index by partition columns
› Easy manipulation of partitions:
56 / 84
Partitioning
ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ...
› Logical entities (doesn’t stored on disk)
› Table can be partitioned by any expression (default: by month)
› Parts from different partitions never merged
› MinMax index by partition columns
› Easy manipulation of partitions:
ALTER TABLE mt DROP PARTITION 201810
ALTER TABLE mt DETACH/ATTACH PARTITION 201809
57 / 84
Mutations
ALTER TABLE mt DELETE WHERE OrderID < 1205
ALTER TABLE mt UPDATE GoalNum = 3 WHERE BannerID = 235433;
Features
› NOT designed for regular usage
› Overwrite all touched parts on disk
› Work in background
› Original parts became inactive
58 / 84
Mutations
M N N+1
Primary
key
Insert number
59 / 84
Mutations
M N N+1
Primary
key
Insert number
Mutation
60 / 84
Mutations
M N N+1
Primary
key
Insert number
Mutation
61 / 84
Mutations
M N N+1
Primary
key
Insert number
Mutation
62 / 84
All together
Parts Lifetime
Insert number
SELECT
64 / 84
Parts Lifetime
Insert number
SELECT
65 / 84
Parts Lifetime
Insert number
SELECT
INSERT
INSERT
66 / 84
Parts Lifetime
Insert number
SELECT
67 / 84
Parts Lifetime
Insert number
SELECT
68 / 84
Parts Lifetime
Insert number
SELECT
Merge
69 / 84
Parts Lifetime
Insert number
SELECT
70 / 84
Parts Lifetime
Insert number
SELECT
71 / 84
Parts Lifetime
Insert number
SELECT
72 / 84
Parts Lifetime
Insert number
SELECT
73 / 84
Parts Lifetime
Insert number
SELECT
74 / 84
Summarize: MergeTree Consists of
Block
Column
Part
Partition
Table
75 / 84
Summarize: MergeTree Consists of
Block
Column
Part
Partition
Table
76 / 84
Summarize: MergeTree Consists of
Block
Column
Part
Partition
Table
77 / 84
Summarize: MergeTree Consists of
Block
Column
Part
Partition
Table
78 / 84
Summarize: MergeTree Consists of
Block
Column
Part
Partition
Table
79 / 84
Things to remember
Control total number of parts
› Rate of INSERTs
80 / 84
Things to remember
Control total number of parts
› Rate of INSERTs
Merging runs in the background
› Even when there are no queries!
› With additional fold logic
81 / 84
Things to remember
Control total number of parts
› Rate of INSERTs
Merging runs in the background
› Even when there are no queries!
› With additional fold logic
Index is sparse
› Must fit into memory
› Determines order of data on disk
› Using the index is always beneficial
82 / 84
Things to remember
Control total number of parts
› Rate of INSERTs
Merging runs in the background
› Even when there are no queries!
› With additional fold logic
Index is sparse
› Must fit into memory
› Determines order of data on disk
› Using the index is always beneficial
Partitions is logical entity
› Easy manipulation with portions of data
› Cannot improve SELECTs performance
83 / 84
Thank you
QA
84 / 84

More Related Content

What's hot

InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...InfluxData
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenPostgresOpen
 
Tiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEOTiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEOAltinity Ltd
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...Altinity Ltd
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesAltinity Ltd
 
Creating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseCreating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseAltinity Ltd
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyNETWAYS
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)Altinity Ltd
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovAltinity Ltd
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL-Consulting
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXzznate
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin airKonstantine Krutiy
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
Unified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco SystemsUnified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco SystemsAltinity Ltd
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Altinity Ltd
 
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeSCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeJeff Frost
 

What's hot (20)

InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
 
Tiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEOTiered storage intro. By Robert Hodges, Altinity CEO
Tiered storage intro. By Robert Hodges, Altinity CEO
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
Creating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouseCreating Beautiful Dashboards with Grafana and ClickHouse
Creating Beautiful Dashboards with Grafana and ClickHouse
 
OSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross LawleyOSDC 2012 | Scaling with MongoDB by Ross Lawley
OSDC 2012 | Scaling with MongoDB by Ross Lawley
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin air
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Unified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco SystemsUnified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco Systems
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeSCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
 

Similar to 21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system

Drupal MySQL Cluster
Drupal MySQL ClusterDrupal MySQL Cluster
Drupal MySQL ClusterKris Buytaert
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Databasewangzhonnew
 
Проведение криминалистической экспертизы и анализа руткит-программ на примере...
Проведение криминалистической экспертизы и анализа руткит-программ на примере...Проведение криминалистической экспертизы и анализа руткит-программ на примере...
Проведение криминалистической экспертизы и анализа руткит-программ на примере...Alex Matrosov
 
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...Positive Hack Days
 
Building OpenDNS Stats
Building OpenDNS StatsBuilding OpenDNS Stats
Building OpenDNS StatsGeorge Ang
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersJonathan Levin
 
PE102 - a Windows executable format overview (booklet V1)
PE102 - a Windows executable format overview (booklet V1)PE102 - a Windows executable format overview (booklet V1)
PE102 - a Windows executable format overview (booklet V1)Ange Albertini
 
Transparent sharding with Spider: what's new and getting started
Transparent sharding with Spider: what's new and getting startedTransparent sharding with Spider: what's new and getting started
Transparent sharding with Spider: what's new and getting startedMariaDB plc
 
MySQL Performance Schema in Action
MySQL Performance Schema in ActionMySQL Performance Schema in Action
MySQL Performance Schema in ActionSveta Smirnova
 
Optimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTESOptimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTESSubhajit Sahu
 
All You Need to Know About HCL Notes 64-Bit Clients
All You Need to Know About HCL Notes 64-Bit ClientsAll You Need to Know About HCL Notes 64-Bit Clients
All You Need to Know About HCL Notes 64-Bit Clientspanagenda
 
Data recovery from storage device
Data recovery from storage deviceData recovery from storage device
Data recovery from storage deviceMohit Shah
 
DITA Open Toolkit Deployment with XMetaL Author Enterprise 6
DITA Open Toolkit Deployment with XMetaL Author Enterprise 6DITA Open Toolkit Deployment with XMetaL Author Enterprise 6
DITA Open Toolkit Deployment with XMetaL Author Enterprise 6XMetaL
 
SQLDAY 2023 Chodkowski Adrian Databricks Performance Tuning
SQLDAY 2023 Chodkowski Adrian Databricks Performance TuningSQLDAY 2023 Chodkowski Adrian Databricks Performance Tuning
SQLDAY 2023 Chodkowski Adrian Databricks Performance TuningSeeQuality.net
 
x86_system_architecture.pptx
x86_system_architecture.pptxx86_system_architecture.pptx
x86_system_architecture.pptxAtul Vaish
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区yiditushe
 

Similar to 21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system (20)

Drupal MySQL Cluster
Drupal MySQL ClusterDrupal MySQL Cluster
Drupal MySQL Cluster
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
 
Проведение криминалистической экспертизы и анализа руткит-программ на примере...
Проведение криминалистической экспертизы и анализа руткит-программ на примере...Проведение криминалистической экспертизы и анализа руткит-программ на примере...
Проведение криминалистической экспертизы и анализа руткит-программ на примере...
 
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...
Positive Hack Days. Матросов. Мастер-класс: Проведение криминалистической экс...
 
Building OpenDNS Stats
Building OpenDNS StatsBuilding OpenDNS Stats
Building OpenDNS Stats
 
Gg steps
Gg stepsGg steps
Gg steps
 
Hotsos Advanced Linux Tools
Hotsos Advanced Linux ToolsHotsos Advanced Linux Tools
Hotsos Advanced Linux Tools
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
PE102 - a Windows executable format overview (booklet V1)
PE102 - a Windows executable format overview (booklet V1)PE102 - a Windows executable format overview (booklet V1)
PE102 - a Windows executable format overview (booklet V1)
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Basic Linux kernel
Basic Linux kernelBasic Linux kernel
Basic Linux kernel
 
Transparent sharding with Spider: what's new and getting started
Transparent sharding with Spider: what's new and getting startedTransparent sharding with Spider: what's new and getting started
Transparent sharding with Spider: what's new and getting started
 
MySQL Performance Schema in Action
MySQL Performance Schema in ActionMySQL Performance Schema in Action
MySQL Performance Schema in Action
 
Optimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTESOptimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTES
 
All You Need to Know About HCL Notes 64-Bit Clients
All You Need to Know About HCL Notes 64-Bit ClientsAll You Need to Know About HCL Notes 64-Bit Clients
All You Need to Know About HCL Notes 64-Bit Clients
 
Data recovery from storage device
Data recovery from storage deviceData recovery from storage device
Data recovery from storage device
 
DITA Open Toolkit Deployment with XMetaL Author Enterprise 6
DITA Open Toolkit Deployment with XMetaL Author Enterprise 6DITA Open Toolkit Deployment with XMetaL Author Enterprise 6
DITA Open Toolkit Deployment with XMetaL Author Enterprise 6
 
SQLDAY 2023 Chodkowski Adrian Databricks Performance Tuning
SQLDAY 2023 Chodkowski Adrian Databricks Performance TuningSQLDAY 2023 Chodkowski Adrian Databricks Performance Tuning
SQLDAY 2023 Chodkowski Adrian Databricks Performance Tuning
 
x86_system_architecture.pptx
x86_system_architecture.pptxx86_system_architecture.pptx
x86_system_architecture.pptx
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区
 

More from Athens Big Data

22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...Athens Big Data
 
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...Athens Big Data
 
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...Athens Big Data
 
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the coversAthens Big Data
 
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: VeltiAthens Big Data
 
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...Athens Big Data
 
19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understandingAthens Big Data
 
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on KubernetesAthens Big Data
 
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a ServiceAthens Big Data
 
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...Athens Big Data
 
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...Athens Big Data
 
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...Athens Big Data
 
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...Athens Big Data
 
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On MesosAthens Big Data
 
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...Athens Big Data
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...Athens Big Data
 
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...Athens Big Data
 
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And GradingAthens Big Data
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY StyleAthens Big Data
 

More from Athens Big Data (20)

22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
 
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
 
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
 
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
 
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
 
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
 
19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding
 
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
 
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
 
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
 
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
 
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
 
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...
 
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
 
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
 
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
 
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfChristopherTHyatt
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfEasyPrinterHelp
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 

Recently uploaded (20)

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 

21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system

  • 1.
  • 2. Dive into ClickHouse Storage System Alexander Sapin, Software Engineer
  • 3. ClickHouse Storages External Table Engines: › File on FS › S3 › HDFS › MySQL › ... Internal Table Engines: › Memory › Log › StripeLog › MergeTree family 0 / 84
  • 4. ClickHouse Storages External Table Engines: › File on FS › S3 › HDFS › MySQL › ... Internal Table Engines: › Memory › Log › StripeLog › MergeTree family 1 / 84
  • 5. MergeTree Engines Family Advantages: › Inserts are atomic › Selects are blazing fast › Primary and secondary indexes › Data modification! › Inserts and selects don’t block each other 2 / 84
  • 6. MergeTree Engines Family Advantages: › Inserts are atomic › Selects are blazing fast › Primary and secondary indexes › Data modification! › Inserts and selects don’t block each other Features: › Infrequent INSERTs required 3 / 84
  • 7. MergeTree Engines Family Advantages: › Inserts are atomic › Selects are blazing fast › Primary and secondary indexes › Data modification! › Inserts and selects don’t block each other Features: › Infrequent INSERTs required (work in progress) 4 / 84
  • 8. MergeTree Engines Family Advantages: › Inserts are atomic › Selects are blazing fast › Primary and secondary indexes › Data modification! › Inserts and selects don’t block each other Features: › Infrequent INSERTs required (work in progress) › Background operations on records with the same keys › Primary key is NOT unique 5 / 84
  • 10. Create Table and Fill Some Data Create Table: CREATE TABLE mt ( EventDate Date, OrderID Int32, BannerID UInt64, GoalNum Int8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (OrderID, BannerID) 7 / 84
  • 11. Create Table and Fill Some Data Create Table: CREATE TABLE mt ( EventDate Date, OrderID Int32, BannerID UInt64, GoalNum Int8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (OrderID, BannerID) Fill Data (twice): INSERT INTO mt SELECT toDate('2018-09-26'), number, number + 10000, number % 128 from numbers(1000000); INSERT INTO mt SELECT toDate('2018-10-15'), number, number + 10000, number % 128 from numbers(1000000, 1000000); 8 / 84
  • 12. Table on Disk metadata: $ ls /var/lib/clickhouse/metadata/default/ mt.sql 9 / 84
  • 13. Table on Disk metadata: $ ls /var/lib/clickhouse/metadata/default/ mt.sql data: $ ls /var/lib/clickhouse/data/default/mt 201809_2_2_0 201809_3_3_0 201810_1_1_0 201810_4_4_0 201810_1_4_1 detached format_version.txt 10 / 84
  • 14. Table on Disk metadata: $ ls /var/lib/clickhouse/metadata/default/ mt.sql data: $ ls /var/lib/clickhouse/data/default/mt 201809_2_2_0 201809_3_3_0 201810_1_1_0 201810_4_4_0 201810_1_4_1 detached format_version.txt Contents: › Format file format_version.txt › Directories with parts › Directory for detached parts 11 / 84
  • 15. Parts: Details › Part of the data in PK order › Contain interval of insert numbers › Created for each INSERT › Cannot be changed (immutable) 12 / 84
  • 16. Parts: Why? PK ordering is needed for efficient OLAP queries › In our case (OrderID, BannerID) 13 / 84
  • 17. Parts: Why? PK ordering is needed for efficient OLAP queries › In our case (OrderID, BannerID) But the data comes in order of time › By EventDate 14 / 84
  • 18. Parts: Why? PK ordering is needed for efficient OLAP queries › In our case (OrderID, BannerID) But the data comes in order of time › By EventDate Re-sorting all the data is expensive › ClickHouse handle hundreds of terabytes 15 / 84
  • 19. Parts: Why? PK ordering is needed for efficient OLAP queries › In our case (OrderID, BannerID) But the data comes in order of time › By EventDate Re-sorting all the data is expensive › ClickHouse handle hundreds of terabytes Solution: Store data in a set of ordered parts! 16 / 84
  • 20. Parts: Main Idea M N Part on disk Primary key Insert number 17 / 84
  • 21. Parts: Main Idea Part on disk M N N+1 New batch Primary key Insert number 18 / 84
  • 22. Parts: Main Idea M N N+1 Part on disk Primary key Part on disk Insert number 19 / 84
  • 23. Parts: Atomic insert M N N+1 Part on disk Primary key Part on disk Insert number INSERT 20 / 84
  • 24. Parts: Atomic insert M N N+1 Part on disk Primary key Part on disk Insert number INSERT 21 / 84
  • 25. Parts: Atomic insert M N N+1 Part on disk Primary key Part on disk Insert number 22 / 84
  • 26. Read
  • 27. Parts: Data $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 GoalNum.bin GoalNum.mrk BannerID.bin ... primary.idx checksums.txt count.txt columns.txt partition.dat minmax_EventDate.idx 24 / 84
  • 28. Parts: Data $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 GoalNum.bin GoalNum.mrk BannerID.bin ... primary.idx checksums.txt count.txt columns.txt partition.dat minmax_EventDate.idx Contents: › primary.idx – primary key on disk 25 / 84
  • 29. Parts: Data $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 GoalNum.bin GoalNum.mrk BannerID.bin ... primary.idx checksums.txt count.txt columns.txt partition.dat minmax_EventDate.idx Contents: › primary.idx – primary key on disk › GoalNum.bin – compressed column › GoalNum.mrk – marks for column 26 / 84
  • 30. Parts: Data $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 GoalNum.bin GoalNum.mrk BannerID.bin ... primary.idx checksums.txt count.txt columns.txt partition.dat minmax_EventDate.idx Contents: › primary.idx – primary key on disk › GoalNum.bin – compressed column › GoalNum.mrk – marks for column › partition.dat – partition id 27 / 84
  • 31. Parts: Data $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 GoalNum.bin GoalNum.mrk BannerID.bin ... primary.idx checksums.txt count.txt columns.txt partition.dat minmax_EventDate.idx Contents: › primary.idx – primary key on disk › GoalNum.bin – compressed column › GoalNum.mrk – marks for column › partition.dat – partition id › ... – a lot of other useful files 28 / 84
  • 32. Index › Row-oriented › Sparse (each 8192 row) › Stored in memory › Uncompressed 0 10000 8192 16384 18192 1998848 2008848 primary.idx OrderID BannerID 26384 0. 1. 2. N. 29 / 84
  • 33. Columns › Each column in separate file › Compressed by blocks › Checksums for each block OrderID.bin 0 65795 131614 197433 Uncompressed block 14324 17214 17215 17216 17217 Checksums 30 / 84
  • 34. How to Use Index? Problem: › Index is sparse and contain rows › Columns contain compressed blocks How to match index with columns? 31 / 84
  • 35. Solution: Marks › Mark – offset in compressed file and uncompressed block › Stored in column_name.mrk files › One for each index row 32 / 84
  • 36. Solution: Marks › Mark – offset in compressed file and uncompressed block › Stored in column_name.mrk files › One for each index row 65795 0 OrderID.mrk OrderID.bin 0 65795 131614 197433 0 32768 Uncompressed block 8192 65795 32768 33 / 84
  • 37. Put it all together Algorithm: › Determine required index rows › Found corresponding marks › Distribute granules (stripe of marks) among threads › Read required granules Properties: › Read granules concurrently › Threads can steal tasks 34 / 84
  • 38. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 35 / 84
  • 39. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 36 / 84
  • 40. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 37 / 84
  • 41. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 38 / 84
  • 42. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 39 / 84
  • 43. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 40 / 84
  • 44. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 41 / 84
  • 45. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 42 / 84
  • 46. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 43 / 84
  • 47. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 44 / 84
  • 48. Read Data from Disk 0 10000 8192 16384 18192 26384 1998848 2008848 EventDate OrderID GoalNum 2b 4b 1b primary.idx KeyCondition .mrk .bin .mrk .bin .mrk .bin thread 1 thread 2 OrderID BannerID SELECT any(EventDate), max(GoalNum) FROM mt WHERE OrderID BETWEEN 6123 AND 17345 45 / 84
  • 50. Problem: Amount Files in Parts Test example: Almost OK $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 | wc -l 14 47 / 84
  • 51. Problem: Amount Files in Parts Test example: Almost OK $ ls /var/lib/clickhouse/data/default/mt/201810_1_4_1 | wc -l 14 Production: Too much $ ssh -A root@mtgiga075-2.metrika.yandex.net # ls /var/lib/clickhouse/data/merge/visits_v2/202002_4462_4462_0 | wc -l 1556 48 / 84
  • 52. Solution: Merges M N N+1 Primary key Insert number Part [M,N] Part [N+1] 49 / 84
  • 53. Solution: Merges M N N+1 Background merge [M,N] [N+1] Primary key Part Part Insert number 50 / 84
  • 54. Solution: Merges M N N+1 [M,N+1] Primary key Insert number Part 51 / 84
  • 55. Properties of Merge › Each part participate in a single successful merge › Source parts became inactive › Addional logic during merge 52 / 84
  • 56. Things to do while merging Replace/update records › ReplacingMergeTree – replace › SummingMergeTree – sum › CollapsingMergeTree – fold › VersionedCollapsingMergeTree – fold rows + versioning Pre-aggregate data › AggregatingMergeTree – merge aggregate function states Metrics rollup › GraphiteMergeTree – rollup in graphite fashion 53 / 84
  • 58. Partitioning ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ... 55 / 84
  • 59. Partitioning ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ... › Logical entities (doesn’t stored on disk) › Table can be partitioned by any expression (default: by month) › Parts from different partitions never merged › MinMax index by partition columns › Easy manipulation of partitions: 56 / 84
  • 60. Partitioning ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY ... › Logical entities (doesn’t stored on disk) › Table can be partitioned by any expression (default: by month) › Parts from different partitions never merged › MinMax index by partition columns › Easy manipulation of partitions: ALTER TABLE mt DROP PARTITION 201810 ALTER TABLE mt DETACH/ATTACH PARTITION 201809 57 / 84
  • 61. Mutations ALTER TABLE mt DELETE WHERE OrderID < 1205 ALTER TABLE mt UPDATE GoalNum = 3 WHERE BannerID = 235433; Features › NOT designed for regular usage › Overwrite all touched parts on disk › Work in background › Original parts became inactive 58 / 84
  • 63. Mutations M N N+1 Primary key Insert number Mutation 60 / 84
  • 64. Mutations M N N+1 Primary key Insert number Mutation 61 / 84
  • 65. Mutations M N N+1 Primary key Insert number Mutation 62 / 84
  • 78. Summarize: MergeTree Consists of Block Column Part Partition Table 75 / 84
  • 79. Summarize: MergeTree Consists of Block Column Part Partition Table 76 / 84
  • 80. Summarize: MergeTree Consists of Block Column Part Partition Table 77 / 84
  • 81. Summarize: MergeTree Consists of Block Column Part Partition Table 78 / 84
  • 82. Summarize: MergeTree Consists of Block Column Part Partition Table 79 / 84
  • 83. Things to remember Control total number of parts › Rate of INSERTs 80 / 84
  • 84. Things to remember Control total number of parts › Rate of INSERTs Merging runs in the background › Even when there are no queries! › With additional fold logic 81 / 84
  • 85. Things to remember Control total number of parts › Rate of INSERTs Merging runs in the background › Even when there are no queries! › With additional fold logic Index is sparse › Must fit into memory › Determines order of data on disk › Using the index is always beneficial 82 / 84
  • 86. Things to remember Control total number of parts › Rate of INSERTs Merging runs in the background › Even when there are no queries! › With additional fold logic Index is sparse › Must fit into memory › Determines order of data on disk › Using the index is always beneficial Partitions is logical entity › Easy manipulation with portions of data › Cannot improve SELECTs performance 83 / 84