ClickHouse Keeper
Alexander Sapin, Software Engineer
Table of Contents
Consensus in General
ZooKeeper in ClickHouse
ClickHouse Keeper
TL;DR
ClickHouse
› Fast analytical DBMS
› Open source project
› Reliable distributed storage
Replication in ClickHouse
› Leader-leader eventually consistent
› Uses ZooKeeper for consensus
ZooKeeper has a lot of cons
› Written in Java
› Requires additional servers
› ...
ClickHouse Keeper – replacement for ZooKeeper
1 / 49
Consensus Problem
In the modern world, applications are distributed
› Multiple independent servers (processes) involved
› Communication happens over a network
› Failures may happen: network errors, process failures
› No reliable clocks exist (in fact, not quite true)
Sometimes agreement on some data is required
› Many cases: leader election, load balancing, distributed locking
Required properties for a solution
› Termination – every alive process eventually agrees on some value;
› Integrity – if all the alive processes propose value v, then any alive process must agree on v;
› Agreement – every alive process must agree on the same value.
2 / 49
Consensus in Real World: State Machine
Agreement on a single value is not enough
› Consensus algorithms work on state machines
› State machines have some state and operations (transitions)
› Often operations are stored in a log and applied to the state machine
Example: distributed hash table
› State machine – in-memory hash table
› Operations: set(key, value), get(key), erase(key)
› Log: set('x', 5), set('y', 10), erase('x')
What about the order of operations?
(diagram) Applying set('x', 5), set('y', 10), erase('x') leaves the state {y: 10}, while applying set('y', 10), erase('x'), set('x', 5) leaves {y: 10, x: 5} – different orders produce different states.
3 / 49
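The ordering problem above can be sketched in a few lines of Python (an illustration, not code from the talk): replaying the same three operations in two different orders leaves the hash table in different states, which is why every replica must apply the replicated log in the same order.

```python
# Illustrative sketch: the same operations applied in different orders
# produce different final states of an in-memory hash table.

def apply(state, op):
    """Apply a single operation to the hash-table state machine."""
    name, *args = op
    if name == "set":
        key, value = args
        state[key] = value
    elif name == "erase":
        (key,) = args
        state.pop(key, None)
    return state

def replay(log):
    """Replay a log of operations against an empty state machine."""
    state = {}
    for op in log:
        apply(state, op)
    return state

log_a = [("set", "x", 5), ("set", "y", 10), ("erase", "x")]
log_b = [("set", "y", 10), ("erase", "x"), ("set", "x", 5)]

print(replay(log_a))  # {'y': 10}
print(replay(log_b))  # {'y': 10, 'x': 5}
```

Both replicas saw the same set of operations, yet their states diverge – so agreeing on a single value is not enough; replicas must agree on the whole ordered log.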
Consensus in Real World: Consistency Models
(diagrams of consistency models)
4–5 / 49
Consensus in Real World: Example
State machine: bank account
Operations: deposit(x), withdraw(x)
(diagram: five servers S1–S5 serving clients Client1 and Client2; the balance starts at 0)
› Client1 sends Deposit 100; the servers append d(100) to the log, the balance becomes 100, and Client1 gets OK
› Client1 then sends Withdraw 50 while Client2 concurrently sends Withdraw 75
› The servers agree on the log order d(100), w(75), w(50): the balance drops to 25, Client2 gets OK, and Client1 gets NOT OK
6–15 / 49
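The bank-account example above can be sketched as code (a simplified illustration; the rule that an overdrawn withdrawal is rejected is an assumption of the sketch). Once all servers agree on the log order d(100), w(75), w(50), every replica computes the same balance and rejects the same request deterministically.

```python
# Illustrative sketch: replaying the agreed log on the bank-account
# state machine. Assumption: a withdrawal that exceeds the balance
# is rejected with "NOT OK".

def apply_log(log):
    balance = 0
    results = []
    for op, amount in log:
        if op == "d":                  # deposit(x)
            balance += amount
            results.append("OK")
        elif op == "w":                # withdraw(x)
            if amount <= balance:
                balance -= amount
                results.append("OK")
            else:
                results.append("NOT OK")
    return balance, results

balance, results = apply_log([("d", 100), ("w", 75), ("w", 50)])
print(balance, results)  # 25 ['OK', 'OK', 'NOT OK']
```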
Consensus in Real World: Algos and Apps
Some consensus algorithms (allowing linearizability)
› Paxos, MultiPaxos (around 2000)
› ZAB (2011)
› Raft (2014)
Systems that solve the consensus problem:
› Coordination: Chubby, ZooKeeper, etcd, Consul
› KV Storage: DynamoDB, Cassandra, Riak
› Distributed Databases: Spanner, CockroachDB
› Stream Processing: Kafka, MillWheel
› Resource Management: Kubernetes, Mesos
16 / 49
Why Does ClickHouse Need Consensus?
Replicated Merge Tree
› Leader-leader eventually-consistent replication
› Distributed state machine with own log
› Consensus: block numbers allocation, merges assignment
Distributed DDL queries (ON CLUSTER)
› Distributed state machine with own log
› Sequential queries execution for each shard
› Consensus: order of queries, executor choice
Main properties
› Small amount of data
› Linearizability for writes
› Highly available
17 / 49
ZooKeeper in ClickHouse
ClickHouse uses ZooKeeper for
› Replicated merge tree metadata
› DDL queries log storage
› Distributed notification system
Why ZooKeeper?
› Historical reasons
› Simple and powerful API
› MultiTransactions
› Watches
› Good performance for reads
18 / 49
ZooKeeper Coordination System
State Machine (Data Model)
› Filesystem-like distributed hash-table
› Each node can have both data and children
› Nodes have stats (version of data, of children, ...)
› No data types, everything is string
Client API
› Own TCP full-duplex protocol
› Persistent session for each client (unique session_id)
Main operations
› Read: get(node), list(node), exists(node), check(node, version)
› Write: set(node, value), create(node), remove(node)
› Writes can be combined into MultiTransactions
19 / 49
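The data model above can be illustrated with a toy, single-process sketch (not ZooKeeper's implementation): each node carries both data and children, plus a data version that conditional writes check, mirroring how set(node, value, version) fails on a stale version.

```python
# Toy model of the ZooKeeper data model: a filesystem-like tree
# where every node holds data, children, and a data version.

class ZNode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0
        self.children = {}

class Tree:
    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path) if parent_path else self.root
        parent.children[name] = ZNode(data)

    def get(self, path):
        node = self._walk(path)
        return node.data, node.version

    def set(self, path, data, version=-1):
        node = self._walk(path)
        if version != -1 and version != node.version:
            raise ValueError("bad version")  # stale-version write is rejected
        node.data = data
        node.version += 1

t = Tree()
t.create("/n1", b"hello")
print(t.get("/n1"))   # (b'hello', 0)
t.set("/n1", b"world")
print(t.get("/n1"))   # (b'world', 1)
```

A real client issues these operations over the full-duplex TCP protocol within a persistent session; the sketch only captures the tree-of-nodes shape and the versioned writes.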
ZooKeeper Features
State Machine features
› Ephemeral nodes – disappear with session disconnect
› Sequential nodes – unique names with a ten-digit number
› Node can be both sequential and ephemeral
Client API features
› Watches – server-side one-time triggers
› Session restore – client can reconnect with the same session_id
Pluggable ACL + authentication system
› The strangest implementation I’ve ever seen
20 / 49
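The sequential-node naming can be shown in one line (the /queue/item- prefix is a made-up example): the server appends a zero-padded ten-digit counter to the requested name, which is what makes the names unique and ordered.

```python
# Sketch: a sequential znode gets a monotonically increasing,
# zero-padded ten-digit counter appended to the requested name.

def sequential_name(prefix, counter):
    return f"{prefix}{counter:010d}"

print(sequential_name("/queue/item-", 42))  # /queue/item-0000000042
```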
ZooKeeper Internals
Consistency Guarantees
› Linearizability for write operations
› Sequential consistency for reads (reads are local)
› Atomicity of MultiTransactions
› No rollbacks of committed writes
Consensus Implementation
› Own algorithm: ZooKeeper Atomic Broadcast
› Operations are idempotent and stored in log files on filesystem
› Regular state machine snapshots
Scalability
› Linear for reads (more servers, faster reads)
› Inverse linear for writes (more servers, slower writes)
21 / 49
ZooKeeper Pros and Cons for ClickHouse
Pros:
› Battle tested consensus
› Appropriate data model
› Simple protocol
› Has own client implementation
Cons:
› Written in Java
› Difficult to operate
› Requires separate servers
› ZXID overflow
› Uncompressed logs and snapshots
› Checksums are optional
› Unreliable reconnect semantics
› Develops slowly
22 / 49
ClickHouse Keeper
Replacement for ZooKeeper
› Compatible client protocol (all clients work out of the box)
› The same state machine (data model)
› Better guarantees (optionally allows linearizable reads)
› Comparable performance (better for reads, similar for writes)
Implementation
› Written in C++, bundled into clickhouse-server package
› Uses Raft algorithm (NuRaft implementation)
› Can be embedded into ClickHouse server
› Optional TLS for clients and internal communication
Advantages over ZooKeeper
› Checksums in logs, snapshots and internal protocol
› Compressed snapshots and logs
23 / 49
ClickHouse Keeper: In Action
(diagrams: a three-node Keeper cluster – Raft leader L plus followers F – serving a client over Raft)
› The leader sends heartbeats (HB) to the followers
› Any ZooKeeper client can connect to any node
› All write requests go to the leader; followers forward requests to it
› The client sends create("/n1", "hello")
› The leader appends the entry to its log and sends append-log requests to the followers
› Each follower appends the entry and responds Ok
› Once a quorum has the entry, the leader commits it and tells the followers to commit
› The client receives Ok
› A later get("/n1") returns Ok: "hello"
24–38 / 49
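The append/commit flow above can be modeled as a toy Python sketch (an illustration of the majority-commit rule, not NuRaft code): the leader appends an entry, replicates it to the followers, and commits once a strict majority of the cluster, leader included, holds the entry.

```python
# Toy sketch of the Raft commit rule: an entry commits once it is
# stored on a strict majority of the cluster nodes.

class Node:
    def __init__(self, up=True):
        self.up = up
        self.log = []

def replicate(leader, followers, entry):
    """Leader appends an entry, replicates it, and commits on majority ack."""
    leader.log.append(entry)
    acks = 1  # the leader's own copy counts toward the quorum
    for f in followers:
        if f.up:                  # "append log request"
            f.log.append(entry)   # follower stores the entry
            acks += 1             # "response: Ok"
    cluster_size = 1 + len(followers)
    return acks > cluster_size // 2  # commit needs a strict majority

# Three-node cluster: the entry still commits with one follower down.
leader = Node()
print(replicate(leader, [Node(), Node(up=False)], 'create("/n1", "hello")'))  # True
```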
ClickHouse Keeper: Testing
Ordinary ClickHouse tests
› Functional tests use clickhouse-keeper in single node mode
› Integration tests use clickhouse-keeper in three nodes mode
› Separate integration tests for basic functionality
Jepsen tests (http://jepsen.io/)
› General framework for distributed systems testing
› Written in Clojure with consistency checks
› Failures: crashes, network partitions, disk corruption, network slowdowns
› More tests than for ZooKeeper
› About 5 serious bugs found in both NuRaft and our code
39 / 49
ClickHouse Keeper: How to use?
Two modes
› As a standalone application (clickhouse-keeper)
› Inside clickhouse-server
Configuration
› Very similar to the clickhouse-server .xml (or .yml) file
› Must be identical for all quorum participants
General recommendations
› Place the log directory on a dedicated SSD if possible
› Don’t try to have more than 9 quorum participants
› Don’t change the configuration for more than 1 server at once
Run standalone
clickhouse-keeper --daemon
--config /etc/your_path_to_config/config.xml
40 / 49
ClickHouse Keeper: Simplest Configuration
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<storage_path>/var/lib/clickhouse/coordination/</storage_path>
<coordination_settings>
<force_sync>false</force_sync>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>localhost</hostname>
<port>44444</port>
</server>
</raft_configuration>
</keeper_server>
41 / 49
ClickHouse Keeper: Configuration №1
(diagram: one standalone three-node Keeper cluster K1–K3 serving every shard R1–R3 of the ClickHouse cluster)
42 / 49
ClickHouse Keeper: Configuration №2
(diagram: a separate three-node Keeper cluster K1–K3 for each shard R1–R3 of the ClickHouse cluster)
43 / 49
ClickHouse Keeper: Configuration №3
(diagram: one powerful shard runs Keeper K1–K3, serving all shards R1–R3 of the ClickHouse cluster)
44 / 49
ClickHouse Keeper: Configuration №4
(diagram: combination of №2+№3 – Keeper K1–K3 embedded in each shard R1–R3 of the ClickHouse cluster)
45 / 49
ClickHouse Keeper: Some Settings
If you use a slow disk or have high network latency, try increasing
› heart_beat_interval_ms – how often the leader sends heartbeats
› election_timeout_lower_bound_ms – how long followers wait for a HB
› election_timeout_upper_bound_ms – how long followers wait for a HB
Quorum priorities in a server’s raft_configuration
› can_become_leader – set to false to make the server an observer
› priority – servers with higher priority become leader more often
If you need read linearizability [experimental]
› quorum_reads – read requests go through Raft
46 / 49
ClickHouse Keeper: Migration from ZooKeeper
Separate tool clickhouse-keeper-converter
› Converts ZooKeeper data into a clickhouse-keeper snapshot
› Tested with ZooKeeper 3.4+
› Bundled into the clickhouse-server package
How to migrate
› Stop all ZooKeeper nodes
› Find the leader node for migration
› Start ZooKeeper on the leader node and stop it again (forces a snapshot)
› Run clickhouse-keeper-converter:
clickhouse-keeper-converter
--zookeeper-logs-dir /path_to_zookeeper/version-2
--zookeeper-snapshots-dir /path_to_zookeeper/version-2
--output-dir /path/to/clickhouse/keeper/snapshots
› Copy the resulting snapshot to all clickhouse-keeper nodes
47 / 49
ClickHouse Keeper: Current Status
Preproduction (available since 21.8)
› Testing in Yandex.Cloud installations
› Testing by Altinity
What to read
› Documentation:
https://clickhouse.tech/docs/en/operations/clickhouse-keeper/
› Integration tests:
https://github.com/ClickHouse/ClickHouse/tree/master/tests/integration
› NuRaft docs:
https://github.com/eBay/NuRaft
Next steps
› Four-letter words introspection interface
› Compressed logs
› Elastic quorum configuration
48 / 49
Thank you
Q&A
49 / 49

More Related Content

What's hot

What's hot (20)

ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
 
Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...
Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...
Data warehouse on Kubernetes - gentle intro to Clickhouse Operator, by Robert...
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster PerformanceWebinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
 

Similar to ClickHouse Keeper

Data Integrity in Java Applications (JFall 2013)
Data Integrity in Java Applications (JFall 2013)Data Integrity in Java Applications (JFall 2013)
Data Integrity in Java Applications (JFall 2013)
Lucas Jellema
 
Mininet: Moving Forward
Mininet: Moving ForwardMininet: Moving Forward
Mininet: Moving Forward
ON.Lab
 

Similar to ClickHouse Keeper (20)

Into The Box 2018 Ortus Keynote
Into The Box 2018 Ortus KeynoteInto The Box 2018 Ortus Keynote
Into The Box 2018 Ortus Keynote
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
RAFT Consensus Algorithm
RAFT Consensus AlgorithmRAFT Consensus Algorithm
RAFT Consensus Algorithm
 
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven SystemsGo Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
Java On CRaC
Java On CRaCJava On CRaC
Java On CRaC
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Lattice yapc-slideshare
Lattice yapc-slideshareLattice yapc-slideshare
Lattice yapc-slideshare
 
Introducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud DatabasesIntroducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud Databases
 
Navigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern DatabasesNavigating Transactions: ACID Complexity in Modern Databases
Navigating Transactions: ACID Complexity in Modern Databases
 
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
Navigating Transactions: ACID Complexity in Modern Databases- Mydbops Open So...
 
J-Fall2013- Lucas Jellema: Integrity in Java apps handouts
J-Fall2013- Lucas Jellema: Integrity in Java apps handoutsJ-Fall2013- Lucas Jellema: Integrity in Java apps handouts
J-Fall2013- Lucas Jellema: Integrity in Java apps handouts
 
Data Integrity in Java Applications (JFall 2013)
Data Integrity in Java Applications (JFall 2013)Data Integrity in Java Applications (JFall 2013)
Data Integrity in Java Applications (JFall 2013)
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
 
System Revolution- How We Did It
System Revolution- How We Did It System Revolution- How We Did It
System Revolution- How We Did It
 
System revolution how we did it
System revolution   how we did itSystem revolution   how we did it
System revolution how we did it
 
Mininet: Moving Forward
Mininet: Moving ForwardMininet: Moving Forward
Mininet: Moving Forward
 
Showdown: IBM DB2 versus Oracle Database for OLTP
Showdown: IBM DB2 versus Oracle Database for OLTPShowdown: IBM DB2 versus Oracle Database for OLTP
Showdown: IBM DB2 versus Oracle Database for OLTP
 
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
 
Perforce Server: The Next Generation
Perforce Server: The Next GenerationPerforce Server: The Next Generation
Perforce Server: The Next Generation
 

More from Altinity Ltd

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 

ClickHouse Keeper

  • 1.
  • 3. Table of Contents Consensus in General ZooKeeper in ClickHouse ClickHouse Keeper
  • 4. TL;DR ClickHouse › Fast analytical DBMS › Open source project › Reliable distributed storage Replication in ClickHouse › Leader-leader eventually consistent › Uses ZooKeeper for consensus ZooKeeper has a lot of cons › Written in Java › Requires additional servers › ... ClickHouse Keeper – a replacement for ZooKeeper 1 / 49
  • 5. Consensus Problem In the modern world applications are distributed › Multiple independent servers (processes) involved › Communication via network › Failures may happen: network errors, process failures › No reliable clocks exist (in fact, not strictly true) Sometimes agreement on some data is required › Many cases: leader election, load balancing, distributed locking Required properties for a solution › Termination – every alive process eventually agrees on some value; › Integrity – if all the alive processes propose value v, then any alive process must agree on v; › Agreement – every alive process must agree on the same value. 2 / 49
  • 6. Consensus in Real World: State Machine Agreement on a single value is not enough › Consensus algorithms work on state machines › State machines have some state and operations (transitions) › Often operations are stored in a log and applied to the state machine Example: distributed hash table › State machine – in-memory hash table › Operations: set(key, value), get(key), erase(key) › Log: set('x', 5), set('y', 10), erase('x') What about the order of operations? (Diagram: applying set('x', 5), set('y', 10), erase('x') in two different orders produces different final states.) 3 / 49
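The hash-table example from the slide can be sketched in a few lines. This is an illustrative model, not ClickHouse code: replicas that apply the same log in the same order reach the same state, while a different order may not.

```python
def apply_log(log):
    """Apply a list of (op, *args) entries to an empty in-memory table."""
    state = {}
    for op, *args in log:
        if op == "set":
            key, value = args
            state[key] = value
        elif op == "erase":
            (key,) = args
            state.pop(key, None)  # erasing a missing key is a no-op
    return state

log_a = [("set", "x", 5), ("set", "y", 10), ("erase", "x")]
log_b = [("set", "y", 10), ("erase", "x"), ("set", "x", 5)]

assert apply_log(log_a) == {"y": 10}
assert apply_log(log_a) != apply_log(log_b)  # order of operations matters
```

This is exactly why a consensus algorithm must agree not just on values but on a single total order of log entries.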
  • 7. Consensus in Real World: Consistency Models 4 / 49
  • 8. Consensus in Real World: Consistency Models 5 / 49
  • 9. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: Client1 Client2 S2 S3 S4 S5 S1 0 6 / 49
  • 10. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: Client1 Client2 S2 S3 S4 S5 S1 Deposit 100 0 7 / 49
  • 11. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: Client1 Client2 S2 S3 S4 S5 S1 Deposit 100 0 8 / 49
  • 12. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100) Client1 Client2 S2 S3 S4 S5 S1 Deposit 100 0 9 / 49
  • 13. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100) Client1 Client2 S2 S3 S4 S5 S1 OK 100 10 / 49
  • 14. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100) Client1 Client2 S2 S3 S4 S5 S1 Withdraw 50 100 Withdraw 75 11 / 49
  • 15. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100) Client1 Client2 S2 S3 S4 S5 S1 Withdraw 50 100 Withdraw 75 12 / 49
  • 16. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100), w(75), w(50) Client1 Client2 S2 S3 S4 S5 S1 Withdraw 50 100 Withdraw 75 13 / 49
  • 17. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100), w(75), w(50) Client1 Client2 S2 S3 S4 S5 S1 Withdraw 50 25 OK 14 / 49
  • 18. Consensus in Real World: Example State machine: bank account Operations: deposit(x), withdraw(x) Log: d(100), w(75), w(50) Client1 Client2 S2 S3 S4 S5 S1 NOT OK 25 OK 15 / 49
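The bank-account walkthrough above can be replayed as code. A minimal sketch (names are illustrative): because the log imposes a total order, withdraw(75) is applied before withdraw(50) on every replica, so the second withdrawal is rejected deterministically.

```python
def replay(log, balance=0):
    """Replay a log of ('d', x) deposits and ('w', x) withdrawals."""
    results = []
    for op, amount in log:
        if op == "d":                      # deposit(x)
            balance += amount
            results.append("OK")
        elif op == "w":                    # withdraw(x)
            if amount <= balance:
                balance -= amount
                results.append("OK")
            else:
                results.append("NOT OK")   # insufficient funds
    return balance, results

balance, results = replay([("d", 100), ("w", 75), ("w", 50)])
assert balance == 25
assert results == ["OK", "OK", "NOT OK"]
```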
  • 19. Consensus in Real World: Algos and Apps Some consensus algorithms (allowing linearizability) › Paxos, MultiPaxos (around 2000) › ZAB (2011) › Raft (2014) Systems that solve the consensus problem: › Coordination: Chubby, ZooKeeper, etcd, Consul › KV Storage: DynamoDB, Cassandra, Riak › Distributed Databases: Spanner, CockroachDB › Stream Processing: Kafka, Millwheel › Resource Management: Kubernetes, Mesos 16 / 49
  • 20. Why does ClickHouse need Consensus? Replicated Merge Tree › Leader-leader eventually-consistent replication › Distributed state machine with its own log › Consensus: block number allocation, merge assignment Distributed DDL queries (ON CLUSTER) › Distributed state machine with its own log › Sequential query execution for each shard › Consensus: order of queries, executor choice Main properties › Small amount of data › Linearizability for writes › Highly available 17 / 49
  • 21. ZooKeeper in ClickHouse ClickHouse uses ZooKeeper for › Replicated merge tree metadata › DDL query log storage › Distributed notification system Why ZooKeeper? › Historical reasons › Simple and powerful API › MultiTransactions › Watches › Good performance for reads 18 / 49
  • 22. ZooKeeper Coordination System State Machine (Data Model) › Filesystem-like distributed hash-table › Each node can have both data and children › Nodes have stats (version of data, of children, ...) › No data types, everything is string Client API › Own TCP full-duplex protocol › Persistent session for each client (unique session_id) Main operations › Read: get(node), list(node), exists(node), check(node, version) › Write: set(node, value), create(node), remove(node) › Writes can be combined into MultiTransactions 19 / 49
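The MultiTransaction semantics mentioned on the slide can be modeled in a few lines. This is a toy sketch of the all-or-nothing behavior, not the real ZooKeeper server: operations are staged on a copy of the state, and any failure discards the whole batch.

```python
def multi(state, ops):
    """Apply a batch of (op, node, *args) atomically; return (state, rc)."""
    staged = dict(state)                       # work on a copy
    for op, node, *rest in ops:
        if op == "create":
            if node in staged:
                return state, "NODEEXISTS"     # abort: nothing applied
            staged[node] = rest[0]
        elif op == "set":
            if node not in staged:
                return state, "NONODE"
            staged[node] = rest[0]
        elif op == "remove":
            if node not in staged:
                return state, "NONODE"
            del staged[node]
    return staged, "OK"                        # commit the whole batch

state, rc = multi({}, [("create", "/a", "1"), ("create", "/b", "2")])
assert rc == "OK" and state == {"/a": "1", "/b": "2"}
# Second create of /a fails, so /c is not created either:
state, rc = multi(state, [("create", "/c", "3"), ("create", "/a", "x")])
assert rc == "NODEEXISTS" and "/c" not in state
```

Replication in ClickHouse relies on exactly this atomicity, e.g. to allocate a block number and register a part in one indivisible step.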
  • 23. ZooKeeper Features State Machine features › Ephemeral nodes – disappear when the session disconnects › Sequential nodes – unique names with a ten-digit number › A node can be both sequential and ephemeral Client API features › Watches – server-side one-time triggers › Session restore – a client can reconnect with the same session_id Pluggable ACL + authentication system › The strangest implementation I've ever seen 20 / 49
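The sequential-node naming scheme is simple to sketch: the server appends a zero-padded ten-digit counter to the requested path. The counter handling here is simplified for illustration (real ZooKeeper derives it from the parent node's child version).

```python
def sequential_name(prefix, counter):
    """Build a ZooKeeper-style sequential node name from a counter."""
    return f"{prefix}{counter:010d}"

assert sequential_name("/locks/lock-", 0) == "/locks/lock-0000000000"
assert sequential_name("/locks/lock-", 42) == "/locks/lock-0000000042"
```

This zero-padded scheme is what makes distributed locks and queues possible: children sort lexicographically in creation order.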
  • 24. ZooKeeper Internals Consistency Guarantees › Linearizability for write operations › Sequential consistency for reads (reads are local) › Atomicity of MultiTransactions › No rollbacks of committed writes Consensus Implementation › Own algorithm: ZooKeeper Atomic Broadcast › Operations are idempotent and stored in log files on the filesystem › Regular state machine snapshots Scalability › Linear for reads (more servers, faster reads) › Inverse linear for writes (more servers, slower writes) 21 / 49
  • 25. ZooKeeper Pros and Cons for ClickHouse Pros: › Battle-tested consensus › Appropriate data model › Simple protocol › Has its own client implementation Cons: › Written in Java › Difficult to operate › Requires separate servers › ZXID overflow › Uncompressed logs and snapshots › Checksums are optional › Unreliable reconnect semantics › Develops slowly 22 / 49
  • 26. ClickHouse Keeper Replacement for ZooKeeper › Compatible client protocol (all clients work out of the box) › The same state machine (data model) › Better guarantees (optionally allows linearizable reads) › Comparable performance (better for reads, similar for writes) Implementation › Written in C++, bundled into the clickhouse-server package › Uses the Raft algorithm (NuRaft implementation) › Can be embedded into the ClickHouse server › Optional TLS for clients and internal communication Advantages over ZooKeeper › Checksums in logs, snapshots and internal protocol › Compressed snapshots and logs 23 / 49
  • 27. ClickHouse Keeper: In Action S1 S2 S3 Client ClickHouse Keeper Cluster Raft 24 / 49
  • 28. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft HB HB 25 / 49
  • 29. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Raft Leader Raft follower 26 / 49
  • 30. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Raft Leader Raft follower Any ZooKeeper Client 27 / 49
  • 31. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft All requests go to the leader 28 / 49
  • 32. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Request forwarding 29 / 49
  • 33. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft create("/n1", "hello") 30 / 49
  • 34. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft create("/n1", "hello") 31 / 49
  • 35. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft append log request append log request append log 32 / 49
  • 36. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft append log response: Ok 33 / 49
  • 37. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Committed Committed Commit 34 / 49
  • 38. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Commit Commit 35 / 49
  • 39. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Ok 36 / 49
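The write flow in the slides above hinges on one rule: the leader marks a log entry committed once a majority of the cluster (counting the leader's own append) has acknowledged it. A minimal sketch of that majority check:

```python
def committed(cluster_size, ack_count):
    """True once ack_count (leader's own append included) is a majority."""
    return ack_count >= cluster_size // 2 + 1

assert committed(3, 2)       # leader + one follower is a majority of 3
assert not committed(3, 1)   # the leader alone cannot commit
assert committed(5, 3)
assert not committed(5, 2)
```

This is also why writes get slower as the quorum grows: each entry must reach a strictly larger number of servers before the client sees "Ok".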
  • 40. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft get("/n1") 37 / 49
  • 41. ClickHouse Keeper: In Action L F F Client ClickHouse Keeper Cluster Raft Ok: "hello" 38 / 49
  • 42. ClickHouse Keeper: Testing Ordinary ClickHouse tests › Functional tests use clickhouse-keeper in single-node mode › Integration tests use clickhouse-keeper in three-node mode › Separate integration tests for basic functionality Jepsen tests (http://jepsen.io/) › General framework for distributed systems testing › Written in Clojure with consistency checks › Failures: crashes, network partitions, disk corruption, network slowdowns › More tests than for ZooKeeper › About 5 serious bugs found both in NuRaft and our code 39 / 49
  • 43. ClickHouse Keeper: How to use? Two modes › As a standalone application (clickhouse-keeper) › Inside clickhouse-server Configuration › Very similar to the clickhouse-server .xml (or .yml) file › Must be identical for all quorum participants General recommendations › Place the directory with logs on an independent SSD if possible › Don't try to have more than 9 quorum participants › Don't change the configuration for more than 1 server at once Run standalone: clickhouse-keeper --daemon --config /etc/your_path_to_config/config.xml 40 / 49
  • 44. ClickHouse Keeper: Simplest Configuration
  <keeper_server>
      <tcp_port>9181</tcp_port>
      <server_id>1</server_id>
      <storage_path>/var/lib/clickhouse/coordination/</storage_path>
      <coordination_settings>
          <force_sync>false</force_sync>
      </coordination_settings>
      <raft_configuration>
          <server>
              <id>1</id>
              <hostname>localhost</hostname>
              <port>44444</port>
          </server>
      </raft_configuration>
  </keeper_server>
  41 / 49
  • 45. ClickHouse Keeper: Configuration №1 K1 K2 K3 Standalone Keeper Cluster R1 R2 R3 R1 R2 R3 R1 R2 R3 Shard 1 Shard 2 Shard N ClickHouse Cluster 42 / 49
  • 46. ClickHouse Keeper: Configuration №2 R1 R2 R3 R1 R2 R3 R1 R2 R3 Shard 1 Shard 2 Shard N ClickHouse Cluster K1 K2 K3 K1 K2 K3 K1 K2 K3 Keeper for each shard 43 / 49
  • 47. ClickHouse Keeper: Configuration №3 R1 R2 R3 R1 R2 R3 R1 R2 R3 Shard 1 Shard 2 Shard N ClickHouse Cluster K1 K2 K3 One powerful shard with Keeper 44 / 49
  • 48. ClickHouse Keeper: Configuration №4 R1 R2 R3 R1 R2 R3 R1 R2 R3 Shard 1 Shard 2 Shard N ClickHouse Cluster K1 K2 K3 K1 K2 K3 K1 K2 K3 Combination of №2+№3 45 / 49
  • 49. ClickHouse Keeper: Some Settings If using a slow disk or experiencing high network latency, try increasing › heart_beat_interval_ms – how often the leader sends heartbeats › election_timeout_lower_bound_ms – how long followers will wait for a HB › election_timeout_upper_bound_ms – how long followers will wait for a HB Quorum priorities in raft_configuration of a server › can_become_leader – the server will be an observer › priority – the server will become leader more often according to its priority If you need read linearizability [experimental] › quorum_reads – read requests go through Raft 46 / 49
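Combining these settings with the simplest configuration from earlier, a three-server quorum might look like the sketch below. The values are illustrative placeholders, not tuning recommendations, and the hostnames are hypothetical:

```xml
<keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>
    <coordination_settings>
        <heart_beat_interval_ms>500</heart_beat_interval_ms>
        <election_timeout_lower_bound_ms>1000</election_timeout_lower_bound_ms>
        <election_timeout_upper_bound_ms>2000</election_timeout_upper_bound_ms>
    </coordination_settings>
    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>keeper1</hostname>
            <port>44444</port>
            <priority>3</priority>
        </server>
        <server>
            <id>2</id>
            <hostname>keeper2</hostname>
            <port>44444</port>
        </server>
        <server>
            <id>3</id>
            <hostname>keeper3</hostname>
            <port>44444</port>
            <can_become_leader>false</can_become_leader>
        </server>
    </raft_configuration>
</keeper_server>
```

Only server_id and the server-specific raft entries differ between nodes; the coordination_settings must match across the quorum.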
  • 50. ClickHouse Keeper: Migration from ZooKeeper Separate tool clickhouse-keeper-converter › Converts ZooKeeper data to a clickhouse-keeper snapshot › Checked for ZooKeeper 3.4+ › Bundled into the clickhouse-server package How to migrate › Stop all ZooKeeper nodes › Find the leader for migration › Start ZooKeeper on the leader node and stop it again (to force a snapshot) › Run clickhouse-keeper-converter: clickhouse-keeper-converter --zookeeper-logs-dir /path_to_zookeeper/version-2 --zookeeper-snapshots-dir /path_to_zookeeper/version-2 --output-dir /path/to/clickhouse/keeper/snapshots › Copy the resulting snapshot to all clickhouse-keeper nodes 47 / 49
  • 51. ClickHouse Keeper: Current Status Preproduction (available since 21.8) › Testing in Yandex.Cloud installations › Testing by Altinity What to read › Documentation: https://clickhouse.tech/docs/en/operations/clickhouse-keeper/ › Integration tests: https://github.com/ClickHouse/ClickHouse/tree/master/tests/integration › NuRaft docs: https://github.com/eBay/NuRaft Next steps › Four-letter words introspection interface › Compressed logs › Elastic quorum configuration 48 / 49