Real-Time, Exactly-Once Data
Ingestion from Kafka to ClickHouse
Mohammad Roohitavaf, Jun Li
October 21, 2021
The Real-Time Analytics Processing Pipeline
ClickHouse as Real-Time Analytics Database
• ClickHouse: an open-source columnar database for OLAP workloads
• Data insertion favors large blocks over individual rows
• Kafka serves as a data buffer
• A Block Aggregator is a data loader that aggregates Kafka messages into large blocks before loading them to ClickHouse
Block Aggregator Failures
• With respect to block aggregator
• Kafka can fail
• Database backend can fail
• Network connections to Kafka and database can fail
• Block aggregator itself can crash
• Blindly retrying data loading will lead to data loss or data duplication in the data persisted in the database
• The Kafka transaction mechanism cannot be applied here
Our Solution: Exactly-Once Message Delivery to ClickHouse
• Have the aggregator deterministically produce identical blocks for ClickHouse
• Relying on existing runtime support:
• Kafka metadata store to keep track of execution state, and
• ClickHouse’s block duplication detection
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
The Multi-DC Kafka/ClickHouse Deployment
• Each database shard has its own topic
• #partitions in topic = #replicas in shard
• Block aggregator co-located in each
replica (as two containers in a
Kubernetes pod)
• Block aggregator only inserts data into the local database replica (the ClickHouse replication protocol replicates data to the other replicas)
• Each block aggregator subscribes to
both Kafka clusters
The Multi-DC Kafka/ClickHouse Failure Scenario (1)
(Kafka DC Down)
The Multi-DC Kafka/ClickHouse Failure Scenario (2)
(DC Down)
(ClickHouse DC Down)
• ClickHouse insert-quorum = 2
The Multi-DC Kafka/ClickHouse Failure Scenario (3)
(Kafka DC Down)
(ClickHouse DC Down)
• ClickHouse insert-quorum = 2
Mappings of Topics, Tables, Rows, Messages
• One topic contains messages associated with multiple
tables in database
• One message contains multiple rows belonging to the
same table
• Each message is an opaque byte array in Kafka, encoded with a protobuf-based mechanism
• Block aggregator relies on ClickHouse table schema to
decode Kafka messages
• When a new table is added to database, no need to make
schema changes to Kafka clusters
• The number of topics does not grow as the tables continue
to be added
• Table rows constructed from Kafka messages in two Kafka
DCs get merged in database
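To make the mapping above concrete, a minimal sketch with hypothetical types (not the actual aggregator code):

#include <map>
#include <string>
#include <vector>

struct Row { std::vector<std::string> columns; };   // one decoded table row

// One Kafka message (an opaque protobuf-encoded payload) decodes into multiple
// rows that all belong to a single table.
struct DecodedMessage {
    std::string table;
    std::vector<Row> rows;
};

// One topic/partition interleaves messages for many tables of the same shard,
// so the aggregator groups rows per table to build per-table blocks.
std::map<std::string, std::vector<Row>> groupByTable(
        const std::vector<DecodedMessage>& partitionBatch) {
    std::map<std::string, std::vector<Row>> perTable;
    for (const DecodedMessage& msg : partitionBatch) {
        auto& rows = perTable[msg.table];
        rows.insert(rows.end(), msg.rows.begin(), msg.rows.end());
    }
    return perTable;
}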
The Block Aggregator Architecture
The Key Features of Block Aggregator
• Support multi-datacenter deployment model
• Multiple tables per topic/partition
• No data loss/duplication
• Monitoring with over a hundred metrics:
• Message processing rates
• Block insertion rate and failure rate
• Block size distribution
• Block loading time distribution
• Kafka metadata commit time and failure rate
• Whether abnormal message consumption behaviors happened (such as message offsets being rewound or skipped)
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
A Naïve Way for Block Aggregator to Replay Messages (1)
A Naïve Way for Block Aggregator to Replay Messages (2)
Our Solution: Block-Level Deduplication in ClickHouse (1)
• ClickHouse relies on ZooKeeper to store metadata
• Each block stored contains a hash value
• New blocks to be inserted need to have hash uniqueness checked
• Blocks are identical if
• Having same block size
• Containing same rows
• And rows in same order
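To illustrate the idea, a minimal sketch of block-level deduplication (this is not ClickHouse’s actual hashing code; ClickHouse computes its own block checksums and keeps recent ones in ZooKeeper, but the identity criteria are the ones listed above):

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

using Row = std::string;          // a row serialized to bytes, for illustration
using Block = std::vector<Row>;   // rows in insertion order

// Combine row hashes in order, so the result depends on block size,
// row contents, and row order -- the three identity criteria above.
size_t blockHash(const Block& block) {
    size_t h = block.size();
    for (const Row& row : block) {
        h ^= std::hash<Row>{}(row) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
}

// The server keeps hashes of recently inserted blocks and silently drops
// a new block whose hash it has already seen.
bool insertWithDedup(const Block& block, std::unordered_set<size_t>& recentHashes) {
    if (!recentHashes.insert(blockHash(block)).second) {
        return false;             // duplicate block: ignored, no new rows written
    }
    // ... actually write the block here ...
    return true;
}

int main() {
    std::unordered_set<size_t> recent;
    Block b = {"row1", "row2"};
    bool first = insertWithDedup(b, recent);    // true: block inserted
    bool second = insertWithDedup(b, recent);   // false: detected as a duplicate
    return (first && !second) ? 0 : 1;
}

Because deduplication keys on the block as a whole, replaying Kafka messages avoids duplication only if the aggregator re-forms exactly the same blocks; that is what the protocol below guarantees.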
Our Solution: Guarantee to Form Identical Blocks (2)
• Store metadata back to Kafka which describes the latest blocks formed for
each table
• In case of failure, the next Block Aggregator that picks up the partition will
know exactly how to reconstruct the latest blocks formed for each table by
the previous Block Aggregator
• The Block Aggregators can be in two different ClickHouse replicas, if Kafka
partition rebalancing happens
The Metadata Structure
For each Kafka connector, the metadata persisted to Kafka, per partition, is:
replica_1,table1,0,29,20,table2,5,20,10
The last block for table1 decided to be loaded to ClickHouse covers offsets [0, 29]; starting from offset min = 0, we have consumed 20 messages for table1.
The last block for table2 decided to be loaded to ClickHouse covers offsets [5, 20]; starting from offset min = 0, we have consumed 10 messages for table2.
In total, we have consumed all 30 messages from offset min = 0 to offset max = 29: 20 for table1 and 10 for table2.
replica-Id, [table-name, begin-msg-offset, end-msg-offset, count]+
Metadata.min = MIN (begin-msg-offset); Metadata.max = MAX(end-msg-offset)
The Metadata Structure for Special Block
• Special block: when begin-msg-offset = end-msg-offset + 1
• Either no message for the table with an offset less than begin-msg-offset exists
• Or every message for the table with an offset less than begin-msg-offset has been received and acknowledged by ClickHouse
• Example: replica_id,table1,30,29,20,table2,5,20,10
• All messages with offset less than 30 for table1 are acknowledged by
ClickHouse
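As an illustration only (a hypothetical parser, not the production code), the metadata record above can be parsed and interpreted like this:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

struct TableRange {
    int64_t begin = 0;   // begin-msg-offset of the last block formed
    int64_t end   = 0;   // end-msg-offset of the last block formed
    int64_t count = 0;   // messages consumed for this table
    // Special block: begin == end + 1 means every message for this table with
    // a smaller offset has already been acknowledged by ClickHouse.
    bool isSpecial() const { return begin == end + 1; }
};

struct Metadata {
    std::string replicaId;
    std::map<std::string, TableRange> tables;
    // Metadata.min = MIN(begin-msg-offset); Metadata.max = MAX(end-msg-offset)
    int64_t min() const {
        int64_t m = INT64_MAX;
        for (const auto& kv : tables) m = std::min(m, kv.second.begin);
        return m;
    }
    int64_t max() const {
        int64_t m = INT64_MIN;
        for (const auto& kv : tables) m = std::max(m, kv.second.end);
        return m;
    }
};

// Parse "replica_1,table1,0,29,20,table2,5,20,10"
Metadata parseMetadata(const std::string& s) {
    Metadata md;
    std::stringstream ss(s);
    std::string table, b, e, c;
    std::getline(ss, md.replicaId, ',');
    while (std::getline(ss, table, ',')) {
        std::getline(ss, b, ','); std::getline(ss, e, ','); std::getline(ss, c, ',');
        md.tables[table] = TableRange{std::stoll(b), std::stoll(e), std::stoll(c)};
    }
    return md;
}

int main() {
    Metadata md = parseMetadata("replica_1,table1,0,29,20,table2,5,20,10");
    std::cout << md.min() << " " << md.max() << "\n";   // prints "0 29"
}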
Message Processing Sequence: Consume/Commit/Load
The message processing shown here is per partition
Two Execution Modes:
• The aggregator starts from the message offset previously committed
• REPLAY: the aggregator retries sending the last blocks sent for each table, to avoid data loss
• CONSUME: the aggregator is done with REPLAY and is in the normal state
• Mode Switching:
DetermineState(current_offset, saved_metadata) {
  begin = saved_metadata.min
  end = saved_metadata.max
  if (current_offset > end) state = CONSUME
  else state = REPLAY
}
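The same mode-switching logic, as a small self-contained C++ rendering of the pseudocode above (the min/max fields correspond to the Metadata.min/Metadata.max defined earlier):

#include <cstdint>

enum class State { CONSUME, REPLAY };

struct SavedMetadata {
    int64_t min;   // MIN(begin-msg-offset) over all tables in the record
    int64_t max;   // MAX(end-msg-offset) over all tables in the record
};

// The aggregator resumes from the previously committed offset. If that offset
// is already past the last blocks described by the saved metadata, it consumes
// normally; otherwise it must first REPLAY to re-form those blocks identically.
State determineState(int64_t current_offset, const SavedMetadata& saved) {
    return (current_offset > saved.max) ? State::CONSUME : State::REPLAY;
}

// Example: metadata "replica_1,table1,0,29,20,table2,5,20,10" has max = 29.
// Resuming at offset 25 -> REPLAY (re-form blocks [0,29] and [5,20] exactly);
// resuming at offset 30 -> CONSUME.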
The Top-Level Processing Loop of A Kafka Connector
• For each Kafka Connector:
while (running) { // outer loop
  wait for ClickHouse and Kafka to be healthy and connected
  while (running) { // inner loop
    batch = read a batch from Kafka                    // if error, break inner loop
    for (msg : batch.messages) {
      partitionHandlers[msg.partition].consume(msg)    // if error, break inner loop
    }
    for (ph : partitionHandlers) {
      if (ph.state == CONSUME) {
        ph.checkBuffers()                              // if error, break inner loop
      }
    }
  }
  disconnect from Kafka
  clear partitionHandlers
}
Annotations from the slide diagram:
• Consume loop: append each message to its table’s buffer
• Check buffers loop: commit metadata to Kafka, then flush blocks to ClickHouse
• Each pass of the inner loop must finish within max_poll_interval (elapsed time <= max_poll_interval)
Some Clarifications
• Partition handlers can be dynamically created or deleted based on the Kafka broker’s partition assignment decisions
• Under some failure conditions, one Kafka Connector can have more than one partition assigned
• Partition handler performs metadata commit on the corresponding partition
• Each partition handler can process multiple tables (because a Kafka partition can support
multiple tables)
• At any given time, each partition handler can only have one in-flight block, per table, to
be inserted to ClickHouse
• No new block can be submitted until the current in-flight block gets successful ACK from ClickHouse
• Thus, the metadata committed is just one block per table ahead, i.e., “Write Ahead Logging with One
Block”
• In other words, when replay happens, at most one block per table needs to be replayed
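A minimal sketch of the “write ahead logging with one block” ordering (hypothetical names, greatly simplified: the real metadata record covers all tables of a partition in one commit, as described earlier; here only one table’s buffer is shown):

#include <string>
#include <vector>

struct TableBuffer {
    std::vector<std::string> rows;   // rows decoded from Kafka messages
    long beginOffset = 0;            // first Kafka offset buffered for this table
    long endOffset = -1;             // last Kafka offset buffered for this table
    long count = 0;                  // messages consumed for this table
    bool readyToFlush() const {      // the real code checks size/time thresholds
        return endOffset >= beginOffset;
    }
};

// Stand-ins for the real clients; assumed interfaces, not actual APIs.
bool commitMetadataToKafka(const std::string& metadata) { return true; }
bool insertBlockToClickHouse(const TableBuffer& block)  { return true; }

bool flushTable(const std::string& replicaId, const std::string& table,
                TableBuffer& buf) {
    if (!buf.readyToFlush()) return true;

    // 1. Write-ahead: commit the exact block we are ABOUT to send. If we crash
    //    after this point, the next aggregator re-forms this very block.
    std::string metadata = replicaId + "," + table + "," +
                           std::to_string(buf.beginOffset) + "," +
                           std::to_string(buf.endOffset) + "," +
                           std::to_string(buf.count);
    if (!commitMetadataToKafka(metadata)) return false;    // error => rebalance

    // 2. Only after the metadata is durable in Kafka, flush the block.
    //    No new block for this table until this one is acknowledged.
    if (!insertBlockToClickHouse(buf)) return false;       // error => rebalance

    // 3. Acknowledged: reset the buffer; the next committed metadata can show a
    //    "special block" (begin-msg-offset = end-msg-offset + 1) for this table.
    buf.rows.clear();
    buf.beginOffset = buf.endOffset + 1;
    buf.endOffset = buf.beginOffset - 1;
    return true;
}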
Some Clarifications (cont’d)
• If block insertion to ClickHouse fails,
• The outermost loop will disconnect the Kafka Connector from the Kafka Broker
• The Kafka consumer group rebalancing gets triggered automatically
• A different replica’s Kafka Connector will be assigned for the partition and block insertion
continues at this new replica
• Thus, rebalancing allows “Global Retries with Last Committed State” over multiple replicas
• The same failure handling mechanism can be applied, for example, when metadata
commit to Kafka fails
• Thus, Kafka consumer group rebalancing is an indicator of a failure that cannot be recovered by the block aggregator itself
Example on Partition Rebalancing on Replicas
The following diagram shows two aggregators in one shard being killed (to simulate 1
datacenter down), and block insertion traffic gets picked up by the two remaining
aggregators in the same shard.
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
Runtime Verification
• Aggregator Verifier (AV): checks that the blocks flushed by all aggregators to ClickHouse do not cause any data loss/duplication
• How can AV know which blocks were flushed by the aggregators?
• Each aggregator commits metadata to Kafka before flushing anything to ClickHouse, for each partition
• All metadata records committed by the aggregators are appended to an internal Kafka topic called __consumer_offsets
• Thus, AV only needs to subscribe to this topic to learn about all blocks flushed to ClickHouse by all aggregators
Runtime Verification Algorithm
Let M.t.start and M.t.end be the start offset and end offset for table t in metadata M, respectively.
For any given metadata instances M and M', where M was committed before M' in time:
• Backward Anomaly: for some table t, M'.t.end < M.t.start
• Overlap Anomaly: for some table t, M.t.start < M'.t.end AND M'.t.start < M.t.end
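The two anomaly checks translate directly into code; a minimal sketch (hypothetical structures, not the production verifier):

#include <cstdint>
#include <map>
#include <string>

struct OffsetRange { int64_t start; int64_t end; };           // offsets for one table
using MetadataRecord = std::map<std::string, OffsetRange>;    // table name -> range

// m was committed before m2 in time.
// Backward anomaly: a later record points before an earlier one (possible data loss).
bool hasBackwardAnomaly(const MetadataRecord& m, const MetadataRecord& m2) {
    for (const auto& [table, r] : m) {
        auto it = m2.find(table);
        if (it != m2.end() && it->second.end < r.start) return true;
    }
    return false;
}

// Overlap anomaly: the two records cover overlapping offset ranges for the same
// table (possible data duplication).
bool hasOverlapAnomaly(const MetadataRecord& m, const MetadataRecord& m2) {
    for (const auto& [table, r] : m) {
        auto it = m2.find(table);
        if (it != m2.end() && r.start < it->second.end && it->second.start < r.end)
            return true;
    }
    return false;
}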
Runtime Verifier Implementation
• The verifier reads metadata instances in their commit order to Kafka, stored in the internal topic called __consumer_offsets.
• __consumer_offsets is a partitioned topic, and Kafka does not guarantee ordering across partitions.
• We order metadata instances by their commit timestamps at the brokers. This approach requires the clocks of the Kafka brokers to be synchronized with an uncertainty window smaller than the time between committing two metadata instances. Thus, we should not commit metadata to Kafka too frequently.
• This is not a problem for the block aggregator, as it commits metadata to Kafka once per block every several seconds, which is infrequent compared to the clock skew.
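Continuing the sketch above (it reuses MetadataRecord and the two anomaly checks), the verifier can order the records by broker commit timestamp and then check every ordered pair:

#include <algorithm>
#include <cstdint>
#include <vector>

struct CommittedMetadata {
    int64_t brokerTimestampMs;   // commit timestamp assigned by the Kafka broker
    MetadataRecord record;
};

// Valid only if broker clock skew is much smaller than the interval between
// metadata commits, as discussed above.
bool verify(std::vector<CommittedMetadata> commits) {
    std::sort(commits.begin(), commits.end(),
              [](const CommittedMetadata& a, const CommittedMetadata& b) {
                  return a.brokerTimestampMs < b.brokerTimestampMs;
              });
    for (size_t i = 0; i < commits.size(); ++i)
        for (size_t j = i + 1; j < commits.size(); ++j)
            if (hasBackwardAnomaly(commits[i].record, commits[j].record) ||
                hasOverlapAnomaly(commits[i].record, commits[j].record))
                return false;    // anomaly: possible data loss or duplication
    return true;
}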
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
Compile and Link ClickHouse into Block Aggregator
• Instead of using the C++ client library in the ClickHouse repo, we compiled and linked the entire ClickHouse codebase into the block aggregator
• This allows us to leverage the native ClickHouse implementation:
• Native TCP/IP communication protocol (with TLS and connection pooling)
• Select query capabilities just like ClickHouse-Client (for testing purpose)
• Table schema retrieval, and block header construction from schema
• Column construction from protobuf-based Kafka message deserialization
• Column default expression evaluation
• ZooKeeper client for distributed locking
Dynamic Table Schema Update
• To dynamically update a table schema:
• Step 1: the table schema is updated on each ClickHouse shard
• Step 2: the block aggregators in each shard are restarted, so that they load the updated schema from the co-located ClickHouse replica
• Step 3: after offline confirmation of the schema update, the client application updates its application logic to produce new Kafka messages that follow the updated schema
• Requirement: the block aggregator needs to be able to deserialize Kafka messages into blocks, whether or not the messages follow the updated schema
• Solution: enforce that columns in a table schema can only be added, and can not be deleted afterwards
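Why add-only columns are sufficient, as a small hypothetical sketch: a message produced against the old schema simply lacks the newly added columns, and the aggregator fills those from the table’s default expressions; a message produced against the new schema carries values for every column.

#include <map>
#include <string>
#include <vector>

struct Column {
    std::string name;
    std::string defaultValue;   // from the table's default expression
};

// A decoded message: column name -> value. Messages produced before the schema
// update simply omit the columns added later; because columns are add-only,
// a message can never contain a column that the current schema lacks.
using DecodedMessage = std::map<std::string, std::string>;

// Build a row matching the CURRENT table schema, regardless of whether the
// message was produced before or after the schema update.
std::vector<std::string> buildRow(const std::vector<Column>& schema,
                                  const DecodedMessage& msg) {
    std::vector<std::string> row;
    row.reserve(schema.size());
    for (const Column& col : schema) {
        auto it = msg.find(col.name);
        row.push_back(it != msg.end() ? it->second : col.defaultValue);
    }
    return row;
}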
Multiple ZooKeeper Clusters for One ClickHouse Cluster
• ClickHouse relies on ZooKeeper as metadata store and replication coordination
• Each block insertion takes roughly 15 remote calls to ZooKeeper server cluster
• Block insertion is performed per table
• Our ZooKeeper (with 3.5.8) cluster is deployed across three datacenters with ~ 20 ms cross-
datacenter communication latency
• For a large ClickHouse cluster with 250 shards (with each shard having 4 replicas), a single
ZooKeeper deployment can introduce high ZooKeeper “hardware exception” rate
• The exception due to ZooKeeper session frequently expired
• Multiple ZooKeeper clusters are deployed instead, with each allocated with a subset of the
ClickHouse shards
• In our deployment, 50 shards share one ZK cluster
• It depends on block insertion rate per table, and total number of tables involved in real-time
insertion
Distributed Locking at Block Aggregator
• Before “insert_quorum_parallel” was introduced in ClickHouse:
• In each shard, for each table, only one replica is allowed to perform data insertion
• Distributed locking is used to coordinate block insertion at the block aggregators
• The ZooKeeper locking implementation in ClickHouse is used
• More recent ClickHouse versions have introduced “insert_quorum_parallel”
• Its default value is true
• According to an Altinity blog article, the current ClickHouse implementation breaks sequential consistency and may have other side effects
• In our recent product release based on ClickHouse 21.8, we turned this option off
• And we still enforce distributed locking at the block aggregator
Testing on Block Aggregator
• Resiliency testing (in an 8-shard cluster with 32 replicas)
• Follows the “Chaos Monkey” approach
• Kill individual processes and individual containers, across ZooKeeper, ClickHouse, Block Aggregator
• Kill all processes and containers in one datacenter, across ZooKeeper, ClickHouse, Block Aggregator
• To validate whether data loading can recover and continue
• Smaller-scale integration testing
• The whole cluster runs on a single machine, with multiple processes for ZooKeeper, ClickHouse and Block Aggregators
• Programmatically control process start/stop, along with small table insertions
• In addition, turn on fault injection at predefined points in the Block Aggregator code
- For example, deliberately not accepting Kafka messages for 10 seconds
• Validate whether data loss or data duplication happens
ClickHouse Troubleshooting and Remediation
• The setting “insert_quorum = 2” is used to guarantee high data reliability
• A ClickHouse exception (with error code = 286) can happen occasionally:
2021.04.10 16:26:38.896509 [ 59963 ] {8421e4d6-43f0-4792-8570-7ef2bf8f595a} <Error> executeQuery: Code: 286, e.displayText()
= DB::Exception: Quorum for previous write has not been satisfied yet. Status: version: 1
part_name: 20210410-0_990_990_0
required_number_of_replicas: 2
actual_number_of_replicas: 1
replicas: SLC-74137
Data insertion in the whole shard stops when this exception happens!
ClickHouse Troubleshooting and Remediation (cont’d)
• An in-house tool was developed to:
• scan the ZooKeeper subtree associated with log replication queues
• inspect why queued commands cannot be performed
• Once all queued commands get cleared, the quorum automatically becomes satisfied
• Afterwards, data insertion resumes in the shard
• Real-time alerts are defined for:
• A shard going a long duration without block insertion
• Block insertion experiencing a non-zero failure rate with error code = 286
• Some replicas having their replication queues grow too large
The Outline of the Talk
• The block aggregator developed for multi-DC deployment
• The deterministic message replay protocol in block aggregator
• The runtime verifier as a monitoring/debugging tool for block aggregator
• Issues and experiences in block aggregator’s implementation and deployment
• The block aggregator deployment in production
Block Aggregator Deployment in Production
One Example Deployment
Kafka Clusters: 2 datacenters
The ClickHouse Cluster:
• 2 datacenters
• 250 shards
• Each shard having 4 replicas (2 replicas per DC)
• Each aggregator co-located in each replica
Measured results:
• Total messages processed/sec (peak): 280 K
• Total message bytes processed/sec (peak): 220 MB/sec
• 95th-percentile block insertion time (quorum=2): 3.8 sec (table 1), 1.1 sec (table 2), 4.0 sec (table 3)
• 95th-percentile block size: 0.16 MB (table 1), 0.03 MB (table 2), 0.46 MB (table 3)
• 95th-percentile number of rows in a block: 1358 rows (table 1), 1.8 rows (table 2), 1894 rows (table 3)
• 95th-percentile Kafka commit time: 64 ms
• End-to-end message consumption lag time: < 30 sec
Block Aggregator Deployment in Production
• The block insertion rate at the shard level in a 24-hour window
Block Aggregator Deployment in Production
• The message consumption LAG time at the shard level captured in a 24-hour window
Block Aggregator Deployment in Production
• The Kafka Group Rebalance Rate at the shard level in a 24-hour window (always 0)
Block Aggregator Deployment in Production
• The ZooKeeper hardware exception in a 24-hour window (close to 0)
Summary
• Using streaming platforms like Kafka is one standard way to transfer data across data processing systems
• For a columnar DB, block loading is more efficient than loading individual records
• Under failure conditions, replaying Kafka messages may cause data loss or data duplication at block loaders
• Our solution is to deterministically produce identical blocks under various failure conditions, so that the backend columnar DB can detect and remove duplicated blocks
• The same solution allows us to verify that blocks are always produced correctly under failure conditions
• This solution has been developed and deployed into production
More Related Content

What's hot

Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIAltinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides Altinity Ltd
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howAltinity Ltd
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaValery Tkachenko
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...HostedbyConfluent
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOAltinity Ltd
 
MySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & OperationsMySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & OperationsFrederic Descamps
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouseVianney FOUCAULT
 
How to tune Kafka® for production
How to tune Kafka® for productionHow to tune Kafka® for production
How to tune Kafka® for productionconfluent
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyAlexander Kukushkin
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 
ProxySQL High Avalability and Configuration Management Overview
ProxySQL High Avalability and Configuration Management OverviewProxySQL High Avalability and Configuration Management Overview
ProxySQL High Avalability and Configuration Management OverviewRené Cannaò
 

What's hot (20)

Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
MySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & OperationsMySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & Operations
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
How to tune Kafka® for production
How to tune Kafka® for productionHow to tune Kafka® for production
How to tune Kafka® for production
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easy
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
ProxySQL High Avalability and Configuration Management Overview
ProxySQL High Avalability and Configuration Management OverviewProxySQL High Avalability and Configuration Management Overview
ProxySQL High Avalability and Configuration Management Overview
 

Similar to Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay

Swift container sync
Swift container syncSwift container sync
Swift container syncOpen Stack
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scalejimriecken
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to heroAvi Levi
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka TLV
 
Strict-Data-Consistency-in-Distrbuted-Systems-With-Failures
Strict-Data-Consistency-in-Distrbuted-Systems-With-FailuresStrict-Data-Consistency-in-Distrbuted-Systems-With-Failures
Strict-Data-Consistency-in-Distrbuted-Systems-With-FailuresSlava Imeshev
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionAlexandre Tamborrino
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Exactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J SaxExactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J SaxHostedbyConfluent
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarKarthik Ramasamy
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of stateYoni Farin
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland HochmuthNETWAYS
 
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland HochmuthNETWAYS
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPCMax Alexejev
 

Similar to Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay (20)

Swift container sync
Swift container syncSwift container sync
Swift container sync
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
 
Strict-Data-Consistency-in-Distrbuted-Systems-With-Failures
Strict-Data-Consistency-in-Distrbuted-Systems-With-FailuresStrict-Data-Consistency-in-Distrbuted-Systems-With-Failures
Strict-Data-Consistency-in-Distrbuted-Systems-With-Failures
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumption
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Exactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J SaxExactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J Sax
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Kafka101
Kafka101Kafka101
Kafka101
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 - Monasca - Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
 
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland HochmuthOSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 

More from Altinity Ltd

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxAltinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceAltinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfAltinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfAltinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfAltinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsAltinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfAltinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfAltinity Ltd
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...Altinity Ltd
 

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
 

Recently uploaded

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Recently uploaded (20)

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay

  • 15. A Naïve Way for Block Aggregator to Replay Messages (1)
  • 16. A Naïve Way for Block Aggregator to Replay Messages (2)
  • 17. Our Solution: Block-Level Deduplication in ClickHouse (1)
    • ClickHouse relies on ZooKeeper to store metadata
    • Each stored block carries a hash value
    • New blocks to be inserted have their hash checked for uniqueness
    • Two blocks are identical if they (see the toy sketch below)
      • have the same block size,
      • contain the same rows,
      • and have the rows in the same order
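To make the identity rule above concrete, the following toy sketch shows a block hash that changes whenever the block size, the rows, or the row order changes. This is purely illustrative and is not ClickHouse's internal checksum code; the hashing scheme and names are made up for the example.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Toy block hash: combines the row count and every row, in order.
// Purely illustrative; ClickHouse computes its own block checksums internally.
uint64_t blockHash(const std::vector<std::string>& rows) {
    std::hash<std::string> rowHash;
    uint64_t h = rows.size();                     // same block size
    for (const auto& row : rows) {                // same rows, in the same order
        h = h * 1099511628211ULL ^ rowHash(row);  // FNV-style combine
    }
    return h;
}

int main() {
    std::vector<std::string> a = {"r1", "r2", "r3"};
    std::vector<std::string> b = {"r2", "r1", "r3"};    // same rows, different order
    std::cout << std::boolalpha
              << (blockHash(a) == blockHash(a)) << "\n"   // true: an identical replayed block is deduplicated
              << (blockHash(a) == blockHash(b)) << "\n";  // false: the order differs, so it is a different block
}
```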
  • 18. Our Solution: Guarantee to Form Identical Blocks (2)
    • Store metadata back to Kafka that describes the latest blocks formed for each table
    • In case of failure, the next Block Aggregator that picks up the partition knows exactly how to reconstruct the latest blocks formed for each table by the previous Block Aggregator
    • The two Block Aggregators can be in different ClickHouse replicas, if Kafka partition rebalancing happens
  • 19. The Metadata Structure
    • For each Kafka connector, the metadata persisted to Kafka, per partition, has the form (a parsing sketch follows this slide):
      replica-id, [table-name, begin-msg-offset, end-msg-offset, count]+
      Metadata.min = MIN(begin-msg-offset); Metadata.max = MAX(end-msg-offset)
    • Example: replica_1,table1,0,29,20,table2,5,20,10
      • The last block decided to load to ClickHouse for table1 covers offsets [0, 29]; starting from offset min = 0, 20 messages have been consumed for table1.
      • The last block decided to load to ClickHouse for table2 covers offsets [5, 20]; starting from offset min = 0, 10 messages have been consumed for table2.
      • In total, all 30 messages from offset min = 0 to offset max = 29 have been consumed: 20 for table1 and 10 for table2.
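As an illustration of this layout, here is a minimal sketch of parsing such a per-partition metadata record and deriving Metadata.min and Metadata.max. The struct and function names are hypothetical and are not the production block aggregator code.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the per-partition metadata record:
// replica-id, [table-name, begin-msg-offset, end-msg-offset, count]+
struct TableRange {
    int64_t begin = 0;   // begin-msg-offset of the last block formed for the table
    int64_t end = 0;     // end-msg-offset of the last block formed for the table
    int64_t count = 0;   // number of messages consumed for the table
};

struct PartitionMetadata {
    std::string replicaId;
    std::map<std::string, TableRange> tables;

    int64_t min() const {                 // Metadata.min = MIN(begin-msg-offset)
        int64_t m = INT64_MAX;
        for (const auto& [name, r] : tables) m = std::min(m, r.begin);
        return m;
    }
    int64_t max() const {                 // Metadata.max = MAX(end-msg-offset)
        int64_t m = INT64_MIN;
        for (const auto& [name, r] : tables) m = std::max(m, r.end);
        return m;
    }
};

// Parse "replica_1,table1,0,29,20,table2,5,20,10" into the struct above.
PartitionMetadata parseMetadata(const std::string& record) {
    std::vector<std::string> tokens;
    std::stringstream ss(record);
    for (std::string tok; std::getline(ss, tok, ',');) tokens.push_back(tok);

    PartitionMetadata md;
    md.replicaId = tokens.at(0);
    for (size_t i = 1; i + 3 < tokens.size(); i += 4) {
        TableRange r;
        r.begin = std::stoll(tokens[i + 1]);
        r.end   = std::stoll(tokens[i + 2]);
        r.count = std::stoll(tokens[i + 3]);
        md.tables[tokens[i]] = r;
    }
    return md;
}

int main() {
    PartitionMetadata md = parseMetadata("replica_1,table1,0,29,20,table2,5,20,10");
    std::cout << "min=" << md.min() << " max=" << md.max() << "\n";  // min=0 max=29
}
```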
  • 20. The Metadata Structure for Special Block
    • Special block: when begin-msg-offset = end-msg-offset + 1
      • Either no message for the table with offset less than begin-msg-offset exists
      • Or every message for the table with offset less than begin-msg-offset has been received and acknowledged by ClickHouse
    • Example: replica_id,table1,30,29,20,table2,5,20,10
      • All messages with offset less than 30 for table1 have been acknowledged by ClickHouse
  • 21. Message Processing Sequence: Consume/Commit/Load
    • The message processing shown here is per partition
  • 22. Two Execution Modes
    • The aggregator starts from the message offset previously committed
    • REPLAY: the aggregator re-sends the last block formed for each table, to avoid data loss
    • CONSUME: the aggregator is done with REPLAY and is in the normal state
    • Mode switching (see the sketch after this slide):
      DetermineState(current_offset, saved_metadata) {
        begin = saved_metadata.min
        end   = saved_metadata.max
        if (current_offset > end)
          state = CONSUME
        else
          state = REPLAY
      }
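A minimal, self-contained rendering of the mode switch above, assuming the saved metadata has already been reduced to its min and max offsets. The type and function names are hypothetical, not the aggregator's actual identifiers.

```cpp
#include <cstdint>
#include <iostream>

enum class Mode { REPLAY, CONSUME };

// Saved per-partition metadata, reduced to the two offsets the mode switch needs.
struct SavedMetadata {
    int64_t min;  // MIN(begin-msg-offset) over all tables in the record
    int64_t max;  // MAX(end-msg-offset) over all tables in the record
};

// If the current Kafka offset is already past the last committed block range,
// every previously described block was fully consumed: normal CONSUME mode.
// Otherwise the aggregator must first REPLAY and rebuild identical blocks.
Mode determineState(int64_t currentOffset, const SavedMetadata& saved) {
    return currentOffset > saved.max ? Mode::CONSUME : Mode::REPLAY;
}

int main() {
    SavedMetadata saved{0, 29};  // e.g. derived from "replica_1,table1,0,29,20,table2,5,20,10"
    std::cout << (determineState(15, saved) == Mode::REPLAY ? "REPLAY" : "CONSUME") << "\n";  // REPLAY
    std::cout << (determineState(30, saved) == Mode::REPLAY ? "REPLAY" : "CONSUME") << "\n";  // CONSUME
}
```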
  • 23. The Top-Level Processing Loop of a Kafka Connector
    • For each Kafka connector:
      while (running) {                   // outer loop
        wait for ClickHouse and Kafka to be healthy and connected
        while (running) {                 // inner loop, elapsed time <= max_poll_interval
          batch = read a batch from Kafka
          if error, break inner loop
          for (msg : batch.messages) {    // consume loop: append each message to its table's buffer
            partitionHandlers[msg.partition].consume(msg)
            if error, break inner loop
          }
          for (ph : partitionHandlers) {  // check-buffers loop: commit to Kafka, flush to ClickHouse
            if (ph.state == CONSUME) {
              ph.checkBuffers()
              if error, break inner loop
            }
          }
        }
        disconnect from Kafka
        clear partitionHandlers
      }
  • 24. Some Clarifications
    • Partition handlers can be dynamically created or deleted, following the Kafka broker's partition assignment decisions
    • Under some failure conditions, one Kafka connector can have more than one partition assigned
    • A partition handler performs metadata commits for its corresponding partition
    • Each partition handler can process multiple tables (because a Kafka partition can carry messages for multiple tables)
    • At any given time, each partition handler has at most one in-flight block per table to be inserted into ClickHouse
    • No new block can be submitted until the current in-flight block gets a successful ACK from ClickHouse
    • Thus, the committed metadata is only one block per table ahead, i.e., "write-ahead logging with one block" (see the sketch below)
    • In other words, when replay happens, at most one block per table needs to be replayed
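The following sketch illustrates the "write-ahead logging with one block" sequence for a single table buffer: commit the metadata first, then flush exactly one block and wait for the ACK before any further block for that table may be formed. The commitMetadataToKafka and insertBlockToClickHouse functions are placeholders for this illustration, not the real Kafka or ClickHouse APIs.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Placeholder stand-ins for the real Kafka commit and ClickHouse insert calls.
bool commitMetadataToKafka(const std::string& metadataRecord) { return !metadataRecord.empty(); }
bool insertBlockToClickHouse(const std::string& table, const std::vector<std::string>& rows) {
    return !table.empty() && !rows.empty();
}

// Write-ahead logging with one block: the metadata describing the block is
// committed before the block itself is flushed, and only one block per table
// is ever in flight, so a replay has at most one block per table to rebuild.
bool flushOneBlock(const std::string& table,
                   const std::string& metadataRecord,
                   const std::vector<std::string>& rows) {
    if (!commitMetadataToKafka(metadataRecord)) {
        return false;  // failure triggers rebalancing; another replica replays from the last committed state
    }
    if (!insertBlockToClickHouse(table, rows)) {
        return false;  // on replay, the identical block is rebuilt and deduplicated by ClickHouse
    }
    return true;
}

int main() {
    std::vector<std::string> rows = {"row-0", "row-1"};
    bool ok = flushOneBlock("table1", "replica_1,table1,0,1,2", rows);
    std::cout << (ok ? "flushed" : "needs replay") << "\n";
}
```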
  • 25. Some Clarifications (cont'd)
    • If block insertion into ClickHouse fails:
      • The outer loop disconnects the Kafka connector from the Kafka broker
      • Kafka consumer group rebalancing is triggered automatically
      • A different replica's Kafka connector is assigned the partition, and block insertion continues at that replica
      • Thus, rebalancing provides "global retries with the last committed state" across multiple replicas
    • The same failure-handling mechanism applies, for example, when a metadata commit to Kafka fails
    • Thus, Kafka consumer group rebalancing is an indicator of a failure that cannot be recovered by a single block aggregator
  • 26. Example on Partition Rebalancing on Replicas
    • The following diagram shows two aggregators in one shard being killed (to simulate one datacenter going down); the block insertion traffic is picked up by the two remaining aggregators in the same shard.
  • 27. The Outline of the Talk • The block aggregator developed for multi-DC deployment • The deterministic message replay protocol in block aggregator • The runtime verifier as a monitoring/debugging tool for block aggregator • Issues and experiences in block aggregator’s implementation and deployment • The block aggregator deployment in production
  • 28. Runtime Verification
    • Aggregator Verifier (AV): checks that the blocks flushed by all aggregators to ClickHouse do not cause any data loss or duplication
    • How can AV know which blocks have been flushed by the aggregators?
      • Each aggregator commits metadata to Kafka, per partition, before flushing anything to ClickHouse
      • All metadata records committed by the aggregators are appended to an internal Kafka topic called __consumer_offsets
      • Thus, AV only needs to subscribe to this topic to learn about all blocks flushed to ClickHouse by all aggregators
  • 29. Runtime Verification Algorithm
    • Let M.t.start and M.t.end be the start offset and end offset for table t in metadata M, respectively
    • For any metadata instances M and M', where M was committed before M' in time (see the sketch below):
      • Backward anomaly: for some table t, M'.t.end < M.t.start
      • Overlap anomaly: for some table t, M.t.start < M'.t.end AND M'.t.start < M.t.end
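A small sketch of the anomaly check as defined above, applied to two metadata instances M and M' committed in that order. The Metadata representation here is a simplified stand-in for what the verifier actually reconstructs from __consumer_offsets.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Per-table offset range as recorded in one committed metadata instance.
struct Range { int64_t start; int64_t end; };
using Metadata = std::map<std::string, Range>;  // table name -> [start, end]

// Checks the two anomalies defined above, for metadata m committed before mPrime.
// Returns an empty string when the pair of instances is consistent.
std::string checkAnomalies(const Metadata& m, const Metadata& mPrime) {
    for (const auto& [table, r] : m) {
        auto it = mPrime.find(table);
        if (it == mPrime.end()) continue;            // table not present in the later instance
        const Range& rp = it->second;
        if (rp.end < r.start)
            return "backward anomaly on " + table;   // later block ends before the earlier one starts
        if (r.start < rp.end && rp.start < r.end)
            return "overlap anomaly on " + table;    // the two block ranges intersect
    }
    return "";
}

int main() {
    Metadata m  = {{"table1", {0, 29}},  {"table2", {5, 20}}};
    Metadata mp = {{"table1", {30, 59}}, {"table2", {10, 25}}};  // table2 overlaps [5, 20]
    std::string res = checkAnomalies(m, mp);
    std::cout << (res.empty() ? "consistent" : res) << "\n";     // prints: overlap anomaly on table2
}
```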
  • 30. Runtime Verifier Implementation
    • The verifier reads metadata instances in their Kafka commit order, from the system topic __consumer_offsets.
    • __consumer_offsets is a partitioned topic, and Kafka does not guarantee ordering across partitions.
    • We therefore order metadata instances by their commit timestamps at the brokers. This requires the clocks of the Kafka brokers to be synchronized with an uncertainty window smaller than the time between two metadata commits, so metadata should not be committed to Kafka too frequently.
    • This is not a problem for the block aggregator: it commits metadata once per block, every several seconds, which is infrequent compared to the clock skew.
  • 31. The Outline of the Talk • The block aggregator developed for multi-DC deployment • The deterministic message replay protocol in block aggregator • The runtime verifier as a monitoring/debugging tool for block aggregator • Issues and experiences in block aggregator’s implementation and deployment • The block aggregator deployment in production
  • 32. Compile and Link ClickHouse into Block Aggregator
    • Instead of using the C++ client library from the ClickHouse repo, we compiled and linked the entire ClickHouse codebase into the block aggregator
    • This allows us to leverage the native ClickHouse implementation:
      • Native TCP/IP communication protocol (with TLS and connection pooling)
      • Select query capabilities just like clickhouse-client (for testing purposes)
      • Table schema retrieval, and block header construction from the schema
      • Column construction from protobuf-based Kafka message deserialization
      • Column default expression evaluation
      • ZooKeeper client for distributed locking
  • 33. Dynamic Table Schema Update
    • To dynamically update a table schema:
      • Step 1: the table schema is updated on each ClickHouse shard
      • Step 2: the block aggregators in each shard are restarted, so they load the updated schema from their co-located ClickHouse replica
      • Step 3: after offline confirmation of the schema update, the client application updates its logic to produce new Kafka messages following the updated schema
    • Requirement: the block aggregator must be able to deserialize Kafka messages into blocks whether or not the messages follow the updated schema
    • Solution: enforce that columns in a table schema can only be added, never deleted afterwards
  • 34. Multiple ZooKeeper Clusters for One ClickHouse Cluster
    • ClickHouse relies on ZooKeeper for metadata storage and replication coordination
    • Each block insertion takes roughly 15 remote calls to the ZooKeeper cluster, and block insertion is performed per table
    • Our ZooKeeper cluster (version 3.5.8) is deployed across three datacenters with ~20 ms cross-datacenter communication latency
    • For a large ClickHouse cluster with 250 shards (each shard having 4 replicas), a single ZooKeeper deployment can introduce a high ZooKeeper "hardware exception" rate
      • The exceptions are due to ZooKeeper sessions frequently expiring
    • Multiple ZooKeeper clusters are deployed instead, each allocated a subset of the ClickHouse shards
      • In our deployment, 50 shards share one ZooKeeper cluster
      • The right ratio depends on the block insertion rate per table and the total number of tables involved in real-time insertion
  • 35. Distributed Locking at Block Aggregator
    • Before "insert_quorum_parallel" was introduced in ClickHouse:
      • In each shard, for each table, only one replica is allowed to perform data insertion
      • Distributed locking is used to coordinate block insertion across block aggregators
      • The ZooKeeper locking implementation in ClickHouse is reused for this
    • More recent ClickHouse versions have "insert_quorum_parallel", with a default value of true
      • According to an Altinity blog article, the current ClickHouse implementation breaks sequential consistency and may have other side effects
      • In our recent product release based on ClickHouse 21.8, we turned this option off
      • And we still enforce distributed locking at the block aggregator
  • 36. Testing on Block Aggregator
    • Resiliency testing (in an 8-shard cluster with 32 replicas)
      • Follows the "Chaos Monkey" approach
      • Kill individual processes and individual containers, across ZooKeeper, ClickHouse and the block aggregator
      • Kill all processes and containers in one datacenter, across ZooKeeper, ClickHouse and the block aggregator
      • Validate whether data loading can recover and continue
    • Smaller-scale integration testing
      • The whole cluster runs on a single machine with multiple processes of ZooKeeper, ClickHouse and block aggregators
      • Programmatically control process start/stop, along with small table insertions
      • In addition, turn on fault injection at predefined points in the block aggregator code, for example, deliberately not accepting Kafka messages for 10 seconds
      • Validate whether data loss or data duplication happens
  • 37. ClickHouse Troubleshooting and Remediation
    • The setting "insert_quorum = 2" is used to guarantee high data reliability
    • A ClickHouse exception (error code 286) can happen occasionally:
      2021.04.10 16:26:38.896509 [ 59963 ] {8421e4d6-43f0-4792-8570-7ef2bf8f595a} <Error> executeQuery: Code: 286,
      e.displayText() = DB::Exception: Quorum for previous write has not been satisfied yet. Status: version: 1
      part_name: 20210410-0_990_990_0 required_number_of_replicas: 2 actual_number_of_replicas: 1 replicas: SLC-74137
    • Data insertion in the whole shard stops when this exception happens!
  • 38. ClickHouse Troubleshooting and Remediation (cont'd)
    • An in-house tool was developed to:
      • scan the ZooKeeper subtrees associated with the replication log queues
      • inspect why queued commands cannot be performed
    • Once the queued commands are all cleared, the quorum is automatically satisfied and data insertion resumes in the shard
    • Real-time alerts are defined for:
      • a shard having no block insertion for a long duration
      • block insertion experiencing a non-zero failure rate with error code 286
      • some replicas having replication queues that are too large
  • 39. The Outline of the Talk • The block aggregator developed for multi-DC deployment • The deterministic message replay protocol in block aggregator • The runtime verifier as a monitoring/debugging tool for block aggregator • Issues and experiences in block aggregator’s implementation and deployment • The block aggregator deployment in production
  • 40. Block Aggregator Deployment in Production
    • One example deployment
      • Kafka clusters: 2 datacenters
      • The ClickHouse cluster: 2 datacenters, 250 shards, each shard having 4 replicas (2 replicas per DC), each aggregator co-located with a replica
    • Measured results
      • Total messages processed/sec (peak): 280 K
      • Total message bytes processed/sec (peak): 220 MB/sec
      • 95th-percentile block insertion time (quorum = 2): 3.8 sec (table 1), 1.1 sec (table 2), 4.0 sec (table 3)
      • 95th-percentile block size: 0.16 MB (table 1), 0.03 MB (table 2), 0.46 MB (table 3)
      • 95th-percentile number of rows in a block: 1358 rows (table 1), 1.8 rows (table 2), 1894 rows (table 3)
      • 95th-percentile Kafka commit time: 64 ms
      • End-to-end message consumption lag time: < 30 sec
  • 41. Block Aggregator Deployment in Production
    • The block insertion rate at the shard level in a 24-hour window
  • 42. Block Aggregator Deployment in Production
    • The message consumption lag time at the shard level captured in a 24-hour window
  • 43. Block Aggregator Deployment in Production
    • The Kafka group rebalance rate at the shard level in a 24-hour window (always 0)
  • 44. Block Aggregator Deployment in Production
    • The ZooKeeper hardware exception rate in a 24-hour window (close to 0)
  • 45. Summary
    • Using streaming platforms like Kafka is a standard way to transfer data across data processing systems
    • For a columnar DB, block loading is more efficient than loading individual records
    • Under failure conditions, replaying Kafka messages may cause data loss or data duplication at block loaders
    • Our solution is to deterministically produce identical blocks under various failure conditions, so that the backend columnar DB can detect and remove duplicated blocks
    • The same solution allows us to verify that blocks are always produced correctly under failure conditions
    • This solution has been developed and deployed into production