© 2020, Altinity LTD
Introductions
www.altinity.com: software and services provider for ClickHouse; major committer and community sponsor in the US and Western Europe
Robert Hodges (CEO): >30 years in DBMS plus virtualization & security
Mikhail Filimonov (Engineer): Kafka Engine maintainer and ClickHouse committer
What’s Kafka? (And why use it with ClickHouse)
Kafka is messaging on steroids
[Diagram: producers publish to the topic "Readings" on a Kafka broker; the topic is divided into partitions with replicas, and consumers in a consumer group read from it]
ClickHouse is not a slouch either
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Uses column storage
Parallel and vectorized execution
Scales to many petabytes
Is open source (Apache 2.0)
[Diagram: table columns a, b, c, d]
And it’s really fast!
Reasons to use Kafka with ClickHouse
[Diagram: Your Apps, Kafka, ClickHouse]
Many data sources
High throughput
Low latency
Message replay
Reading data from Kafka
Standard flow from Kafka to ClickHouse
Topic: contains messages
Kafka Table Engine: encapsulates topic within ClickHouse
Materialized View: fetches rows
MergeTree Table: stores rows
Create inbound Kafka topic
kafka-topics \
  --bootstrap-server kafka-headless:9092 \
  --topic readings \
  --create --partitions 6 \
  --replication-factor 3
Create target table
CREATE TABLE readings (
readings_id Int32 Codec(DoubleDelta, LZ4),
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, LZ4)
) Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (readings_id, time); -- MergeTree normally requires a sorting key; this one is an assumption
Create Kafka Engine table
CREATE TABLE readings_queue (
readings_id Int32,
time DateTime,
temperature Decimal(5,2)
) ENGINE = Kafka SETTINGS
kafka_broker_list = 'kafka-headless.kafka:9092',
kafka_topic_list = 'readings',
kafka_group_name = 'readings_consumer_group1',
kafka_num_consumers = '1',
kafka_format = 'CSV'
Create materialized view to transfer data
CREATE MATERIALIZED VIEW readings_queue_mv
TO readings
AS
SELECT readings_id, time, temperature
FROM readings_queue;
Writing data to Kafka
Standard flow from ClickHouse to Kafka
Topic: contains messages
Kafka Table Engine: encapsulates topic within ClickHouse
INSERT: sends rows to the topic through the Kafka engine table
Create outbound Kafka topic
kafka-topics \
  --bootstrap-server kafka-headless:9092 \
  --topic events \
  --create --partitions 6 \
  --replication-factor 3
Create Kafka Engine table
CREATE TABLE events (
time DateTime,
severity String,
content String
) ENGINE = Kafka SETTINGS
kafka_broker_list = 'kafka-headless.kafka:9092',
kafka_topic_list = 'events',
kafka_group_name = 'events_consumer_group1',
kafka_format = 'JSONEachRow'
Insert data to write into Kafka
-- (In clickhouse-client)
INSERT INTO events VALUES
(now(), 'ERROR', 'Oh no!')
-- (In another window)
kafka-console-consumer --bootstrap-server \
  kafka-headless:9092 --topic events
{"time":"2020-01-19 05:07:10",
"severity":"ERROR","content":"Oh no!"}
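The kafka_format setting on the Kafka engine table decides how each inserted row is serialized into a message. The JSON message shown above is what a one-JSON-object-per-row format such as JSONEachRow produces; a CSV format would produce a quoted, comma-separated line instead. The following is a minimal sketch in plain Python that approximates both renderings for the sample row. It does not call ClickHouse, the helper names as_json_each_row and as_csv are hypothetical, and the CSV quoting is an approximation of ClickHouse's behavior for strings and date-times.

```python
import csv
import io
import json

# Sample row taken from the slide; the timestamp is fixed for illustration.
row = {"time": "2020-01-19 05:07:10", "severity": "ERROR", "content": "Oh no!"}


def as_json_each_row(r: dict) -> str:
    """Approximate a JSONEachRow message: one compact JSON object per row."""
    return json.dumps(r, separators=(",", ":"))


def as_csv(r: dict) -> str:
    """Approximate a CSV message: every value here is a string, so all are quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(r.values())
    return buf.getvalue().rstrip("\n")
```

Comparing the two strings with what kafka-console-consumer prints is a quick way to check which format a table is really using.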
Kafka Tips and Tricks
Kafka table engine internals
[Diagram: inside the ClickHouse server, the Kafka Table Engine table readings_queue uses librdkafka to reach the Kafka broker and its topic readings]
Table settings: kafka_broker_list, kafka_topic_list, ..., kafka_num_consumers = 1
Config.xml:
<!-- Global config -->
<kafka>
  <debug>cgrp</debug>
  ...
</kafka>
<!-- Topic config -->
<kafka_readings>
  <retry_backoff_ms>250</retry_backoff_ms>
</kafka_readings>
Overall best practices
● Use ClickHouse version 19.16.10 or newer
● For HA you should have at least min.insync.replicas + 1 brokers
○ Typical scenario: 3 brokers, replication factor = 3, min.insync.replicas = 2
● To consume a topic in parallel you need enough partitions: you can’t usefully have more consumers than partitions, because the extra consumers will do nothing. For example, try 2 × the number of consumers as the partition count
● If you need the ‘coordinates’ of consumed messages, use virtual columns:
○ _topic, _partition, _timestamp, _key, _offset
○ Just use them in the MV, without declaring them in the Engine=Kafka table
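The two sizing rules above (brokers versus min.insync.replicas, and consumers versus partitions) can be checked with a little arithmetic. The following is a minimal sketch in Python, not Kafka's actual assignor code: it assumes a simplified round-robin spread of partitions over the consumers of one group, and assumes producers that wait for all in-sync replicas. The function names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_consumers: int) -> Dict[int, List[int]]:
    """Spread partition ids over consumers round-robin (a simplification)."""
    if num_consumers <= 0:
        raise ValueError("need at least one consumer")
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


def idle_consumers(assignment: Dict[int, List[int]]) -> List[int]:
    """Consumers that received no partition and therefore do nothing."""
    return [consumer for consumer, parts in assignment.items() if not parts]


def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be lost while the in-sync set still meets the minimum."""
    return max(replication_factor - min_insync_replicas, 0)
```

With 6 partitions, assign_partitions(6, 3) gives every consumer two partitions, while assign_partitions(6, 8) leaves two consumers idle. For the typical scenario on the slide, tolerated_broker_failures(3, 2) is 1, which is why min.insync.replicas + 1 brokers is the minimum for HA.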
Overall best practices
● When you have many Kafka tables, increase background_schedule_pool_size (monitor BackgroundSchedulePoolTask)
● If consuming performance is too low, don’t raise kafka_num_consumers (keep it at 1); instead create a separate table with Engine=Kafka and an MV streaming data to the same target
● To set rdkafka options, add them to the <kafka> section in config.xml or, preferably, use a separate file in config.d/
○ https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
ClickHouse Clusters and Kafka
● Best practice: every ClickHouse server consumes some partitions and flushes rows to a local ReplicatedMergeTree table
● Flushing to a Distributed table is also possible
○ If you need to shard the data in ClickHouse according to some sharding key
● Chains of materialized views are possible but can be less reliable
○ Inserts are not atomic, so on failure you can get a ‘dirty’ state
○ Atomic MV chains are planned for the first half of 2020
Rewind / fast-forward / replay
● Step 1: Detach Kafka tables in ClickHouse
● Step 2: kafka-consumer-groups.sh --bootstrap-server kafka:9092 --topic topic:0,1,2 --group id1 --reset-offsets --to-latest --execute
○ More samples: https://gist.github.com/filimonov/1646259d18b911d7a1e8745d6411c0cc
● Step 3: Attach Kafka tables back
See also configuration settings:
<kafka>
  <auto_offset_reset>smallest</auto_offset_reset>
</kafka>
How batching from Kafka stream works
Important settings: kafka_max_block_size, stream_poll_timeout_ms, stream_flush_interval_ms
1. Batch poll (time limit: stream_poll_timeout_ms, 500 ms; message limit: kafka_max_block_size, 65536)
2. Parse messages. If we have enough data (row limit: kafka_max_block_size, 65536) or reach the time limit (stream_flush_interval_ms, 7500 ms), flush it to the target MV; if not, repeat step 1
3. Commit happens after writing data to the MV (commit after write = at-least-once)
4. On any error during that process the Kafka client is restarted, which leads to a rebalance: it leaves the group and rejoins within a few seconds
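The batching steps above can be sketched as a small simulation. This is a minimal Python model under stated assumptions, not the engine's real code: FakeTopic is a hypothetical stand-in for a topic with a simulated clock, every poll is assumed to use its full 500 ms timeout, and messages are represented by their offsets. The defaults mirror the values on the slide.

```python
from typing import Callable, List


class FakeTopic:
    """Hypothetical stand-in for a Kafka topic with a simulated clock."""

    def __init__(self, per_poll: int, poll_ms: int = 500) -> None:
        self.per_poll = per_poll  # messages available on each poll
        self.poll_ms = poll_ms    # assumed: every poll uses its full timeout
        self.clock_ms = 0
        self.next_offset = 0

    def poll(self, max_messages: int) -> List[int]:
        """Return up to max_messages offsets and advance the simulated clock."""
        self.clock_ms += self.poll_ms
        count = min(self.per_poll, max_messages)
        batch = list(range(self.next_offset, self.next_offset + count))
        self.next_offset += count
        return batch

    def now_ms(self) -> int:
        return self.clock_ms


def collect_block(topic: FakeTopic, max_block_size: int = 65536,
                  flush_interval_ms: int = 7500) -> List[int]:
    """Steps 1 and 2: poll until the block is full or the flush interval elapses."""
    started = topic.now_ms()
    block: List[int] = []
    while True:
        # Step 1: never ask for more rows than the block can still hold.
        block.extend(topic.poll(max_block_size - len(block)))
        # Step 2: flush on enough rows, or when the time limit is reached.
        if len(block) >= max_block_size:
            return block
        if topic.now_ms() - started >= flush_interval_ms:
            return block


def write_then_commit(block: List[int],
                      write: Callable[[List[int]], None],
                      commit: Callable[[int], None]) -> None:
    """Step 3: write first, commit after. A failed write leaves offsets uncommitted."""
    if not block:
        return
    write(block)           # may raise; then commit below is never reached
    commit(block[-1] + 1)  # next offset to read after this block
```

With 3 messages per poll and a block size of 10, collect_block returns after four polls with a full block. With the default 65536-row limit it returns after 15 polls, when 7500 ms have elapsed. Because the commit follows the write, a failure between the two means the same messages are read again, which is the at-least-once behavior the slide describes.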
Alternatives to the ClickHouse Kafka Engine
Loading data via a client application
[Diagram: data flows from Kafka to ClickHouse through a Java connector or a home-built client]
Other approaches to consider
● If you like the Java stack and already use something from it, you can stream a Kafka topic to ClickHouse over JDBC
○ Apache NiFi
○ Apache Storm
○ Kafka Streams
● A new entrant, not tested: https://github.com/housepower/clickhouse_sinker
Kafka Feature Roadmap and Wrap-up
Roadmap
● 2020 near-term Kafka improvements
○ Eliminate duplicates due to topic rebalancing
○ Filling key for inserts (to allow partitioning), also timestamps
○ Better error processing
○ Exactly once semantics
○ AVRO format
○ Introspection - system.kafka, metrics & events
● Long-term Kafka work
○ Fix performance issues including efficient consumer support
○ Support for other messaging systems (need to decide which ones)
○ Give us your thoughts!
File issues on GitHub or contact Altinity directly if you have feature requests
Thank you!
Special offer: contact us for a 1-hour consultation
Presenters: rhodges@altinity.com, mfilimonov@altinity.com
Visit us at: https://www.altinity.com
Free consultation: https://blog.altinity.com/offer

Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka

  • 1. © 2020, Altinity LTD© 2019, Altinity LTD
  • 2. © 2020, Altinity LTD Introductions www.altinity.com Software and services provider for ClickHouse Major committer and community sponsor in US and Western Europe Robert Hodges (CEO) >30 years DBMS plus virtualization & security Mikhail Filimonov (Engineer) Kafka Engine maintainer and ClickHouse committer
  • 3. © 2020, Altinity LTD What’s Kafka? (And why use it with ClickHouse)
  • 4. © 2020, Altinity LTD Kafka Broker Kafka is messaging on steroids Topic: Readings Partitions Producer Producer Consumer Consumer Consumer Group Replicas
  • 5. © 2020, Altinity LTD ClickHouse is not a slouch either Understands SQL Runs on bare metal to cloud Shared nothing architecture Uses column storage Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) a b c d a b c d a b c d a b c d And it’s really fast!
  • 6. © 2020, Altinity LTD Reasons to use Kafka with ClickHouse Kafka Apps ClickHouse AppsYour Apps Many datasources High throughput Low latency Message replay
  • 7. © 2020, Altinity LTD Reading data from Kafka
  • 8. © 2020, Altinity LTD Standard flow from Kafka to ClickHouse Topic Contains messages Kafka Table Engine Encapsulates topic within ClickHouse Materialized View Fetches Rows MergeTree Table Stores Rows
  • 9. © 2020, Altinity LTD Create inbound Kafka topic kafka-topics --bootstrap-server kafka-headless:9092 --topic readings --create --partitions 6 --replication-factor 3
  • 10. Create target table
      CREATE TABLE readings (
          readings_id Int32 Codec(DoubleDelta, LZ4),
          time DateTime Codec(DoubleDelta, LZ4),
          date ALIAS toDate(time),
          temperature Decimal(5,2) Codec(T64, LZ4)
      ) Engine = MergeTree
      PARTITION BY toYYYYMM(time)
      -- MergeTree needs a sorting key; the one below is an assumption, not shown on the slide
      ORDER BY (readings_id, time)
  • 11. Create Kafka Engine table
      CREATE TABLE readings_queue (
          readings_id Int32,
          time DateTime,
          temperature Decimal(5,2)
      ) ENGINE = Kafka SETTINGS
          kafka_broker_list = 'kafka-headless.kafka:9092',
          kafka_topic_list = 'readings',
          kafka_group_name = 'readings_consumer_group1',
          kafka_num_consumers = '1',
          kafka_format = 'CSV'
  • 12. Create materialized view to transfer data
      CREATE MATERIALIZED VIEW readings_queue_mv TO readings AS
      SELECT readings_id, time, temperature
      FROM readings_queue;
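The Kafka engine table on slide 11 declares kafka_format = 'CSV' and three columns, so each message on the readings topic is expected to be one CSV row in that column order. The following is a minimal Python sketch of formatting such a row. The helper name `format_reading` and the sample values are hypothetical, and actually publishing the message (with whichever Kafka client you use) is deliberately left out.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def format_reading(readings_id: int, time: datetime, temperature: Decimal) -> str:
    """Return one CSV line in the column order of readings_queue:
    readings_id Int32, time DateTime, temperature Decimal(5,2)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([
        readings_id,
        time.strftime("%Y-%m-%d %H:%M:%S"),  # DateTime as 'YYYY-MM-DD hh:mm:ss'
        f"{temperature:.2f}",                # two decimal places, as in Decimal(5,2)
    ])
    return buf.getvalue()


# Hypothetical sample row; the payload would be sent to the 'readings' topic.
message = format_reading(1, datetime(2020, 1, 19, 5, 7, 10), Decimal("21.5"))
print(message, end="")
```

Once such rows arrive on the topic, the materialized view on slide 12 moves them into the readings table without further client code.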
  • 13. Writing data to Kafka
  • 14. Standard flow from ClickHouse to Kafka
      INSERT → Kafka Table Engine (encapsulates topic within ClickHouse) → Topic (contains messages)
  • 15. Create outbound Kafka topic
      kafka-topics \
        --bootstrap-server kafka-headless:9092 \
        --topic events \
        --create --partitions 6 \
        --replication-factor 3
  • 16. Create Kafka Engine table
      CREATE TABLE events (
          time DateTime,
          severity String,
          content String
      ) ENGINE = Kafka SETTINGS
          kafka_broker_list = 'kafka-headless.kafka:9092',
          kafka_topic_list = 'events',
          kafka_group_name = 'events_consumer_group1',
          kafka_format = 'CSV'
  • 17. Insert data to write into Kafka
      -- (In clickhouse-client)
      INSERT INTO events VALUES (now(), 'ERROR', 'Oh no!')
      -- (In another window)
      kafka-console-consumer --bootstrap-server kafka-headless:9092 --topic events
      {"time":"2020-01-19 05:07:10", "severity":"ERROR","content":"Oh no!"}
  • 18. Kafka Tips and Tricks
  • 19. Kafka table engine internals
      Diagram labels: ClickHouse Server; Kafka Table Engine readings_queue; librdkafka; Kafka Broker; Topic readings
      Settings: kafka_broker_list, kafka_topic_list, ..., kafka_num_consumers = 1
      Config.xml:
      <!-- Global config -->
      <kafka>
        <debug>cgrp</debug>
        ...
      </kafka>
      <!-- Topic config -->
      <kafka_readings>
        <retry_backoff_ms>250</retry_backoff_ms>
      </kafka_readings>
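The Config.xml fragment on slide 19 writes librdkafka options with underscores where librdkafka's own property names use dots (retry_backoff_ms corresponds to retry.backoff.ms), and it separates a global <kafka> section from a per-topic <kafka_readings> section. The Python sketch below only illustrates that naming convention. The `<config>` wrapper element and the function name `librdkafka_properties` are assumptions made so the fragment can be parsed, and the elided "..." entries from the slide are simply left out.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on slide 19. The <config> wrapper is hypothetical and is
# only there so the two sections have a single root element.
CONFIG_XML = """
<config>
  <kafka>
    <debug>cgrp</debug>
  </kafka>
  <kafka_readings>
    <retry_backoff_ms>250</retry_backoff_ms>
  </kafka_readings>
</config>
"""


def librdkafka_properties(xml_text: str) -> dict:
    """Group settings by scope and turn underscore names into dotted names."""
    root = ET.fromstring(xml_text)
    result = {}
    for section in root:
        if section.tag == "kafka":
            scope = "global"
        elif section.tag.startswith("kafka_"):
            scope = "topic:" + section.tag[len("kafka_"):]
        else:
            continue  # not a Kafka-related section
        result[scope] = {
            child.tag.replace("_", "."): (child.text or "").strip()
            for child in section
        }
    return result


props = librdkafka_properties(CONFIG_XML)
print(props)
```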
  • 20. Overall best practices
      ● Use ClickHouse version 19.16.10 or newer
      ● For HA you should have at least min.insync.replicas+1 brokers.
        ○ Typical scenario: 3 brokers, replication factor = 3, min.insync.replicas = 2
      ● To consume your topic in parallel you need enough partitions: you can’t usefully have more consumers than partitions, because the extra consumers will do nothing. You can try, for example, 2*num_of_consumers partitions.
      ● If you need the ‘coordinates’ of consumed messages, use virtual columns:
        ○ _topic, _partition, _timestamp, _key, _offset
        ○ Just use them in the MV, without declaring them in the Engine=Kafka table
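The two sizing rules on slide 20 can be checked with a little arithmetic. The Python sketch below is illustrative only: both function names are hypothetical, the first assumes producers wait for all in-sync replicas (acks=all), and the second uses a simple round-robin split rather than Kafka's real assignment strategies.

```python
from typing import List


def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas of a partition that can be lost while writes are still accepted
    (assuming acks=all producers)."""
    return max(replication_factor - min_insync_replicas, 0)


def assign_partitions(num_partitions: int, num_consumers: int) -> List[List[int]]:
    """Simplified round-robin assignment of partitions to consumers in one group."""
    assignment: List[List[int]] = [[] for _ in range(num_consumers)]
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


# Typical scenario from the slide: replication factor 3, min.insync.replicas 2.
print(tolerated_broker_failures(3, 2))
# Six partitions over three consumers: every consumer gets work.
print(assign_partitions(6, 3))
# Two partitions over three consumers: the third consumer gets an empty list.
print(assign_partitions(2, 3))
```

The empty list in the last call is the "some of them will do nothing" case that the slide warns about.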
  • 21. Overall best practices
      ● When you have many Kafka tables, increase background_schedule_pool_size (monitor BackgroundSchedulePoolTask)
      ● If consuming performance is too low, don’t raise kafka_num_consumers (keep it at 1); instead create a separate table with Engine=Kafka and an MV streaming data to the same target.
      ● To set rdkafka options, add them to the <kafka> section in config.xml or, preferably, use a separate file in config.d/
        ○ https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
  • 22. ClickHouse Clusters and Kafka
      ● Best practice: every ClickHouse server consumes some partitions and flushes rows to a local ReplicatedMergeTree table.
      ● Flushing to a Distributed table is also possible
        ○ Use this if you need to shard the data in ClickHouse according to some sharding key
      ● Chains of materialized views are possible but can be less reliable
        ○ Inserts are not atomic, so on failure you can get a ‘dirty’ state
        ○ Atomic MV chains are planned for the first half of 2020
  • 23. Rewind / fast-forward / replay
      ● Step 1: Detach the Kafka tables in ClickHouse
      ● Step 2: Reset the consumer group offsets:
          kafka-consumer-groups.sh --bootstrap-server kafka:9092 --topic topic:0,1,2 --group id1 --reset-offsets --to-latest --execute
        ○ More samples: https://gist.github.com/filimonov/1646259d18b911d7a1e8745d6411c0cc
      ● Step 3: Attach the Kafka tables back
      See also configuration settings:
          <kafka>
            <auto_offset_reset>smallest</auto_offset_reset>
          </kafka>
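The three steps on slide 23 have to run in a fixed order: detach, reset offsets, attach. The Python sketch below only assembles the statements and the command as strings. The function name `replay_plan` is hypothetical, the table and group names are borrowed from the earlier slides, and nothing is executed against ClickHouse or Kafka.

```python
from typing import List

# Offset targets supported by this sketch; the command-line tool has more.
RESET_TARGETS = {"earliest": "--to-earliest", "latest": "--to-latest"}


def replay_plan(table: str, group: str, topic: str, bootstrap: str,
                target: str = "earliest") -> List[str]:
    """Return the detach / reset / attach steps in the order they should run."""
    if target not in RESET_TARGETS:
        raise ValueError(f"unsupported target: {target!r}")
    reset_cmd = " ".join([
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap,
        "--group", group,
        "--topic", topic,
        "--reset-offsets", RESET_TARGETS[target],
        "--execute",
    ])
    return [
        f"DETACH TABLE {table}",  # step 1: run in clickhouse-client
        reset_cmd,                # step 2: run in a shell
        f"ATTACH TABLE {table}",  # step 3: run in clickhouse-client
    ]


plan = replay_plan("readings_queue", "readings_consumer_group1", "readings", "kafka:9092")
for step in plan:
    print(step)
```

Detaching first matters because the consumer group has to be inactive while its offsets are reset.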
  • 24. How batching from Kafka stream works
      Important settings: kafka_max_block_size, stream_poll_timeout_ms, stream_flush_interval_ms
      1. Batch poll (time limit: stream_poll_timeout_ms, 500 ms; message limit: kafka_max_block_size, 65536).
      2. Parse messages. If there is enough data (row limit: kafka_max_block_size, 65536) or the time limit is reached (stream_flush_interval_ms, 7500 ms), flush the block to the target MV; otherwise repeat step 1.
      3. The commit happens after the data is written to the MV (commit after write = at-least-once).
      4. On any error during this process the Kafka client is restarted, which leads to a rebalance: the consumer leaves the group and rejoins within a few seconds.
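The loop on slide 24 can be sketched as a small simulation. This is not the engine's actual code: the in-memory deque stands in for the topic, the function names are hypothetical, every poll is assumed to use its full time limit, and one message is treated as one row. The defaults mirror the values quoted on the slide.

```python
from collections import deque
from typing import Callable, List, Tuple


def poll(source: deque, max_messages: int) -> List[int]:
    """Step 1: take up to max_messages from the simulated topic."""
    batch = []
    while source and len(batch) < max_messages:
        batch.append(source.popleft())
    return batch


def build_block(source: deque, max_block_size: int = 65536,
                flush_interval_ms: int = 7500,
                poll_timeout_ms: int = 500) -> Tuple[List[int], int]:
    """Steps 1-2: poll until the block is full or the flush interval has passed."""
    block: List[int] = []
    elapsed_ms = 0
    while True:
        block.extend(poll(source, max_block_size - len(block)))
        elapsed_ms += poll_timeout_ms  # simplification: each poll takes its full timeout
        if len(block) >= max_block_size or elapsed_ms >= flush_interval_ms:
            return block, elapsed_ms


def flush_and_commit(block: List[int], write: Callable[[List[int]], None],
                     commit: Callable[[], None]) -> None:
    """Step 3: write first, commit second. If write raises, commit is skipped,
    so the same messages are delivered again (at-least-once)."""
    write(block)
    commit()


# Ten pending messages and a tiny block size: the size limit triggers the flush.
topic = deque(range(10))
block, elapsed = build_block(topic, max_block_size=4, flush_interval_ms=1500)
print(block, elapsed, len(topic))
```

With fewer messages than the block size, the same loop keeps polling until the flush interval is reached and then flushes whatever it has collected.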
  • 25. Alternatives to the ClickHouse Kafka Engine
  • 26. Loading data via a client application
      Diagram labels: Kafka; ClickHouse; Java Connector; Home-built client
  • 27. Other approaches to consider
      ● If you like the Java stack and already use something from it, you can stream a Kafka topic to ClickHouse via JDBC
        ○ Apache NiFi
        ○ Apache Storm
        ○ Kafka Streams
      ● A new entrant, not tested: https://github.com/housepower/clickhouse_sinker
  • 28. Kafka Feature Roadmap and Wrap-up
  • 29. Roadmap
      ● 2020 near-term Kafka improvements
        ○ Eliminate duplicates due to topic rebalancing
        ○ Filling key for inserts (to allow partitioning), also timestamps
        ○ Better error processing
        ○ Exactly once semantics
        ○ AVRO format
        ○ Introspection - system.kafka, metrics & events
      ● Long-term Kafka work
        ○ Fix performance issues including efficient consumer support
        ○ Support for other messaging systems (need to decide which ones)
        ○ Give us your thoughts! File issues on GitHub or contact Altinity directly if you have feature requests
  • 30. Thank you!
      Special Offer: Contact us for a 1-hour consultation
      Presenters: rhodges@altinity.com, mfilimonov@altinity.com
      Visit us at: https://www.altinity.com
      Free Consultation: https://blog.altinity.com/offer