Building an Analytic extension to MySQL with
ClickHouse
Vadim Tkachenko (Percona) and Kanthi Subramanian (Altinity)
2 March 2023
Who we are
Vadim Tkachenko, CTO, Percona
Kanthi Subramanian, open source contributor / data engineer / developer advocate
©2023 Percona
MySQL
Strengths
- OLTP database (operational): handles up to 1 million transactions per second
- Thousands of concurrent transactions
MySQL is good for
- ACID transactions
- Excellent concurrency
- Very fast point lookups and short transactions
- Excellent tooling for building OLTP applications
It's very good for running interactive online properties:
- e-commerce
- online gaming
- social networks
Analytics with MySQL
- Only for small data sets
- Aggregation queries (GROUP BY) can be problematic (slow) on 10 mln+ rows
In summary: analyzing data over millions of small transactions is not a good use case for MySQL.
Some examples (next slides):
Query comparison (MySQL/ClickHouse)
The number of flights delayed by more than 10 minutes, grouped by the day of the week, for 2000-2008:

SELECT DayOfWeek, count(*) AS c
FROM ontime_snapshot
WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008
GROUP BY DayOfWeek
ORDER BY c DESC;

176 mln rows to process.
MySQL: 573 seconds (9 minutes 33 seconds)
ClickHouse: 0.5 seconds
Query comparison (MySQL/ClickHouse)
The percentage of flights delayed by more than 10 minutes, for each year:

SELECT Year, avg(DepDelay > 10) * 100
FROM ontime
GROUP BY Year
ORDER BY Year;

176 mln rows to process.
MySQL: 240 seconds (4 minutes)
ClickHouse: 0.674 seconds
What accounts for such a difference?
MySQL's design choices:
- storing data in rows
- single-threaded queries
- optimization for high concurrency
are exactly the opposite of those needed to run analytic queries that compute aggregates on large datasets.
ClickHouse is designed for analytic processing:
- stores data in columns
- has optimizations to minimize I/O
- computes aggregates very efficiently
- parallelized query processing
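The row-versus-column distinction can be illustrated with a toy sketch (plain Python, not ClickHouse internals): aggregating one field from a row store touches every column of every row, while a column store scans only the one contiguous column it needs.

```python
# Toy illustration of row storage vs. column storage for aggregation.
# 1000 synthetic flight records with three fields each.
rows = [{"year": 2000 + i % 9, "delay": i % 30, "carrier": "AA"}
        for i in range(1000)]

# The same data laid out column-wise, as a columnar engine would store it.
columns = {"year": [r["year"] for r in rows],
           "delay": [r["delay"] for r in rows]}

row_total = sum(r["delay"] for r in rows)  # must visit whole rows
col_total = sum(columns["delay"])          # scans one contiguous column
assert row_total == col_total
```

The answers are identical; the difference is how much data each layout has to touch, which is what the 59 GB vs. 21 MB comparison on the next slides quantifies.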
Why choose ClickHouse as a complement to MySQL?
For the same query, MySQL must read all columns in each row, while ClickHouse reads only the selected columns.
Signs that MySQL needs Analytic Help

MySQL, hypothetical query:
- Read all columns: 59 GB (100%)

ClickHouse, the same query:
- Read 3 columns: 1.7 GB (3%)
- Read 3 compressed columns: 21 MB (0.035%)
- Read 3 compressed columns over 8 threads: 2.6 MB (0.0044%)
Why is MySQL a natural complement to ClickHouse?

MySQL:
- Transactional processing
- Fast single-row updates
- High concurrency: MySQL supports a large number of concurrent queries

ClickHouse:
- Does not support ACID transactions
- Updating a single row is problematic: ClickHouse needs to read and rewrite a lot of data
- A single query can use a lot of resources, so highly concurrent access is not a good use case
Leveraging the Analytical Benefits of ClickHouse
● Identify databases/tables in MySQL to be replicated
● Create schemas/databases in ClickHouse
● Transfer data from MySQL to ClickHouse
https://github.com/Altinity/clickhouse-sink-connector

Fully wired, continuous replication: the OLTP app keeps writing to MySQL while the analytic app queries ClickHouse. After an initial dump/load, changes flow from the MySQL binlog through Debezium into a Kafka* event stream, and the Altinity Sink Connector writes them into ClickHouse ReplacingMergeTree tables.
*Including Pulsar and RedPanda
Replication Setup
1. Initial Dump/Load
2. Validate Data
3. Setup CDC Replication
1. Initial Dump/Load
Why do we need custom load/dump tools?
● Data types and their limits are not the same for MySQL and ClickHouse, e.g. max Date is 9999-12-31 in MySQL vs 2299-12-31 in ClickHouse
● Translate/read the MySQL schema and create the ClickHouse schema (identify the PK and partitioning and translate them to ORDER BY in ClickHouse's ReplacingMergeTree)
● Faster transfer, leveraging existing MySQL and ClickHouse tools
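As a concrete illustration of the date-range mismatch, a loader has to clamp (or otherwise handle) MySQL dates that fall outside ClickHouse's Date32 range. This hypothetical helper, not taken from the sink connector, sketches the idea:

```python
from datetime import date

# ClickHouse Date32 supports 1900-01-01 .. 2299-12-31;
# MySQL DATE allows values up to 9999-12-31.
CH_DATE32_MIN = date(1900, 1, 1)
CH_DATE32_MAX = date(2299, 12, 31)

def clamp_for_clickhouse(d: date) -> date:
    """Clamp a MySQL DATE into the ClickHouse Date32 range
    (hypothetical policy; the real loader's handling may differ)."""
    return min(max(d, CH_DATE32_MIN), CH_DATE32_MAX)
```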
1. Initial Dump/Load (MySQL Shell)
https://dev.mysql.com/blog-archive/mysql-shell-8-0-21-speeding-up-the-dump-process/
https://blogs.oracle.com/mysql/post/mysql-shell-dump-load-and-compression
1. Initial Dump/Load
MySQL Shell: multi-threaded, splits large tables into smaller chunks, compression, speeds up to 3 GB/s.
ClickHouse client: multi-threaded, reads compressed data.
1. Initial Dump/Load
Install mysql-shell (JS)

mysqlsh -uroot -proot -hlocalhost -e "util.dump_tables('test', ['employees'], '/tmp/employees_12');" --verbose

python db_load/clickhouse_loader.py --clickhouse_host localhost --clickhouse_database $DATABASE --dump_dir $HOME/dbdumps/$DATABASE --clickhouse_user root --clickhouse_password root --threads 4 --mysql_source_database $DATABASE --mysqlshell
1. Initial Dump/Load
CREATE TABLE IF NOT EXISTS `employees_predated` (
`emp_no` int NOT NULL,
`birth_date` Date32 NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` Date32 NOT NULL,
`salary` bigint unsigned DEFAULT NULL,
`num_years` tinyint unsigned DEFAULT NULL,
`bonus` mediumint unsigned DEFAULT NULL,
`small_value` smallint unsigned DEFAULT NULL,
`int_value` int unsigned DEFAULT NULL,
`discount` bigint DEFAULT NULL,
`num_years_signed` tinyint DEFAULT NULL,
`bonus_signed` mediumint DEFAULT NULL,
`small_value_signed` smallint DEFAULT NULL,
`int_value_signed` int DEFAULT NULL,
`last_modified_date_time` DateTime64(0) DEFAULT NULL,
`last_access_time` String DEFAULT NULL,
`married_status` char(1) DEFAULT NULL,
`perDiemRate` decimal(30,12) DEFAULT NULL,
`hourlyRate` double DEFAULT NULL,
`jobDescription` text DEFAULT NULL,
`updated_time` String NULL ,
`bytes_date` longblob DEFAULT NULL,
`binary_test_column` varbinary(255) DEFAULT NULL,
`blob_med` mediumblob DEFAULT NULL,
`blob_new` blob DEFAULT NULL,
`_sign` Int8 DEFAULT 1,
`_version` UInt64 DEFAULT 0
) ENGINE = ReplacingMergeTree(_version) ORDER BY (`emp_no`)
SETTINGS index_granularity = 8192;
CREATE TABLE `employees_predated` (
`emp_no` int NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
`salary` bigint unsigned DEFAULT NULL,
`num_years` tinyint unsigned DEFAULT NULL,
`bonus` mediumint unsigned DEFAULT NULL,
`small_value` smallint unsigned DEFAULT NULL,
`int_value` int unsigned DEFAULT NULL,
`discount` bigint DEFAULT NULL,
`num_years_signed` tinyint DEFAULT NULL,
`bonus_signed` mediumint DEFAULT NULL,
`small_value_signed` smallint DEFAULT NULL,
`int_value_signed` int DEFAULT NULL,
`last_modified_date_time` datetime DEFAULT NULL,
`last_access_time` time DEFAULT NULL,
`married_status` char(1) DEFAULT NULL,
`perDiemRate` decimal(30,12) DEFAULT NULL,
`hourlyRate` double DEFAULT NULL,
`jobDescription` text,
`updated_time` timestamp NULL DEFAULT NULL,
`bytes_date` longblob,
`binary_test_column` varbinary(255) DEFAULT NULL,
`blob_med` mediumblob,
`blob_new` blob,
PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`emp_no`)
(PARTITION p1 VALUES LESS THAN (1000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
*/
(The first CREATE TABLE above is the generated ClickHouse schema; the second is the MySQL source schema.)
2. Validate Data
Why is a basic count check not enough?
● It is essential to validate the values themselves, e.g. decimal/floating-point precision and data type limits.
● Data types are different between MySQL and ClickHouse.
Solution: MD5 checksum of column data (courtesy: Sisense)
1. Take the MD5 of each column. Use a space for
NULL values.
2. Concatenate those results, and MD5 this result.
3. Split into 4 8-character hex strings.
4. Convert into 32-bit integers and sum.
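The four steps above are straightforward to express in Python. This is a minimal re-implementation of the checksum idea; the scripts in db_compare/ are the authoritative versions:

```python
import hashlib

def row_checksum(values):
    """Checksum one row of column values:
    1. MD5 each column value, using a single space for NULL (None).
    2. Concatenate the hex digests and MD5 the result.
    3. Split the 32-char digest into four 8-character hex strings.
    4. Convert each to a 32-bit integer and sum.
    """
    digests = [hashlib.md5((" " if v is None else str(v)).encode()).hexdigest()
               for v in values]
    combined = hashlib.md5("".join(digests).encode()).hexdigest()
    return sum(int(combined[i:i + 8], 16) for i in range(0, 32, 8))
```

Running the same function over rows fetched from MySQL and from ClickHouse lets a plain diff catch rows whose values drifted, not just rows that are missing.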
python db_compare/mysql_table_checksum.py --mysql_host localhost --mysql_user root --mysql_password root --mysql_database menagerie --tables_regex "^pet" --debug_output

python db_compare/clickhouse_table_checksum.py --clickhouse_host localhost --clickhouse_user root --clickhouse_password root --clickhouse_database menagerie --tables_regex "^pet" --debug_output

diff out.pet.ch.txt out.pet.mysql.txt | grep -E "<|>"
Credits: Arnaud
3. Setup CDC Replication
MySQL starting point:
- binlog file: mysql.bin.00001
- binlog position: 100002
or
- GTID: 1233:223232323

Event flow: MySQL → Debezium → Kafka event stream → Altinity Sink Connector → ClickHouse

Setup Debezium to start from the binlog file/position or GTID:
https://github.com/Altinity/clickhouse-sink-connector/blob/develop/doc/debezium_setup.md
Final step - Deploy
● Docker Compose (Debezium Strimzi, Sink Strimzi)
https://hub.docker.com/repository/docker/altinity/clickhouse-sink-connector
● Kubernetes (Docker images)
● JAR file
Simplified Architecture
MySQL (binlog file: mysql.bin.00001, position: 100002, or GTID: 1233:223232323) → Debezium and the Altinity Sink Connector packaged as one executable, one service → ClickHouse
Final step - Monitor
● Monitor Lag
● Connector Status
● Kafka monitoring
● CPU/Memory Stats
Challenges
- MySQL master failover
- Schema changes (DDL)

MySQL Master Failover
When replication resumes on a new master, row versions must stay consistent. The connector derives the _version for ReplacingMergeTree from a Snowflake-style ID built on the binlog event timestamp, so versions remain time-ordered across masters.
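A common Snowflake-style layout packs the event timestamp into the high bits of a 64-bit integer so IDs stay time-ordered. This sketch illustrates the idea; the exact bit allocation is an assumption, not the sink connector's actual layout:

```python
def snowflake_version(timestamp_ms: int, sequence: int = 0) -> int:
    """Build a 64-bit, time-ordered version: the binlog event timestamp
    (milliseconds) in the high bits, a per-millisecond sequence in the
    low 22 bits. Illustrative bit split, not the connector's."""
    if not 0 <= sequence < (1 << 22):
        raise ValueError("sequence must fit in 22 bits")
    return (timestamp_ms << 22) | sequence
```

Because ReplacingMergeTree keeps the row with the greatest _version, versions built this way stay monotonic even when events start flowing from a different master.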
Alter Table support
MySQL → ClickHouse translation examples:
- MySQL: ADD COLUMN <col_name> varchar(1000) NULL → ClickHouse: ADD COLUMN <col_name> Nullable(String)
- MySQL: ADD INDEX, type btree → ClickHouse: ADD INDEX, type minmax
Replicating Schema Changes
● Debezium does not provide events for all DDL changes
● The complete DDL is only available in a separate topic (not a SinkRecord)
● Parallel Kafka workers might process messages out of order
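One way to cope with the last point is to buffer events and re-establish source order before applying DDL. A toy sketch (not the connector's actual mechanism), using the binlog position as the sort key:

```python
# DDL events consumed from parallel workers may arrive out of order.
events = [
    {"pos": 300, "ddl": "ALTER TABLE t ADD COLUMN c String"},
    {"pos": 100, "ddl": "CREATE TABLE t (id Int64) ENGINE = MergeTree ORDER BY id"},
    {"pos": 200, "ddl": "ALTER TABLE t ADD COLUMN b Int32"},
]

# Sorting by source (binlog) position restores the original order
# before the statements are applied to ClickHouse.
ordered = [e["ddl"] for e in sorted(events, key=lambda e: e["pos"])]
```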
Where can I get more information?
Altinity Sink Connector for ClickHouse
https://github.com/Altinity/clickhouse-sink-connector
https://github.com/ClickHouse/ClickHouse
https://github.com/mydumper/mydumper
Project roadmap and next steps
- PostgreSQL, MongoDB, SQL Server support
- ClickHouse shards/replicas support
- Transaction support
Thank you!
Questions?
https://altinity.com https://percona.com

Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx

  • 1. Building an Analytic extension to MySQL with ClickHouse 1 Vadim Tkachenko(Percona) and Kanthi Subramanian(Altinity) 2 March 2023
  • 2. Who we are Vadim Tkachenko CTO Percona Kanthi Subramanian Open source contributor/Data Engineer/Developer Advocate 2
  • 3. ©2023 Percona MySQL Strengths - OLTP (operational) database: handles up to 1 million transactions per second - Thousands of concurrent transactions 3
  • 4. ©2023 Percona MySQL is good for - 1. ACID transactions. - 2. Excellent concurrency. - 3. Very fast point lookups and short transactions. - 4. Excellent tooling for building OLTP applications. - It's very good for running interactive online properties: - - e-commerce - - online gaming - - social networks 4
  • 5. ©2023 Percona Analytics with MySQL - Only for small data sets. - Aggregation queries (GROUP BY) can be problematic (slow) on 10mln+ rows. In summary: analyzing data over millions of small transactions is not a good use case for MySQL. Some examples (next slides): 5
  • 6. ©2023 Percona 6 Query comparison (MySQL/ClickHouse) The number of flights delayed by more than 10 minutes, grouped by the day of the week, for 2000-2008 SELECT DayOfWeek, count(*) AS c FROM ontime_snapshot WHERE DepDelay>10 AND Year>=2000 AND Year<=2008 GROUP BY DayOfWeek ORDER BY c DESC; 176mln rows to process MySQL ClickHouse 573 seconds (9 minutes 33 seconds) 0.5 seconds
  • 7. ©2023 Percona 7 Query comparison (MySQL/ClickHouse) Percentage of flights delayed by more than 10 minutes, per year SELECT Year, avg(DepDelay>10)*100 FROM ontime GROUP BY Year ORDER BY Year; 176mln rows to process MySQL ClickHouse 240 seconds (4 minutes) 0.674 seconds
  • 8. ©2023 Percona What explains such a difference? 8 MySQL's features: storing data in rows, single-threaded query execution, and optimization for high concurrency are exactly the opposite of what is needed to run analytic queries that compute aggregates on large datasets. ClickHouse is designed for analytic processing: - stores data in columns - has optimizations to minimize I/O - computes aggregates very efficiently - parallelizes query processing
  • 9. ©2023 Percona 9 Why choose ClickHouse as a complement to MySQL? The number of flights delayed by more than 10 minutes, grouped by the day of the week, for 2000-2008 Read all columns in row (MySQL) Read only selected columns (ClickHouse)
  • 10. ©2023 Percona Signs that MySQL needs Analytic Help 10 Read all columns 59 GB (100%) MySQL, hypothetical query
  • 11. ©2023 Percona Signs that MySQL needs Analytic Help 11 Read 3 columns: 1.7 GB (3%) Read 3 compressed columns: 21 MB (.035%) Read 3 compressed columns over 8 threads: 2.6 MB (.0044%) per thread ClickHouse, the same query
  • 12. ©2023 Percona Why is MySQL a natural complement to ClickHouse? 12 MySQL: Transactional processing. Fast single-row updates. High concurrency: MySQL supports a large number of concurrent queries. ClickHouse: Does not support ACID transactions. Updating a single row is problematic: ClickHouse needs to read and rewrite a lot of data. ClickHouse can use a lot of resources for a single query, which is not a good fit for highly concurrent access.
  • 13. 13 Leveraging Analytical Benefits of ClickHouse ● Identify Databases/Tables in MySQL to be replicated ● Create schema/Databases in ClickHouse ● Transfer Data from MySQL to ClickHouse https://github.com/Altinity/clickhouse-sink-connector
  • 14. Fully wired, continuous replication 14 Table Engine(s) Initial Dump/Load MySQL ClickHouse OLTP App Analytic App MySQL Binlog Debezium Altinity Sink Connector Kafka* Event Stream *Including Pulsar and RedPanda ReplacingMergeTree
  • 15. Replication Setup Validate Data Setup CDC Replication Initial Dump/Load 1 2 3
  • 16. 1. Initial Dump/Load Why do we need custom load/dump tools? ● Data type limits and data types are not the same in MySQL and ClickHouse: Date max is 9999-12-31 in MySQL but 2299-12-31 (Date32) in ClickHouse ● Translate/read the MySQL schema and create the ClickHouse schema (identify the PK and partitioning, and translate them to ORDER BY in ClickHouse's ReplacingMergeTree) ● Faster transfer, leveraging existing MySQL and ClickHouse tools.
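The Date range mismatch above has to be handled during the load. A minimal Python sketch of one possible policy, clamping out-of-range MySQL dates into the ClickHouse Date32 range (the function name and clamping choice are our own illustration, not part of clickhouse_loader):

```python
from datetime import date

# ClickHouse Date32 supports 1900-01-01 .. 2299-12-31;
# MySQL DATE goes up to 9999-12-31.
CH_DATE32_MIN = date(1900, 1, 1)
CH_DATE32_MAX = date(2299, 12, 31)

def clamp_date(d: date) -> date:
    """Clamp a MySQL DATE value into the ClickHouse Date32 range."""
    return min(max(d, CH_DATE32_MIN), CH_DATE32_MAX)
```

Whether to clamp, reject, or map such values to NULL is a policy decision for the migration, since clamping silently changes the stored value.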
  • 17. 1. Initial Dump/Load (MySQL Shell) https://dev.mysql.com/blog-archive/mysql-shell-8-0-21- speeding-up-the-dump-process/ https://blogs.oracle.com/mysql/post/mysql-shell-dump-load- and-compression
  • 18. 1. Initial Dump/Load MySQL Shell: multi-threaded, splits large tables into smaller chunks, compression, speeds up to 3 GB/s. clickhouse-client: multi-threaded, reads compressed data.
  • 19. 1. Initial Dump/Load Install mysql-shell (JS) mysqlsh -uroot -proot -hlocalhost -e "util.dump_tables('test', ['employees'], '/tmp/employees_12');" --verbose python db_load/clickhouse_loader.py --clickhouse_host localhost --clickhouse_database $DATABASE --dump_dir $HOME/dbdumps/$DATABASE --clickhouse_user root --clickhouse_password root --threads 4 --mysql_source_database $DATABASE --mysqlshell
  • 20. 1. Initial Dump/Load ClickHouse: CREATE TABLE IF NOT EXISTS `employees_predated` ( `emp_no` int NOT NULL, `birth_date` Date32 NOT NULL, `first_name` varchar(14) NOT NULL, `last_name` varchar(16) NOT NULL, `gender` enum('M','F') NOT NULL, `hire_date` Date32 NOT NULL, `salary` bigint unsigned DEFAULT NULL, `num_years` tinyint unsigned DEFAULT NULL, `bonus` mediumint unsigned DEFAULT NULL, `small_value` smallint unsigned DEFAULT NULL, `int_value` int unsigned DEFAULT NULL, `discount` bigint DEFAULT NULL, `num_years_signed` tinyint DEFAULT NULL, `bonus_signed` mediumint DEFAULT NULL, `small_value_signed` smallint DEFAULT NULL, `int_value_signed` int DEFAULT NULL, `last_modified_date_time` DateTime64(0) DEFAULT NULL, `last_access_time` String DEFAULT NULL, `married_status` char(1) DEFAULT NULL, `perDiemRate` decimal(30,12) DEFAULT NULL, `hourlyRate` double DEFAULT NULL, `jobDescription` text DEFAULT NULL, `updated_time` String NULL, `bytes_date` longblob DEFAULT NULL, `binary_test_column` varbinary(255) DEFAULT NULL, `blob_med` mediumblob DEFAULT NULL, `blob_new` blob DEFAULT NULL, `_sign` Int8 DEFAULT 1, `_version` UInt64 DEFAULT 0 ) ENGINE = ReplacingMergeTree(_version) ORDER BY (`emp_no`) SETTINGS index_granularity = 8192; MySQL: CREATE TABLE `employees_predated` ( `emp_no` int NOT NULL, `birth_date` date NOT NULL, `first_name` varchar(14) NOT NULL, `last_name` varchar(16) NOT NULL, `gender` enum('M','F') NOT NULL, `hire_date` date NOT NULL, `salary` bigint unsigned DEFAULT NULL, `num_years` tinyint unsigned DEFAULT NULL, `bonus` mediumint unsigned DEFAULT NULL, `small_value` smallint unsigned DEFAULT NULL, `int_value` int unsigned DEFAULT NULL, `discount` bigint DEFAULT NULL, `num_years_signed` tinyint DEFAULT NULL, `bonus_signed` mediumint DEFAULT NULL, `small_value_signed` smallint DEFAULT NULL, `int_value_signed` int DEFAULT NULL, `last_modified_date_time` datetime DEFAULT NULL, `last_access_time` time DEFAULT NULL, `married_status` char(1) DEFAULT NULL, `perDiemRate` decimal(30,12) DEFAULT NULL, `hourlyRate` double DEFAULT NULL, `jobDescription` text, `updated_time` timestamp NULL DEFAULT NULL, `bytes_date` longblob, `binary_test_column` varbinary(255) DEFAULT NULL, `blob_med` mediumblob, `blob_new` blob, PRIMARY KEY (`emp_no`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!50100 PARTITION BY RANGE (`emp_no`) (PARTITION p1 VALUES LESS THAN (1000) ENGINE = InnoDB, PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
  • 21. 2. Validate Data Why is a basic count check not enough? ● Essential to validate the values, for example decimal/floating-point precision and data type limits. ● Data types are different between MySQL and ClickHouse. Solution: MD5 checksum of column data (courtesy: Sisense) 1. Take the MD5 of each column. Use a space for NULL values. 2. Concatenate those results, and MD5 this result. 3. Split into 4 8-character hex strings. 4. Convert into 32-bit integers and sum. python db_compare/mysql_table_checksum.py --mysql_host localhost --mysql_user root --mysql_password root --mysql_database menagerie --tables_regex "^pet" --debug_output python db_compare/clickhouse_table_checksum.py --clickhouse_host localhost --clickhouse_user root --clickhouse_password root --clickhouse_database menagerie --tables_regex "^pet" --debug_output diff out.pet.ch.txt out.pet.mysql.txt | grep "<|>" Credits: Arnaud
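The four checksum steps above can be sketched in a few lines of Python (an illustration of the recipe only, not the actual db_compare scripts):

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def row_checksum(columns):
    """Checksum one row following the four-step recipe above."""
    # 1. MD5 of each column value; NULLs become a single space.
    col_hashes = [md5_hex(" " if v is None else str(v)) for v in columns]
    # 2. Concatenate the per-column hashes and MD5 the result.
    combined = md5_hex("".join(col_hashes))
    # 3. Split the 32-character hex digest into four 8-character chunks.
    chunks = [combined[i:i + 8] for i in range(0, 32, 8)]
    # 4. Interpret each chunk as a 32-bit integer and sum.
    return sum(int(c, 16) for c in chunks)
```

Because the per-row result is a plain integer, table-level comparison reduces to summing it over all rows on each side, which both MySQL and ClickHouse can do in SQL.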
  • 22. 3. Setup CDC Replication MySQL binlog file: mysql.bin.00001, binlog position: 100002, or GTID: 1233:223232323 Debezium Altinity Sink Connector Kafka* Event Stream ClickHouse Setup Debezium to start from a binlog file/position or GTID https://github.com/Altinity/clickhouse-sink-connector/blob/develop/doc/debezium_setup.md
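For context, a hypothetical Debezium MySQL connector registration payload might look as follows. The property names are standard Debezium options, but all values are placeholders, and pointing the connector at an exact binlog file/position or GTID is normally done by seeding its stored offsets rather than via these properties (see the linked debezium_setup.md for the project's actual procedure):

```python
import json

def debezium_config(name: str, host: str, database: str) -> str:
    """Build an illustrative Debezium MySQL source connector payload."""
    return json.dumps({
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": host,
            "database.port": "3306",
            "database.user": "replicator",   # placeholder credentials
            "database.password": "secret",
            "database.server.id": "5432",
            "database.include.list": database,
            # Skip the data snapshot: the dump/load was done separately.
            "snapshot.mode": "schema_only",
            "topic.prefix": "mysql-cdc",
        },
    })
```

The payload would be POSTed to Kafka Connect's REST endpoint to start the source side of the pipeline.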
  • 23. Final step - Deploy ● Docker Compose (Debezium Strimzi, Sink Strimzi) https://hub.docker.com/repository/docker/altinity/clickhouse-sink-connector ● Kubernetes (Docker images) ● JAR file
  • 24. Simplified Architecture MySQL binlog file: mysql.bin.00001 binlog position: 100002 Or Gtid: 1233:223232323 ClickHouse Debezium Altinity Sink Connector One executable One service
  • 25. Final step - Monitor ● Monitor Lag ● Connector Status ● Kafka monitoring ● CPU/Memory Stats
  • 26. Challenges - MySQL Master failover - Schema changes (DDL)
  • 29. MySQL Master Failover - Snowflake ID from the binlog timestamp
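The idea can be sketched as follows: the binlog event timestamp occupies the high bits of a 64-bit version, so the ReplacingMergeTree replacement order survives a master failover, while a sequence in the low bits keeps versions unique within one timestamp. The bit layout and names here are illustrative, not the sink connector's exact implementation:

```python
import threading

SEQUENCE_BITS = 22  # illustrative bit width for the per-timestamp sequence

_lock = threading.Lock()
_last_ts = -1
_seq = 0

def version_for(binlog_ts_ms: int) -> int:
    """Derive a monotonic Snowflake-style version from a binlog timestamp (ms)."""
    global _last_ts, _seq
    with _lock:
        if binlog_ts_ms == _last_ts:
            _seq += 1          # same millisecond: bump the sequence
        else:
            _last_ts, _seq = binlog_ts_ms, 0
        return (binlog_ts_ms << SEQUENCE_BITS) | _seq
```

Because the timestamp dominates the value, events emitted by the new master after failover sort above those of the old master as long as clocks move forward.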
  • 30. Alter Table support 30 MySQL: ADD COLUMN <col_name> varchar(1000) NULL -> ClickHouse: ADD COLUMN <col_name> Nullable(String) MySQL: ADD index type btree -> ClickHouse: ADD index type minmax
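A toy translation of the ADD COLUMN case above might look like this (an illustration only: the type map and helper are hypothetical, and the real sink connector handles far more types and clauses):

```python
import re

# Illustrative mapping of a few MySQL column types to ClickHouse equivalents.
TYPE_MAP = {
    r"varchar\(\d+\)": "String",
    r"int": "Int32",
    r"datetime": "DateTime64(0)",
}

def translate_add_column(ddl: str) -> str:
    """Translate `ADD COLUMN name type [NULL]` from MySQL to ClickHouse syntax."""
    m = re.match(r"ADD COLUMN (\S+) (\S+)( NULL)?", ddl, re.IGNORECASE)
    name, mysql_type, nullable = m.group(1), m.group(2), bool(m.group(3))
    ch_type = next(
        (ch for pat, ch in TYPE_MAP.items()
         if re.fullmatch(pat, mysql_type, re.IGNORECASE)),
        "String",  # fallback for unmapped types
    )
    if nullable:
        ch_type = f"Nullable({ch_type})"
    return f"ADD COLUMN {name} {ch_type}"
```

MySQL nullability has to become an explicit Nullable(...) wrapper, since ClickHouse columns are non-nullable by default.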
  • 32. 32 Replicating Schema Changes ● Debezium does not provide events for all DDL changes ● The complete DDL is only available in a separate topic (not as a SinkRecord) ● Parallel Kafka workers might process messages out of order.
  • 34. Where can I get more information? 34 Altinity Sink Connector for ClickHouse https://github.com/Altinity/clickhouse-sink-connector https://github.com/ClickHouse/ClickHouse https://github.com/mydumper/mydumper
  • 35. 35 Project roadmap and next steps - PostgreSQL, MongoDB, SQL Server support - ClickHouse shards/replicas support - Support transactions.

Editor's Notes

  1. Experience deploying to customers and the tools we have developed in the process. It's a complicated set of steps; it is easier to automate the entire process. Create schema/databases -> we have scripts for the initial load that simplify this process, and the sink connector can also auto-create tables. A complete suite of tools simplifies the process end to end.
  2. Existing data in MySQL might be big; we need a solution that makes the initial transfer fast (ClickHouse needs to be in sync). An end-to-end solution for transferring data from MySQL to ClickHouse for production deployments. Debezium timeout (STATEMENT execution timeout). The source DB might have limited permissions; you might not have permission to perform OUTFILE.
  3. Step 1: Perform a dump of data from MySQL and load it into ClickHouse. Debezium initial snapshot might not be faster. Step 2: After the dump is loaded, validate the data. Step 3: Setup CDC replication using Debezium and Altinity sink connector.
  4. Debezium provides initial snapshotting, but it's slow. Debezium load times are very slow. MAX_EXECUTION_TIMEOUT
  5. Debezium provides initial snapshotting, but it's slow. mysqlsh requires a PK; if a PK is not present, it does not parallelize and does not provide chunking capabilities.
  6. Debezium provides initial snapshotting, but it's slow. MySQL Shell uses the zstd compression standard by default. The --threads option provides parallelism.
  7. Debezium provides initial snapshotting, but it's slow. MySQL Shell uses the zstd compression standard by default. The --threads option provides parallelism. clickhouse_loader creates the ClickHouse schema and adds version and sign columns for UPDATES/DELETES.
  8. Debezium provides initial snapshotting, but it's slow. MySQL Shell uses the zstd compression standard by default. The --threads option provides parallelism. clickhouse_loader creates the ClickHouse schema and adds version and sign columns for UPDATES/DELETES.
  9. Debezium provides initial snapshotting, but it’s slow. Compare results of the aggregation table that drives your dashboard. Sales numbers have to be accurate.
  10. Debezium provides initial snapshotting, but it’s slow. Different environments We also maintain images for Debezium/Strimzi and Sink/Strimzi
  11. Debezium provides initial snapshotting, but it’s slow. Different environments We also maintain images for Debezium/Strimzi and Sink/Strimzi
  12. Setup Alerts if connectors are down. Setup Alerts when there is a lag. Setup Alerts when there are errors. We also bundle the debezium dashboard and the kafka dashboard.
  13. Co-ordination is Key! Tradeoff between Parallelism and Consistency.
  14. Events: Truncate table.
  15. Events: Truncate table.
  16. Events: Truncate table.