This session will focus on ClickHouse, an open-source column-oriented database management system, and its role in facilitating real-time analytics in 2024. As organizations demand faster insights from their large datasets, ClickHouse has emerged as a critical player in the open-source database landscape. This session aims to provide an in-depth look at the capabilities, challenges, and best practices in utilizing ClickHouse for real-time analytics.
1. Unleashing Real-time Insights with ClickHouse: Navigating the Landscape in 2024
ALKIN TEZUYSAL
FOSSASIA, Hanoi, Vietnam - Apr 2024
@ask_dba
@ChistaDATA Inc. 2024
2. Let’s get connected with Alkin first
Alkin Tezuysal - EVP - Global Services @chistadata
● LinkedIn: https://www.linkedin.com/in/askdba/
Open Source Database Evangelist
● Previously at PlanetScale, Percona and Pythian as Technical Manager, SRE, DBA
● Previously Enterprise DBA: Informix, Oracle, DB2, SQL Server
3. About ChistaDATA Inc.
Founded in 2021 by Shiv Iyer - CEO and Principal
Strong lineage, backed by leading investors
Focusing on ClickHouse infrastructure engineering and performance operations
What’s ClickHouse anyway?
Services and Products around dedicated DBaaS, Managed Services, Support and Consulting
www.chistadata.io www.chistadata.com
4. Recognitions
● Most Influential in Database Community 2022 - The Redgate 100
● MySQL Cookbook, 4th Edition 2022 - O'Reilly Media, Inc.
● MySQL Rockstar 2023 - Oracle (MySQL Community)
● Database Design and Modeling with PostgreSQL and MySQL 2024 - Packt
5. Maritime Trivia
What is the term for the process of turning a sailing vessel away from the wind, allowing the sails to fill and propel the boat forward?
7. What is ClickHouse?
ClickHouse is:
● Open source (Apache 2.0 license)
● Column-oriented
● A database management system engineered for high-speed analytics
● Its columnar storage model and advanced compression enable real-time analysis on large data volumes.
12. The importance of real-time analytics
● Helps deliver on strategic imperatives
● Competitive advantage
● Improve efficiencies
● Enhance customer experience
● Increase revenues
13. ClickHouse Highlights
● Efficient compression
○ Supports multiple compression codecs, such as LZ4 and ZSTD
● Vectorized Query Execution
○ Vectorized query execution processes data in batches, operating on multiple data points with a single CPU instruction.
● CPU Efficiency
○ Full use of modern CPUs' capabilities, including SIMD (Single Instruction, Multiple Data) instructions
● Scalability
○ Built-in horizontal sharding and replication.
● Rich Function Library
○ Built-in functions and operators for data transformation, filtering, and aggregation
● Geospatial Support, Materialized Views, Support for SQL Syntax
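As a rough illustration of the compression bullet, the sketch below declares per-column codecs and then reads the resulting sizes back from `system.columns`. The table `events_demo`, its columns, and the codec choices are hypothetical and not from the talk; only the LZ4 and ZSTD codec names come from the slide.

```sql
-- Hypothetical table; names and codec choices are illustrative only.
CREATE TABLE events_demo
(
    event_time DateTime CODEC(Delta, ZSTD),  -- delta-encode timestamps, then compress with ZSTD
    user_id    UInt64   CODEC(LZ4),          -- LZ4: fast, lighter compression
    payload    String   CODEC(ZSTD(3))       -- ZSTD with an explicit compression level
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- After loading data, compare compressed and uncompressed size per column.
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'events_demo';
```

Which codec works best depends on the data, so it is worth measuring on a representative sample rather than assuming one default suits every column.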
15. ClickHouse Engine Family
● MergeTree: the most universal and functional table engines for high-load tasks.
● Log: lightweight engines with minimum functionality.
● Integration Engines: engines for communicating with other data storage and processing systems.
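To make the engine families more concrete, here is a minimal sketch of one MergeTree table and one Log-family table. Both tables and all column names are hypothetical and chosen only for illustration; an integration engine (MySQL) appears in the slides that follow.

```sql
-- Hypothetical tables for illustration; names and columns are not from the talk.

-- MergeTree family: data is stored sorted by the ORDER BY key and merged in the background.
CREATE TABLE page_views_demo
(
    view_time DateTime,
    site_id   UInt32,
    url       String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(view_time)
ORDER BY (site_id, view_time);

-- Log family: minimal engines without indexes, suited to small or temporary tables.
CREATE TABLE scratch_demo
(
    id   UInt32,
    note String
)
ENGINE = TinyLog;
```

The engine is chosen per table at creation time, so one database can mix MergeTree tables for heavy analytics with Log or integration tables for staging and external access.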
17. Sample integration with MySQL
mysql> desc customers ;
+--------------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+-------------+------+-----+---------+-------+
| customer_id | varchar(45) | NO | PRI | NULL | |
| customer_unique_id | varchar(45) | NO | UNI | NULL | |
| customer_zip_code_prefix | int | YES | | NULL | |
| customer_city | varchar(25) | YES | | NULL | |
| customer_state | char(2) | YES | | NULL | |
+--------------------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
18. Sample integration with MySQL
statement: CREATE TABLE olist.mysql_data
(
`customer_id` String,
`customer_unique_id` String,
`customer_zip_code_prefix` Nullable(Int32) DEFAULT NULL,
`customer_city` Nullable(String) DEFAULT NULL,
`customer_state` Nullable(String) DEFAULT NULL
)
ENGINE = MySQL('127.0.0.1:3306', 'olist', 'customers', 'root', '[HIDDEN]')
1 row in set. Elapsed: 0.001 sec.
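Once the MySQL-backed table exists, it can be queried like any other ClickHouse table. The sketch below assumes the `olist.mysql_data` definition shown on the slide. The aggregation itself is illustrative rather than taken from the talk, and `'<password>'` is a placeholder.

```sql
-- Query the MySQL-backed table; rows are fetched from MySQL at query time.
SELECT
    customer_state,
    count() AS customer_count
FROM olist.mysql_data
GROUP BY customer_state
ORDER BY customer_count DESC
LIMIT 5;

-- Alternative without a permanent table: the mysql() table function.
-- '<password>' is a placeholder; the other connection details mirror the slide.
SELECT count()
FROM mysql('127.0.0.1:3306', 'olist', 'customers', 'root', '<password>');
```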
19. Load data to ClickHouse
:) INSERT INTO customers SELECT *
FROM mysql_data
Query id: f4e154ad-c6dd-497d-988e-d0d019319a53
Ok.
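The INSERT above needs a destination table named `customers` to exist in ClickHouse already. The slides do not show its definition, so the following is only a plausible sketch: the columns mirror `mysql_data`, while the MergeTree engine and the `ORDER BY` key are assumptions.

```sql
-- Assumed definition of the destination table; the talk does not show it.
CREATE TABLE customers
(
    customer_id              String,
    customer_unique_id       String,
    customer_zip_code_prefix Nullable(Int32),
    customer_city            Nullable(String),
    customer_state           Nullable(String)
)
ENGINE = MergeTree
ORDER BY customer_id;  -- sorting key chosen for illustration only
```

Copying the data into a MergeTree table stores it locally in columnar form, so later analytical queries no longer depend on the MySQL server.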
20. Transferred table in ClickHouse
:) select count(*) from customers;
SELECT count(*)
FROM customers
Query id: dffacd95-a0ae-4027-b12b-dfa17d780e79
┌─count()─┐
1. │ 192016 │
└─────────┘
1 row in set. Elapsed: 0.008 sec.
22. Use Case Ideas
● Analytics on denormalized tables
● Star Schema migration
● Time Series data ingestion via streaming
● Log data
● OLTP data archive
● Data Lake and Fabric solutions
● Observability
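For the log and time-series use cases, one common pattern is a materialized view that maintains a rollup as rows arrive. The sketch below is not from the talk: every table, view, and column name is hypothetical, and SummingMergeTree is just one possible choice for the rollup table.

```sql
-- Hypothetical raw log table.
CREATE TABLE logs_demo
(
    ts      DateTime,
    service LowCardinality(String),
    level   LowCardinality(String),
    message String
)
ENGINE = MergeTree
ORDER BY (service, ts);

-- Hypothetical rollup table: rows with the same key are summed during background merges.
CREATE TABLE logs_per_minute_demo
(
    minute  DateTime,
    service LowCardinality(String),
    level   LowCardinality(String),
    events  UInt64
)
ENGINE = SummingMergeTree
ORDER BY (service, level, minute);

-- Each insert into logs_demo is aggregated and written to the rollup table.
CREATE MATERIALIZED VIEW logs_per_minute_mv_demo TO logs_per_minute_demo AS
SELECT
    toStartOfMinute(ts) AS minute,
    service,
    level,
    count() AS events
FROM logs_demo
GROUP BY minute, service, level;

-- Merges are asynchronous, so aggregate again at read time.
SELECT minute, service, level, sum(events) AS total_events
FROM logs_per_minute_demo
GROUP BY minute, service, level
ORDER BY minute DESC
LIMIT 10;
```

Dashboards can then read the small rollup table instead of scanning raw logs, which is what keeps such queries fast as data grows.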
23. Streaming Data to Real Time Analytics
25. Get started with clickhouse-local
$ curl https://clickhouse.com/ | sh
$ ./clickhouse local -q "SELECT * FROM 'customers.tsv'"
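Before querying a file, it can help to check what schema ClickHouse infers from it. The statements below are a small sketch to run inside `clickhouse local`, assuming the same `customers.tsv` sits in the current directory; its layout is not shown in the talk.

```sql
-- Show the column names and types inferred from the file.
DESCRIBE file('customers.tsv');

-- Count the rows without importing anything.
SELECT count() FROM file('customers.tsv');
```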
26. Get started with Homebrew on macOS
$ brew install --cask clickhouse
$ clickhouse
ClickHouse local version 24.3.1.2672 (official build).
macbook-pro-4.local :) SELECT
name AS table_name,
formatReadableSize(total_bytes) AS size,
total_rows
FROM system.tables
WHERE database = 'olist'
ORDER BY total_bytes DESC;
SELECT
name AS table_name,
27. Born to Sail, Forced to Work!
Catching winds
@svrubato
How to contribute to the community?