Your first ClickHouse data warehouse
Robert Hodges - 2 December 2020
SF Bay Area ClickHouse Meetup
Presenter and Company Bio
www.altinity.com
Enterprise provider for ClickHouse, a popular, open source data warehouse.
Community sponsor and major committer to the ClickHouse project.
Robert Hodges - Altinity CEO
30+ years on DBMS plus virtualization and security. Using Kubernetes since 2018.
Introducing ClickHouse
ClickHouse is an open source data warehouse
Single binary
Understands SQL
Runs anywhere from bare metal to the cloud
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Open source (Apache 2.0)
[Diagram: several ClickHouse Server nodes, each storing data in columns a, b, c, d]
And it’s really fast!
Installing ClickHouse goodness on Linux
# UBUNTU/DEBIAN INSTALL
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 \
  --recv E0C56BD4
echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee \
  /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
sudo systemctl start clickhouse-server
Also available: Debian packages, RPMs, tarballs
ClickHouse goodness delivered by Docker
mkdir $HOME/clickhouse-data
docker run -d --name clickhouse-server \
  --ulimit nofile=262144:262144 \
  --volume=$HOME/clickhouse-data:/var/lib/clickhouse \
  -p 8123:8123 -p 9000:9000 \
  yandex/clickhouse-server
Persist data (--volume)
Make ports visible (-p: 8123 for HTTP, 9000 for the native protocol)
Make ClickHouse happy (--ulimit raises the open-file limit)
Is there ClickHouse cloud goodness?
YES!
● Yandex Managed Service for ClickHouse: runs in Yandex.Cloud
● Altinity.Cloud: runs in the Amazon (AWS) public cloud
Where is the documentation?
https://clickhouse.tech/
Getting started with app development
First step: The ClickHouse Tutorial
https://clickhouse.tech/docs/en/getting-started/tutorial/
Second step: Design table(s) and load data
CREATE DATABASE IF NOT EXISTS meetup;

CREATE TABLE meetup.readings (
  sensor_id Int32,
  time DateTime,
  date Date,
  temperature Decimal(5,2)
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time);
Don’t stress about data types
Use MergeTree table types
Partition by month or day
Sort by “keys” to find data
LZ4 compression by default
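The PARTITION BY and ORDER BY clauses decide where each row lands. This is an offline Python simulation with no server involved. It mimics toYYYYMM(time) and the (sensor_id, time) sort order. The helper names and sample rows are hypothetical. Real ClickHouse writes separate parts per insert and merges them later, and this sketch collapses that into one step.

```python
from collections import defaultdict
from datetime import datetime


def to_yyyymm(ts: datetime) -> int:
    """Mimic toYYYYMM(): 2019-01-15 12:00 -> 201901."""
    return ts.year * 100 + ts.month


def partition_and_sort(rows):
    """Group rows by month (PARTITION BY), then sort each group by (sensor_id, time) (ORDER BY)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[to_yyyymm(row["time"])].append(row)
    for part_rows in partitions.values():
        part_rows.sort(key=lambda r: (r["sensor_id"], r["time"]))
    return dict(partitions)


# Hypothetical readings arriving out of order.
rows = [
    {"sensor_id": 1, "time": datetime(2019, 2, 1, 0, 0), "temperature": 41.0},
    {"sensor_id": 0, "time": datetime(2019, 1, 1, 0, 1), "temperature": 43.35},
    {"sensor_id": 1, "time": datetime(2019, 1, 1, 0, 0), "temperature": 40.2},
    {"sensor_id": 0, "time": datetime(2019, 1, 1, 0, 0), "temperature": 43.31},
]

for month, part_rows in sorted(partition_and_sort(rows).items()):
    print(month, [(r["sensor_id"], r["time"].strftime("%H:%M")) for r in part_rows])
# 201901 [(0, '00:00'), (0, '00:01'), (1, '00:00')]
# 201902 [(1, '00:00')]
```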
Your friend: the MergeTree table type
[Diagram: a MergeTree table is made of parts. Each part contains a sparse index plus column files made of compressed blocks. All rows in a part match the same PARTITION BY value, and columns are sorted on the ORDER BY columns.]
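The sparse index stores one entry per granule of rows, not one per row. This minimal Python sketch shows the lookup. It assumes a granule size of 8192 rows (ClickHouse's default index_granularity) and a single integer sort key. It skips mark files and compound keys, and the sample data is hypothetical.

```python
import bisect

GRANULE = 8192  # ClickHouse's default index_granularity


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of each granule, like a MergeTree primary index."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def granules_for_key(index, key):
    """Return the granule numbers that may contain `key`."""
    # The granule just before the first entry >= key can still hold `key` at its tail.
    first = max(bisect.bisect_left(index, key) - 1, 0)
    # Granules whose first key is already > key cannot hold it.
    last = bisect.bisect_right(index, key)
    return list(range(first, last))


# Hypothetical part: 100 sensors x 1,000 readings, sorted by sensor_id.
keys = [sensor for sensor in range(100) for _ in range(1000)]
index = build_sparse_index(keys)
print(len(index), granules_for_key(index, 55))  # 13 [6]
```

Only one granule of 8,192 rows has to be read for sensor 55, instead of all 100,000 rows. That is the basic reason a small sparse index is enough when the data is sorted.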
Popular formats for loading data
CSVWithNames
"sensor_id","time","date","temperature"
0,"2019-01-01 00:00:00","2019-01-01",43.31
0,"2019-01-01 00:01:00","2019-01-01",43.35
JSONEachRow
{"sensor_id":0,"time":"2019-01-01 00:00:00","date":"2019-01-01",...}
{"sensor_id":0,"time":"2019-01-01 00:01:00","date":"2019-01-01",...}
{"sensor_id":0,"time":"2019-01-01 00:02:00","date":"2019-01-01",...}
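This small Python sketch writes the same rows in both formats using the standard csv and json modules. You could use it to prepare readings.csv and readings.json for the loading examples. The sample rows are hypothetical; only the column names come from the table definition.

```python
import csv
import io
import json

COLUMNS = ["sensor_id", "time", "date", "temperature"]

# Hypothetical sample rows.
rows = [
    (0, "2019-01-01 00:00:00", "2019-01-01", 43.31),
    (0, "2019-01-01 00:01:00", "2019-01-01", 43.35),
]


def to_csv_with_names(rows):
    """Header line plus data lines: strings quoted, numbers bare."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def to_json_each_row(rows):
    """One compact JSON object per line, keyed by column name."""
    return "".join(
        json.dumps(dict(zip(COLUMNS, row)), separators=(",", ":")) + "\n" for row in rows
    )


print(to_csv_with_names(rows), end="")
print(to_json_each_row(rows), end="")
```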
Loading through clickhouse-client
# Load CSV
cat readings.csv | \
  clickhouse-client \
  --query "INSERT INTO meetup.readings FORMAT CSVWithNames"
# Load JSON
cat readings.json | \
  clickhouse-client --query "INSERT INTO meetup.readings FORMAT JSONEachRow"
Loading through table functions
# Load with the file() table function.
sudo mkdir -p /var/lib/clickhouse/user_files
sudo chmod 777 /var/lib/clickhouse/user_files
sudo cp readings.json /var/lib/clickhouse/user_files
clickhouse-client
pika :) INSERT INTO meetup.readings
SELECT *
FROM file('readings.json', 'JSONEachRow',
'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
NEW: loading data from S3 (20.8+)
-- Insert from S3
INSERT INTO meetup.readings
SELECT * FROM
s3('https://s3.us-east-1.amazonaws.com/altinity-data-1/readings.csv',
'CSVWithNames',
'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
Third Step: Go crazy with your own queries
https://clickhouse.tech/docs/en/sql-reference/statements/select/
But what about client libraries??
Language             Popular drivers
C++                  https://github.com/ClickHouse/clickhouse-cpp
Golang               https://github.com/ClickHouse/clickhouse-go
Java                 https://github.com/ClickHouse/clickhouse-jdbc
ODBC                 https://github.com/ClickHouse/clickhouse-odbc
Python               https://github.com/mymarilyn/clickhouse-driver
PHP and JavaScript   Use a library listed on ClickHouse.tech *or* roll your own using the ClickHouse HTTP interface
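"Rolling your own" on top of the HTTP interface (port 8123 in the Docker example) can be quite small. The query goes in the `query` URL parameter and the rows go in the POST body. This Python sketch only builds the request with the standard library and sends nothing. The host, port and helper name are assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(table, rows, host="localhost", port=8123):
    """Build (but do not send) an HTTP POST that inserts rows in JSONEachRow format."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    url = f"http://{host}:{port}/?{params}"
    body = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in rows)
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


req = build_insert_request(
    "meetup.readings",
    [{"sensor_id": 0, "time": "2019-01-01 00:00:00", "date": "2019-01-01", "temperature": 43.31}],
)
print(req.get_method(), req.full_url)

# Against a running server you would send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```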
ClickHouse database self-defense
Database Choices
Row Store vs. Column Store (“Data Warehouse”)
MySQL: Row Store Access
[Diagram: rows stored one after another, each row holding columns a, b, c, ... together]
Read row data serially
Column Store Access
[Diagram: each column a, b, c, ... stored separately]
Read compressed columns in parallel
There is no penalty for wide tables: queries skip the columns they don’t use
“Pay” only for the columns you read
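A rough model makes the “pay only for the columns you read” rule concrete. This Python sketch ignores compression, caching and indexes and assumes fixed-width columns. The table shape (100 columns of 8 bytes, 1 billion rows) is hypothetical.

```python
def bytes_scanned(rows, column_widths, columns_read, layout):
    """Approximate bytes a full scan touches.

    Row store: every row is read whole, so every column costs I/O.
    Column store: only the requested columns are read.
    """
    if layout == "row":
        return rows * sum(column_widths.values())
    if layout == "column":
        return rows * sum(column_widths[c] for c in columns_read)
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical wide table: 100 columns of 8 bytes each.
widths = {f"c{i}": 8 for i in range(100)}
query_columns = ["c0", "c1"]  # e.g. SELECT c0, max(c1) FROM t GROUP BY c0
n = 1_000_000_000

row_bytes = bytes_scanned(n, widths, query_columns, "row")
col_bytes = bytes_scanned(n, widths, query_columns, "column")
print(row_bytes // col_bytes)  # 50
```

Adding more columns to `widths` raises the row-store cost. It leaves the column-store cost for this query unchanged.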
Compression makes data even smaller
Data type               Codec        Compression
LowCardinality(String)  (none)       LZ4
UInt32                  DoubleDelta  ZSTD(1)
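DoubleDelta suits steadily increasing values such as timestamps. It stores the change in the delta rather than the values themselves, so a regular 60-second cadence collapses to runs of zeros that a general-purpose compressor shrinks easily. This Python sketch shows only that transform. ClickHouse's real codec also bit-packs the results, which is not reproduced here.

```python
def double_delta_encode(values):
    """Return [first value, first delta, then delta-of-deltas]."""
    if len(values) < 2:
        return list(values)
    out = [values[0], values[1] - values[0]]
    for i in range(2, len(values)):
        out.append((values[i] - values[i - 1]) - (values[i - 1] - values[i - 2]))
    return out


def double_delta_decode(encoded):
    """Invert double_delta_encode by re-accumulating deltas."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dd in encoded[2:]:
        delta += dd
        values.append(values[-1] + delta)
    return values


# Unix timestamps one minute apart, starting at 2019-01-01 00:00:00 UTC.
times = [1546300800 + 60 * i for i in range(6)]
encoded = double_delta_encode(times)
print(encoded)  # [1546300800, 60, 0, 0, 0, 0]
```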
Optimize compression to reduce I/O!
CREATE TABLE billy.readings (
sensor_id Int32 Codec(DoubleDelta, ZSTD(1)),
time DateTime Codec(DoubleDelta, ZSTD(1)),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, ZSTD(1))
)
Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (sensor_id, time);
Codec: DoubleDelta, T64; Compression: ZSTD(1); Computed value: ALIAS
Query system.columns to see compression
[Screenshot: system.columns output showing per-column compression ratios of 3.22%, 0.13%, 3.34%, 0.14%, 43.8% and 29.3%]
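The ratios in system.columns come from its data_compressed_bytes and data_uncompressed_bytes columns. This Python sketch reproduces the idea offline. It packs a hypothetical column of one-minute timestamps as 4-byte integers and compresses it with zlib, which stands in for LZ4/ZSTD since those are not in the standard library. It then compares plain storage against delta encoding before compression, similar in spirit to the Codec(DoubleDelta, ZSTD(1)) chain.

```python
import struct
import zlib


def pack_u32(values):
    """Pack values as little-endian 4-byte unsigned ints, like an uncompressed UInt32 column."""
    return struct.pack(f"<{len(values)}I", *values)


def compressed_pct(raw: bytes) -> float:
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * len(zlib.compress(raw, 6)) / len(raw)


# Hypothetical column: 100,000 timestamps one minute apart.
times = [1546300800 + 60 * i for i in range(100_000)]
# Delta encoding: keep the first value, then store differences between neighbors.
deltas = [times[0]] + [b - a for a, b in zip(times, times[1:])]

plain = compressed_pct(pack_u32(times))
delta = compressed_pct(pack_u32(deltas))
print(f"plain: {plain:.2f}%  delta first: {delta:.2f}%")
```

The exact percentages depend on the compressor. The point is the ordering: transforming the values first leaves the compressor far less to store.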
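Materialized views restructure/reduce data
[Diagram: Ingest -> readings table (all sensor readings; size 544GB, 500B rows) -> readings_daily_mv materialized view (trigger) -> readings_daily AggregatingMergeTree table (daily max/min by sensor; size 1.7GB, 347M rows)]
-- Target table for the view (column types assumed to match the -State functions)
CREATE TABLE billy.readings_daily (
  sensor_id Int32, date Date,
  temp_min AggregateFunction(min, Decimal(5,2)),
  temp_max AggregateFunction(max, Decimal(5,2))
) Engine = AggregatingMergeTree ORDER BY (sensor_id, date);
CREATE MATERIALIZED VIEW billy.readings_daily_mv
TO billy.readings_daily AS
SELECT sensor_id, date,
minState(temperature) as temp_min,
maxState(temperature) as temp_max
FROM billy.readings
GROUP BY sensor_id, date;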
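Conceptually, the materialized view runs its SELECT on each inserted block and writes partial aggregate states. AggregatingMergeTree later merges states that share a sort key, and a -Merge function such as maxMerge finishes the job at query time. This Python sketch mimics that flow using plain (min, max) tuples as "states". It is a simplified model, not how ClickHouse serializes states, and the sample blocks are hypothetical.

```python
def block_states(block):
    """Per-block GROUP BY sensor_id, date -> (min, max) state, like minState/maxState."""
    states = {}
    for sensor_id, date, temp in block:
        key = (sensor_id, date)
        lo, hi = states.get(key, (temp, temp))
        states[key] = (min(lo, temp), max(hi, temp))
    return states


def merge_states(all_states):
    """Combine states that share a key, as background merges do."""
    merged = {}
    for states in all_states:
        for key, (lo, hi) in states.items():
            if key in merged:
                mlo, mhi = merged[key]
                merged[key] = (min(mlo, lo), max(mhi, hi))
            else:
                merged[key] = (lo, hi)
    return merged


# Two hypothetical insert blocks; sensor 55 appears in both.
block1 = [(55, "2019-01-01", 43.31), (55, "2019-01-01", 43.35)]
block2 = [(55, "2019-01-01", 42.90), (56, "2019-01-01", 50.10)]

daily = merge_states([block_states(block1), block_states(block2)])
print(daily[(55, "2019-01-01")])  # (42.9, 43.35)
```

The raw readings are never revisited. Each block contributes one small state per (sensor_id, date), which is why the aggregate table ends up so much smaller than the source table.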
Materialized views function like indexes!
SELECT maxMerge(temp_max)
FROM billy.readings_daily
WHERE sensor_id = 55
┌─maxMerge(temp_max)─┐
│              75.91 │
└────────────────────┘
1 rows in set. Elapsed: 0.011 sec. Processed 180.22 thousand rows, 1.44 MB (15.86 million rows/s., 126.84 MB/s.)
ClickHouse performance tuning is different...
The bad news…
● No query optimizer
● No EXPLAIN PLAN
● May need to move [a lot of] data for performance
The good news…
● No query optimizer!
● System log is great
● System tables are too
● Performance drivers are simple: I/O and CPU
● Constantly improving
Your friend: the ClickHouse query log
# Return log messages to clickhouse-client
clickhouse-client --send_logs_level=trace
# View all log messages on the server
sudo less /var/log/clickhouse-server/clickhouse-server.log
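Trace-level server logs grow quickly, so a small filter helps pull out one level or one query. This Python sketch assumes the usual clickhouse-server.log line layout: the thread ID in square brackets, an optional query ID in curly braces, and the level in angle brackets (for example `<Debug>`). Check your own log before relying on that pattern. The sample lines are made up.

```python
import re

# Assumed layout: "<date> <time> [ <thread> ] {<query_id>} <Level> <source>: <message>"
LINE_RE = re.compile(
    r"\[\s*\d+\s*\]\s*(?:\{(?P<query_id>[^}]*)\}\s*)?<(?P<level>\w+)>\s*(?P<message>.*)"
)


def filter_log(lines, level=None, query_id=None):
    """Yield (query_id, level, message) for lines that match the optional filters."""
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # continuation lines, stack traces, etc.
        if level and m["level"] != level:
            continue
        if query_id and m["query_id"] != query_id:
            continue
        yield m["query_id"], m["level"], m["message"]


# Made-up sample lines in the assumed format.
sample = [
    "2020.12.02 10:00:00.000001 [ 101 ] {q1} <Debug> executeQuery: (from [::1]:50000) SELECT 1",
    "2020.12.02 10:00:00.000002 [ 101 ] {q1} <Trace> ContextAccess (default): Access granted",
    "2020.12.02 10:00:00.000003 [ 102 ] {} <Information> Application: Ready for connections.",
]

for qid, lvl, msg in filter_log(sample, level="Debug"):
    print(qid, lvl, msg)

# On a real server (path from the command above):
# with open("/var/log/clickhouse-server/clickhouse-server.log") as f:
#     for qid, lvl, msg in filter_log(f, level="Error"):
#         print(qid, msg)
```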
Strengths and weaknesses of ClickHouse
OLTP (“Online Transaction Processing”)
(-) Lots of “small” lookups
(-) Lots of updates
(-) High concurrency
(-) Consistency-critical
OLAP (“Online Analytical Processing”)
(+) Very long tables
(+) Very wide tables
(+) Open-ended questions
(+) Lots of aggregates
ClickHouse >> MySQL for analytic queries
More information and references
● Community docs on ClickHouse.tech
○ Everything ClickHouse
● ClickHouse YouTube channel
○ Piles of community videos
● Altinity Blog
○ Lots of articles about ClickHouse usage
● Altinity Webinars
○ Webinars on all aspects of ClickHouse
● ClickHouse source code on GitHub
○ Check out tests for examples of detailed usage
Thank you!
We’re hiring
ClickHouse: https://github.com/ClickHouse/
ClickHouse Documentation: https://clickhouse.tech
Altinity Website: https://www.altinity.com

More Related Content

What's hot

What's hot (20)

ClickHouse Keeper
ClickHouse KeeperClickHouse Keeper
ClickHouse Keeper
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
Size Matters-Best Practices for Trillion Row Datasets on ClickHouse-2202-08-1...
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 

Similar to Your first ClickHouse data warehouse

SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptxSH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
MongoDB
 
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptxSH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
MongoDB
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
jbellis
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区
yiditushe
 

Similar to Your first ClickHouse data warehouse (20)

MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser...
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser...MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser...
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser...
 
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ libraryInterview with Anatoliy Kuznetsov, the author of BitMagic C++ library
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library
 
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptxSH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
 
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptxSH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
SH 1 - SES 2 part 2 - Tel Aviv MDBlocal - Eliot Keynote.pptx
 
Cassandra, Modeling and Availability at AMUG
Cassandra, Modeling and Availability at AMUGCassandra, Modeling and Availability at AMUG
Cassandra, Modeling and Availability at AMUG
 
Design Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
C* Summit 2013: Time is Money Jake Luciani and Carl YeksigianC* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
The rise of json in rdbms land jab17
The rise of json in rdbms land jab17The rise of json in rdbms land jab17
The rise of json in rdbms land jab17
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区
 
Fotolog: Scaling the World's Largest Photo Blogging Community
Fotolog: Scaling the World's Largest Photo Blogging CommunityFotolog: Scaling the World's Largest Photo Blogging Community
Fotolog: Scaling the World's Largest Photo Blogging Community
 
DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling Webinar
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
MongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB Performance
 
MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big Data
 
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
Lab pratico per la progettazione di soluzioni MongoDB in ambito Internet of T...
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with Cassandra
 

More from Altinity Ltd

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
OSA Con 2022 - Specifics of data analysis in Time Series Databases - Roman Kh...
 

Recently uploaded

Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
mbmh111980
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
Max Lee
 

Recently uploaded (20)

GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purityAPVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
KLARNA -  Language Models and Knowledge Graphs: A Systems ApproachKLARNA -  Language Models and Knowledge Graphs: A Systems Approach
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 

Your first ClickHouse data warehouse

  • 1. Your first ClickHouse data warehouse Robert Hodges - 2 December 2020 SF Bay Area ClickHouse Meetup 1
  • 2. Presenter and Company Bio www.altinity.com Enterprise provider for ClickHouse, a popular, open source data warehouse. Community sponsor and major committers to ClickHouse project. Robert Hodges - Altinity CEO 30+ years on DBMS plus virtualization and security. Using Kubernetes since 2018. 2
  • 4. Single binary Understands SQL Runs on bare metal to cloud Stores data in columns Parallel and vectorized execution Scales to many petabytes Is Open source (Apache 2.0) ClickHouse is an open source data warehouse ClickHouse Server a b c d And it’s really fast! ClickHouse Server a b c d ClickHouse Server a b c d ClickHouse Server a b c d
  • 5. Installing ClickHouse goodness on Linux # UBUNTU/DEBIAN INSTALL sudo apt-get install apt-transport-https ca-certificates dirmngr sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list sudo apt-get update sudo apt-get install -y clickhouse-server clickhouse-client sudo systemctl start clickhouse-server Debian Packages TarballsRPMs
  • 6. ClickHouse goodness delivered by Docker mkdir $HOME/clickhouse-data docker run -d --name clickhouse-server --ulimit nofile=262144:262144 --volume=$HOME/clickhouse-data:/var/lib/clickhouse -p 8123:8123 -p 9000:9000 yandex/clickhouse-server 6 Persist data Make ports visible Make ClickHouse happy
  • 7. YES! ● Yandex Managed Service for ClickHouse -- Runs in Yandex.Cloud ● Altinity.Cloud -- Runs in Amazon Public Cloud Is there ClickHouse cloud goodness? 7
  • 8. Where is the documentation? 8 https://clickhouse.tech/
  • 10. 10 First step: The ClickHouse Tutorial 10 https://clickhouse.tech/docs/en/getting-started/tutorial/
  • 11. Second step: Design table(s) and load data CREATE TABLE meetup.readings ( sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2) ) Engine = MergeTree PARTITION BY toYYYYMM(time) ORDER BY (sensor_id, time); Don’t stress about data types Use MergeTree table types Partition by month or day Sort by “keys” to find dataLZ4 compression by default
  • 12. Table Part Index Columns Sparse index Columns sorted on ORDER BY columns Rows match PARTITION BY expression Part Index Columns Part Compressed block 12 Your friend: the MergeTree table type 12
  • 13. CSVWithNames "sensor_id","time","date","temperature" 0,"2019-01-01 00:00:00","2019-01-01",43.31 0,"2019-01-01 00:01:00","2019-01-01",43.35 JSONEachRow {"sensor_id":0,"time":"2019-01-01 00:00:00","date":"2019-01-01",...} {"sensor_id":0,"time":"2019-01-01 00:01:00","date":"2019-01-01",...} {"sensor_id":0,"time":"2019-01-01 00:02:00","date":"2019-01-01",...} Popular formats for loading data
  • 14. # Load CSV cat readings.csv | clickhouse-client --query "INSERT INTO meetup.readings FORMAT CSVWithNames" # Load JSON cat readings.json | clickhouse-client --query "INSERT INTO meetup.readings FORMAT JSONEachRow" Loading through clickhouse-client
  • 15. Loading through table functions
    # Load from a file with the file() table function.
    sudo mkdir -p /var/lib/clickhouse/user_files
    sudo chmod 777 /var/lib/clickhouse/user_files
    sudo cp readings.json /var/lib/clickhouse/user_files
    clickhouse-client
    pika :) INSERT INTO meetup.readings SELECT * FROM file('readings.json', 'JSONEachRow', 'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
  • 16. NEW: loading data from S3 (ClickHouse 20.8+)
    -- Insert from S3
    INSERT INTO meetup.readings
    SELECT * FROM s3('https://s3.us-east-1.amazonaws.com/altinity-data-1/readings.csv', 'CSVWithNames',
        'sensor_id Int32, time DateTime, date Date, temperature Decimal(5,2)')
  • 17. Third step: Go crazy with your own queries https://clickhouse.tech/docs/en/sql-reference/statements/select/
  • 18. But what about client libraries?
    Language     Popular drivers
    C++          https://github.com/ClickHouse/clickhouse-cpp
    Golang       https://github.com/ClickHouse/clickhouse-go
    Java         https://github.com/ClickHouse/clickhouse-jdbc
    ODBC         https://github.com/ClickHouse/clickhouse-odbc
    Python       https://github.com/mymarilyn/clickhouse-driver
    PHP and JavaScript: use a library listed on ClickHouse.tech *or* roll your own using the ClickHouse HTTP interface.
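Rolling your own client on the HTTP interface mostly comes down to building POST requests. This is a minimal sketch using only the Python standard library. It assumes a server on localhost with HTTP port 8123 exposed, as in the Docker example, and the default user with no password. The helper names are hypothetical, and nothing is sent until `run()` is called.

```python
import urllib.parse
import urllib.request

# Assumption: local server, default HTTP port, default user without a password.
BASE_URL = "http://localhost:8123/"


def insert_request(table, fmt, payload):
    """INSERT goes in the `query` URL parameter; the rows go in the POST body."""
    query = f"INSERT INTO {table} FORMAT {fmt}"
    url = BASE_URL + "?" + urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return urllib.request.Request(url, data=payload.encode("utf-8"), method="POST")


def select_request(sql):
    """A SELECT can be sent as the POST body; results default to TabSeparated text."""
    return urllib.request.Request(BASE_URL, data=sql.encode("utf-8"), method="POST")


def run(request):
    """Send a request and return the response body as text (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


req = insert_request("meetup.readings", "JSONEachRow",
                     '{"sensor_id":0,"time":"2019-01-01 00:00:00","date":"2019-01-01","temperature":43.31}\n')
print(req.get_method(), req.full_url)
# POST http://localhost:8123/?query=INSERT%20INTO%20meetup.readings%20FORMAT%20JSONEachRow
```

Keeping the INSERT statement in the URL and the data in the body means large payloads can be streamed without any escaping inside the SQL text.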
  • 20. Database choices: Row Store vs. Column Store (“Data Warehouse”)
  • 21. MySQL: Row Store Access
    (Diagram: each row's values for columns a through o stored together, one row after another.) Read row data serially.
  • 22. Column Store Access
    (Diagram: values for columns a through v stored column by column.) Read compressed columns in parallel.
  • 23. There is no penalty for wide tables: you “pay” only for the columns you read.
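A toy cost model makes the point. Summing one column of a wide table touches every byte of every row in a row store, but only that column's bytes in a column store. This sketch ignores pages, caching, and compression, and the table shape and helper names are hypothetical.

```python
import array

VALUE_BYTES = array.array("q").itemsize  # one 64-bit integer per value in this model


def make_rows(n_rows, n_cols):
    """Row layout: each row keeps all of its column values together."""
    return [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]


def to_columns(rows):
    """Column layout: each column's values are stored contiguously."""
    return [array.array("q", col) for col in zip(*rows)]


def sum_from_rows(rows, col):
    """Reading one field still means reading past every full row."""
    bytes_read = sum(len(row) for row in rows) * VALUE_BYTES
    return sum(row[col] for row in rows), bytes_read


def sum_from_columns(columns, col):
    """Only the requested column is read."""
    data = columns[col]
    return sum(data), len(data) * data.itemsize


rows = make_rows(1000, 20)  # a hypothetical "wide" table: 1,000 rows x 20 columns
columns = to_columns(rows)
print(sum_from_rows(rows, 3))        # same total, 20x the bytes read
print(sum_from_columns(columns, 3))
```

Adding more columns grows the row-store cost of this query linearly but leaves the column-store cost unchanged, which is why wide tables carry no penalty.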
  • 24. Compression makes data even smaller
    Data type                Codec         Compression
    LowCardinality(String)   (none)        LZ4
    UInt32                   DoubleDelta   ZSTD(1)
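DoubleDelta stores the change between consecutive deltas. Evenly spaced values, such as per-minute DateTime readings (which ClickHouse stores as 32-bit Unix timestamps), therefore turn into long runs of zeros that a general-purpose compressor shrinks well. The sketch below shows only that transform and assumes a plain list-of-ints representation. ClickHouse also bit-packs the results, and zlib stands in here for ZSTD because it is in the standard library. The helper names and the sample day of timestamps are hypothetical.

```python
import struct
import zlib


def double_delta_encode(values):
    """Store the first value, the first delta, then each change in delta."""
    if not values:
        return []
    encoded = [values[0]]
    prev_delta = None
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        encoded.append(delta if prev_delta is None else delta - prev_delta)
        prev_delta = delta
    return encoded


def double_delta_decode(encoded):
    """Invert double_delta_encode by re-accumulating the deltas."""
    if not encoded:
        return []
    values = [encoded[0]]
    delta = 0
    for i, x in enumerate(encoded[1:]):
        delta = x if i == 0 else delta + x
        values.append(values[-1] + delta)
    return values


def packed(ints):
    """Pack as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(ints)}q", *ints)


# One day of per-minute timestamps starting 2019-01-01 00:00:00 UTC.
times = [1546300800 + 60 * i for i in range(1440)]
dd = double_delta_encode(times)
print(dd[:4])  # [1546300800, 60, 0, 0]
print(len(zlib.compress(packed(times))), len(zlib.compress(packed(dd))))
```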
  • 25. Optimize compression to reduce I/O!
    CREATE TABLE billy.readings (
        sensor_id Int32 Codec(DoubleDelta, ZSTD(1)),
        time DateTime Codec(DoubleDelta, ZSTD(1)),
        date ALIAS toDate(time),
        temperature Decimal(5,2) Codec(T64, ZSTD(1))
    )
    Engine = MergeTree
    PARTITION BY toYYYYMM(time)
    ORDER BY (sensor_id, time);
    Callouts: DoubleDelta and T64 are the codecs, ZSTD(1) is the compression, and ALIAS toDate(time) is a computed value.
  • 26. Query system.columns to see compression
    (Screenshot of query output; the visible figures are 3.22%, 0.13%, 3.34%, 0.14%, 43.8%, and 29.3%.)
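The percentages come from comparing compressed and uncompressed byte counts, which `system.columns` exposes as `data_compressed_bytes` and `data_uncompressed_bytes`. Below is a sketch of the query plus a minimal parser for its TabSeparated output, assuming the `billy.readings` table from slide 25. The parser does no TSV unescaping, the helper names are hypothetical, and the sample text is made up rather than taken from the screenshot.

```python
COLUMN_SIZES_SQL = """
SELECT name, data_compressed_bytes, data_uncompressed_bytes
FROM system.columns
WHERE database = 'billy' AND table = 'readings'
"""


def parse_tsv(text):
    """Split TabSeparated output into rows (no unescaping; fine for plain names and numbers)."""
    return [line.split("\t") for line in text.splitlines() if line]


def compression_percentages(rows):
    """Compressed size as a percentage of uncompressed size, per column.

    Columns with no stored bytes (for example an ALIAS column, which is
    computed rather than stored) are skipped to avoid dividing by zero.
    """
    result = {}
    for name, compressed, uncompressed in rows:
        compressed, uncompressed = int(compressed), int(uncompressed)
        if uncompressed:
            result[name] = round(100.0 * compressed / uncompressed, 2)
    return result


# Hypothetical output text.
sample = "sensor_id\t1200\t400000\ntime\t5000\t400000\ndate\t0\t0\n"
print(compression_percentages(parse_tsv(sample)))  # {'sensor_id': 0.3, 'time': 1.25}
```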
  • 27. Materialized views restructure/reduce data
    (Diagram: inserts into the readings table, which holds all sensor readings [size 544GB, 500B rows], trigger the readings_daily_mv materialized view, which writes daily max/min by sensor into readings_daily, an AggregatingMergeTree table [size 1.7GB, 347M rows].)
    CREATE MATERIALIZED VIEW billy.readings_daily_mv
    TO billy.readings_daily
    AS SELECT
        sensor_id, date,
        minState(temperature) AS temp_min,
        maxState(temperature) AS temp_max
    FROM billy.readings
    GROUP BY sensor_id, date;
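Conceptually, each insert into `readings` produces partial min/max states per (sensor_id, date), and the AggregatingMergeTree target keeps combining states that share a key. The Python model below represents states as plain (min, max) tuples. ClickHouse actually stores opaque binary states and finalizes them with -Merge combinators such as `maxMerge`. The block contents and helper names are hypothetical. Using the slide's numbers, this rollup yields roughly 1,440 source rows per summary row (500B / 347M) and about 320x less storage (544GB / 1.7GB).

```python
def block_states(block):
    """Partial (min, max) state per (sensor_id, date) for one inserted block,
    like minState/maxState with GROUP BY sensor_id, date."""
    states = {}
    for sensor_id, date, temperature in block:
        key = (sensor_id, date)
        lo, hi = states.get(key, (temperature, temperature))
        states[key] = (min(lo, temperature), max(hi, temperature))
    return states


def merge_states(*state_maps):
    """Combine states that share a key, as background merges do."""
    merged = {}
    for states in state_maps:
        for key, (lo, hi) in states.items():
            if key in merged:
                old_lo, old_hi = merged[key]
                merged[key] = (min(old_lo, lo), max(old_hi, hi))
            else:
                merged[key] = (lo, hi)
    return merged


# Two hypothetical insert blocks arriving at different times.
block1 = [(55, "2019-01-01", 70.10), (55, "2019-01-01", 72.40), (7, "2019-01-01", 50.00)]
block2 = [(55, "2019-01-01", 73.25), (55, "2019-01-02", 68.00)]

daily = merge_states(block_states(block1), block_states(block2))
# Finalize at query time: the highest daily max for sensor 55.
print(max(hi for (sensor, _), (_, hi) in daily.items() if sensor == 55))  # 73.25
```

Because min and max merge the same way in any order, it does not matter when, or whether, background merges have run before a query finalizes the states.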
  • 28. Materialized views function like indexes!
    SELECT max(temp_max) FROM billy.readings_daily WHERE sensor_id = 55
    ┌─max(temp_max)─┐
    │         75.91 │
    └───────────────┘
    1 rows in set. Elapsed: 0.011 sec. Processed 180.22 thousand rows, 1.44 MB (15.86 million rows/s., 126.84 MB/s.)
  • 29. ClickHouse performance tuning is different...
    The bad news…
    ● No query optimizer
    ● No EXPLAIN PLAN
    ● May need to move [a lot of] data for performance
    The good news…
    ● No query optimizer!
    ● System log is great
    ● System tables are too
    ● Performance drivers are simple: I/O and CPU
    ● Constantly improving
  • 30. Your friend: the ClickHouse query log
    # Return log messages to clickhouse-client
    clickhouse-client --send_logs_level=trace
    # View all log messages on the server
    sudo less /var/log/clickhouse-server/clickhouse-server.log
  • 31. Strengths and weaknesses of ClickHouse
    OLTP (“Online Transaction Processing”) workloads, where ClickHouse is weak:
    (-) Lots of “small” lookups
    (-) Lots of updates
    (-) High concurrency
    (-) Consistency critical
    OLAP (“Online Analytical Processing”) workloads, where ClickHouse is strong:
    (+) Very long tables
    (+) Very wide tables
    (+) Open-ended questions
    (+) Lots of aggregates
    ClickHouse >> MySQL for analytic queries
  • 32. More information and references
    ● Community docs on ClickHouse.tech
      ○ Everything ClickHouse
    ● ClickHouse YouTube channel
      ○ Piles of community videos
    ● Altinity Blog
      ○ Lots of articles about ClickHouse usage
    ● Altinity Webinars
      ○ Webinars on all aspects of ClickHouse
    ● ClickHouse source code on GitHub
      ○ Check out tests for examples of detailed usage