ClickHouse
Capacity Planning
Methodology for OLAP Workloads
2019-12-03 SF ClickHouse Meetup | Mik Kocikowski | mik@cloudflare.com
Intro
Cloudflare is a big and enthusiastic user of ClickHouse.
- 1PB of data per day going in (true in one sense, false in many others)
- Internal and user-facing analytics
- Operational support, data “science”, reports, etc.
Started out with a single small cluster almost 3 years ago. Grew “organically” into
a monster. Have been breaking it up into individual workloads and setting up
smaller clusters dedicated to specific products.
Purpose of capacity planning
1. Meet current needs
2. Know how to meet future needs (“100x” exercise)
Specific use case
- User-facing analytics (“how many visitors” dashboard) for a product
- HTTP-type logs coming in at 750K records/s, ~500B compressed per record
- On-prem HDD machines: 256GB RAM, 100TB disk in RAID 0
- ClickHouse server version 19.1.16, revision 54413
Decide on targets:
- P99 latency for a single SELECT query: 1s
- 150 queries/s
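A back-of-envelope check on the figures above (my arithmetic, not from the deck):

$$750{,}000\ \text{rec/s} \times 500\ \text{B} \approx 375\ \text{MB/s} \approx 32\ \text{TB/day (compressed)}$$

So a single 100TB host holds roughly three days of raw ingest before replication or on-disk overhead are counted; retention targets drive host count as much as query load does.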
tl;dr
- It is impossible to calculate cluster capacity on paper
- The only way to do it is by running real queries on real data
- But it can be done on a single host and then extrapolated
- Real data must be used in benchmarks
- Time based limits and quotas are effective
- ClickHouse documentation is good and the defaults are sane
Process
1. Benchmark the hardware
2. Max out single host using real data
3. Iterate on primary and partitioning key design
a. Sharding strategy comes here if sharding unavoidable
4. Max out the cluster until it is too slow
5. Put in per-host limits and time based quotas
Benchmark the hardware
- Get a rough idea of individual host performance (orders of magnitude)
- How long to fill a disk? How long to read it all? DO THE MATH! Is 30h to read 100TB ok? (worked through in the fio sketch below)
- How does it degrade with the disk full or when running out of IOPS?
- Make sure all hosts roughly the same (compare relative performance)
- Slowest node determines max speed of the cluster (especially for Distributed engine)
- One node faster than the rest ok; one node slower not ok
- No need to simulate exact CH disk access patterns
We ended up settling on a simple test using fio.
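The deck does not show the actual fio job, so here is a minimal sketch of the kind of sequential-throughput test that answers the questions above (path, size, and runtime are placeholders):

```bash
# Sequential read throughput on the data volume.
# Sanity check: at the ~1 GB/s a healthy HDD RAID 0 array might sustain,
# reading 100TB takes ~10^14 B / 10^9 B/s ~= 28h -- the "30h" figure above.
fio --name=seqread --filename=/data/fio.testfile --size=100G \
    --rw=read --bs=1M --direct=1 --ioengine=libaio --iodepth=16 \
    --runtime=300 --time_based --group_reporting
```

Run the same job on every host and compare relative numbers; the single slow host is the one to find.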
Max out INSERTs
- We use the HTTP interface with data formatted as RowBinary
- No performance advantage to using the “native” protocol
- No significant CPU hit on CH side due to sorting
- 1B going in != 1B on disk (on-disk compression and merges change the footprint)
- Observe disk use and calculate max possible retention
- If using “made up” data consider how it compresses
- If you can’t keep up with INSERTs then you must shard
For this use case, it turned out that a single host can handle all the INSERTs no
problem (batch size 2M rows), so there is no need to shard (split the input set into
sub-sets). Follow the instructions in the CH documentation.
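As a sketch, a batched insert over the HTTP interface looks like this (table name and payload file are illustrative, not from the deck):

```bash
# POST one ~2M-row batch, pre-encoded as RowBinary, to the HTTP port (8123).
curl -sS --data-binary @batch.rowbinary \
    'http://ch-host:8123/?query=INSERT%20INTO%20logs.http%20FORMAT%20RowBinary'
```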
Max out SELECTs
- SELECT speed depends on how many bytes need to be read from disk. The
number of part blocks that need to be read from disk is determined by:
- The primary and partitioning keys
- Number of columns SELECTed
- Index granularity
- Orders of magnitude difference for:
- Same query with different key (big user vs small user, 1 hour vs 1 month)
- Same key with different columns (count distinct HTTP status codes vs count distinct URLs)
- Throughput is a function of latency and parallelism (Little’s Law)
- The only way to know is to observe it
Max out SELECTs
Is it better to process 10 queries in parallel with each query taking 1s, or 20
queries in parallel with each query taking 2s? Latency vs throughput. Little’s Law.
- Get hold of a representative set of queries
- Get hold of a representative set of keys (user, time)
- Iterate test runs, increasing parallelism until the optimum is found
For us the “sweet spot” turned out to be parallelism 25, resulting in p95 query
latency of 465ms. That is roughly 50 queries per second per host.
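One way to run that sweep is the stock clickhouse-benchmark tool fed with the representative query set (host and file names are placeholders):

```bash
# Ramp concurrency and watch the reported QPS and latency percentiles;
# stop when p95/p99 crosses the latency target (1s for this use case).
for c in 5 10 25 50 100; do
  echo "== concurrency $c =="
  clickhouse-benchmark --host ch-host --concurrency "$c" \
      --iterations 10000 < representative_queries.sql
done
```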
Max out SELECTs
Primary and partitioning keys need to be aligned with use case.
- Primary key: (user, time)
- Partitioning key: (time,)
Exercise: consider instead using the time the record was inserted into ClickHouse:
- Less merge churn on inserts
- Deterministic SELECTs
- Results skewed by late-arriving data (but churn reduced)
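In DDL the keys above come out roughly as follows; the engine choice, table, and column names are my illustration, not from the deck:

```bash
clickhouse-client --query "
CREATE TABLE logs.http (
    user_id UInt64,   -- the 'user' part of the primary key
    ts      DateTime, -- the 'time' part of the primary key
    status  UInt16,
    url     String
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(ts)   -- partitioning key: (time,)
ORDER BY (user_id, ts)        -- primary key: (user, time)
"
```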
Max out SELECTs
- ClickHouse stores data in “parts” (files on disk)
- Each column has its own set of parts
- The index (always stored in RAM) maps primary key to record offsets
- “Marks” map record offsets to byte offsets in “parts”
To find a record, ClickHouse looks up the key in the index, then looks up the byte
offset in the mark files for each column. Keeping the marks in RAM (the “mark cache”)
makes a big difference. Marks for 70TB of data take up 70GB (you can look this up in
the filesystem). The more data, the more marks; the more columns, the more marks.
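Besides du on the .mrk files, the same number can be pulled from system tables; a sketch using the marks_bytes column of system.parts:

```bash
clickhouse-client --query "
SELECT table, formatReadableSize(sum(marks_bytes)) AS marks
FROM system.parts
WHERE active
GROUP BY table
ORDER BY sum(marks_bytes) DESC"
```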
Decide on sharding (do not shard if at all possible!)
This is a huge topic deserving a separate presentation, but...
- Naive sharding spreads all data across shards
- Need the Distributed engine to query
- Max SELECT speed determined by the slowest host
- Parallelism stays flat as hosts are added to the cluster (every query touches every shard)
- Keyed sharding puts all records for a given key into the same shard
- Need additional logic for INSERTs
- Danger of imbalance (Pareto distribution)
- Key selection determines layout: user? time?
- 2-level sharding combines naive with keyed: a cluster of clusters
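The naive and keyed strategies differ only in the sharding key of the Distributed table; a sketch (cluster, database, and table names assumed):

```bash
# Naive: spread rows randomly across shards.
clickhouse-client --query "
CREATE TABLE logs.http_dist AS logs.http
ENGINE = Distributed(my_cluster, logs, http, rand())"

# Keyed: co-locate all rows for a given user on one shard.
clickhouse-client --query "
CREATE TABLE logs.http_by_user AS logs.http
ENGINE = Distributed(my_cluster, logs, http, cityHash64(user_id))"
```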
Build the cluster
- Shards for INSERTs, replicas for SELECTs
- Remember things break
Max out the cluster until it is too slow
Experiment with a variety of loads over a period of days.
- Clickhouse degrades gracefully (hard to break)
- Adding replicas increases SELECT performance linearly
- Interaction between INSERT and SELECT must be established empirically
- Invested people are REALLY RELUCTANT to break things
- Test not just failure, but also recovery (throttle recovery?)
Put in per-host limits and quotas
User / profile limits set upper boundaries on individual queries (“up to 10M rows
per query”). Quotas set cumulative limits per time period (“up to 10GB per
minute”).
- Get the system as hot as you are willing to ever have it in production
- Run it like that for a good while
- Collect query execution time from system.query_log
- Set limits and quotas accordingly
- Separate INSERTs from SELECTs
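A sketch of the system.query_log pull that informs those numbers (the string form type = 'QueryFinish' is from recent releases; older versions use the numeric enum):

```bash
clickhouse-client --query "
SELECT user,
       count()                           AS queries,
       quantile(0.95)(query_duration_ms) AS p95_ms,
       max(query_duration_ms)            AS max_ms
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY user"
```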
Put in per-host limits and quotas
User / profile limits set upper boundaries on individual queries.
- Use max_execution_time to prevent runaway queries
- Be aggressive; analyze the data set and query patterns
- Having an upper bound on query time makes request rate limiting possible
- Use max_rows_to_read to short circuit “impossible” queries
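The same ceilings expressed as per-session settings, as a sketch (in production they belong in the user's settings profile; table name and values are illustrative):

```bash
clickhouse-client \
    --max_execution_time=1 \
    --max_rows_to_read=10000000 \
    --query "SELECT count() FROM logs.http WHERE user_id = 42"
```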
Put in per-host limits and quotas
Quotas set cumulative limits per time period. Time-based quotas (“in 60s you can
spend up to 850s processing queries for user A and 150s processing queries for
user B”) do not “care” about the reason the quota was exceeded, which makes
them very general in application.
- Execution time is the final metric (things are either too slow or fast enough)
- Do not care why it is slow (traffic spike, broken disk, bad ToR switch)
- Slowest node is taken out of rotation first (byte-based quotas penalize the fastest instead)
We ended up with 2 users: “api” and “inserter”.
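On recent ClickHouse releases a quota like the “api” one above can be declared in SQL (on 19.x the equivalent lives in users.xml); a sketch:

```bash
clickhouse-client --query "
CREATE QUOTA api_time
FOR INTERVAL 1 MINUTE MAX execution_time = 850
TO api"
```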
After going live
If limits and quotas have been set correctly, you can “walk away” as users add
columns and queries (if you don’t care about end user experience).
- The only “hard” limit is running out of disk space
- But by that time it is likely way too late
- Repeat the capacity planning exercise for each addition (column, query)
- But you can do it on a single host
- Multiple clusters for long retention (q1 cluster, q2 cluster, etc)
- Limits and quotas are there to prevent catastrophes only
- You need layered access control and rate limiting
Things I try to remember
Given my RDBMS background, I try to remember, in no particular order:
- It is about bytes, not records
- A trivial change to a SELECT query can make it a MILLION TIMES SLOWER
- Index construction is paramount
- Execution time is a great top-level metric
- You can’t delete things
- ClickHouse is very well written by really smart people
- There is no magic (cough Distributed engine)
- Unreasonable to expect the end user to understand any of this (“but it is SQL”)
Recap
1. Remove all constraints
2. Empirically establish optimal parameters for the given workload
a. The big thing is you can do it on a single host
3. Protect these parameters with limits and quotas
Devote a million lifetimes to:
- Study of primary key construction
- Design of infinitely scalable bi-level sharding schemes
Thank you!