Monitoring with
Clickhouse
Berlin DevOps 2018-09-26
Ilya @GoEuro
GoEuro Scale:
● 20 mio+ visitors / month
● 150+ Engineers
● 300+ microservices in production
● 600+ releases per week
Monitoring in GoEuro
● Push-based
● Graphite + Grafana
● 30MBps ingress traffic
● 8 Mio data points per minute
● Tags
● Hostname as a part of each metric
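Push-based here means each host writes lines of Graphite's plaintext protocol (`metric.path value timestamp`) to carbon. A minimal sketch of that, with the hostname embedded in the metric path as on this slide — the host, port, and metric names are illustrative, not GoEuro's actual setup:

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Render one line of Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"

def metric_path(service, name):
    # Hostname is part of each metric path, e.g. servers.web-1.cpu.load
    host = socket.gethostname().replace(".", "_")
    return f"servers.{host}.{service}.{name}"

def push(lines, host="graphite.local", port=2003):
    # carbon listens for plaintext metrics on TCP 2003 by default
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode())
```

Because the hostname lives inside the path, every new host fans out into a new subtree of metrics — one reason the write volume above grows so quickly.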
Common Graphite infrastructure
Evolution of our Graphite Setup
1. You start with a common Graphite Stack:
Default components, one mirror (2 replicas), no sharding
2. First performance issues:
Bigger VMs, SSD, memcached, carbon-c-relay, no sharding
* go-carbon - that could have won us some time - it’s way faster than
carbon-cache
3. Bigger performance issues:
Multiple instances, jump hash for sharding, carbonate to rebalance the
cluster, custom cleanup jobs, filling replication gaps, having to deal
with coupled reads and writes
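"Jump hash" refers to the jump consistent hash of Lamping and Veach, which relays like carbon-c-relay can use to map a metric to a shard with minimal remapping when the cluster grows. A minimal Python sketch of the algorithm; deriving the 64-bit key from the metric name via MD5 is an illustrative choice, real relays differ:

```python
import hashlib

def jump_hash(key, num_buckets):
    """Jump consistent hash: map a 64-bit integer key to a bucket.

    When the bucket count grows from n to n+1, a key either keeps
    its bucket or moves to the new bucket n -- nothing else shuffles.
    """
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b

def shard_for(metric, num_shards):
    # Illustrative: turn the metric name into a 64-bit key first
    key = int.from_bytes(hashlib.md5(metric.encode()).digest()[:8], "big")
    return jump_hash(key, num_shards)
```

That minimal-remapping property is exactly why rebalancing with carbonate is needed at all: the data written before a resize still sits on the old shard for the fraction of keys that moved.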
We are building a distributed
database, aren’t we?
Let’s look around in 2018
Criteria for a new backend:
● Replication
● Sharding
● Scaling out
● Aggregation/retention engine
● Graphite compatible for both reads and writes
● Price
● Complexity
● Monitoring
● Robustness, e.g. risk of data loss
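The aggregation/retention criterion is the Graphite-style rollup: as points age, they are collapsed into coarser intervals. A minimal sketch of that downsampling step in Python — the interval and the averaging function are illustrative defaults, mirroring what a retention engine (or ClickHouse's GraphiteMergeTree) does on merge:

```python
from collections import defaultdict

def rollup(points, interval, agg=lambda vs: sum(vs) / len(vs)):
    """Downsample (timestamp, value) points into interval-second buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Snap each timestamp down to the start of its bucket
        buckets[ts - ts % interval].append(value)
    return sorted((ts, agg(vs)) for ts, vs in buckets.items())

# e.g. two points inside the first minute collapse into their average
print(rollup([(0, 1.0), (10, 3.0), (60, 5.0)], 60))  # [(0, 2.0), (60, 5.0)]
```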
Graphite backends evaluated
● ElasticSearch - too much effort to make it scale
● KairosDB - no retention mechanism out of the box; no Graphite reader
● BigGraphite - too slow; Cassandra has a pretty steep learning curve
● Prometheus - doesn't scale out of the box; we'd have to switch the whole
company from pushing metrics to pulling them
● GlusterFS - 8x slower on writes vs. the same storage attached locally;
requires a lot of tuning
● Ceph - also too slow
● OpenTSDB - runs on HBase on top of HDFS, which makes it a super complex
choice from the start
● InfluxDB - you need to come up with an external search index
● Clickhouse - our winner
What is Clickhouse
ClickHouse is an open source column-oriented database
management system capable of real-time generation of
analytical data reports using SQL queries.
https://clickhouse.yandex/
What is Clickhouse
● Blazing Fast
● Linearly Scalable
● Hardware Efficient
● Fault Tolerant
● Sharding and replication out of the box
● Custom table engines (including GraphiteMergeTree)
Clickhouse as a Graphite backend
● Ecosystem is there
● 100% coverage of the Graphite query
language
● We had a seamless experience with lomik's
Go implementation
Downsides
● Depends on ZooKeeper for sharding and
replication (we don’t use it now)
● Sharding requires some attention
● Read queries against shards are slower
● Well known in the Russian-speaking world
but not outside it
Current performance
● Uses 2 cores and 2GB of RAM on our scale
● Graphite-web response times before and after:
Questions?