Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

2

Share

ClickHouse Introduction by Alexander Zaitsev, Altinity CTO

Presented at ClickHouse Meetup in Madrid,
April 2, 2019

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all

ClickHouse Introduction by Alexander Zaitsev, Altinity CTO

  1. 1. 18:15 Opening word (Javier Santana) 18:25 ClickHouse introduction (Alexander Zaitsev, Altinity) 19:00 ClickHouse 2019 new features (Alexey Milovidov, Yandex) 19:40 Coffee break 20:00 From legacy to ClickHouse (Iago Enriquez, Idealista) 20:25 1027 predictive models in 10 seconds (David Pardo Villaverde, Corunet) 21:55 Shipping Data from Postgres to Clickhouse (Murat Kabilov, Adjust) 21:20 More Q&A and closing remarks 21:30 Networking, beer etc.
  2. 2. What Is ClickHouse?
  3. 3. © http://mattturck.com/
  4. 4. ClickHouse DBMS is Column Store MPP Realtime SQL Open Source
  5. 5. http://clickhouse.yandex • Developed by Yandex for Yandex.Metrica • Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+ b2b and b2c products) • Yandex.Metrica – world 2nd largest web analytics platform • Open Source since June 2016 (Apache 2.0 license) • 200+ companies using in production today • Several hundred experimenting, doing POC etc. • Dozens of contributors to the source code
  6. 6. Why Yet Another DBMS?
  7. 7. SQLFlexible
  8. 8. ClickHouse •Fast! •Flexible! •Free! •Fun!
  9. 9. How Fast?
  10. 10. :) select count(*) from dw.ad8_fact_event; SELECT count(*) FROM dw.ad8_fact_event ┌───────count()─┐ │ 1261705085657 │ └───────────────┘ 1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.) Altinity Ltd. www.altinity.com 1+ trillion rows table
  11. 11. :) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2; SELECT sum(price_cpm) FROM dw.ad8_fact_event WHERE (access_day = (today() - 1)) AND (event_key = -2) ┌────sum(price_cpm)─┐ │ 87579.09035192338 │ └───────────────────┘ 1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million rows/s., 17.31 GB/s.) Altinity Ltd. www.altinity.com 1+ trillion rows table
  12. 12. WikiStat data, 28B rows. https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
  13. 13. Query 1 Query 2 Query 3 Query 4 Setup 0.009 0.027 0.287 0.428 BrytlytDB 2.0 & 2-node p2.16xlarge cluster 0.034 0.061 0.178 0.498 MapD & 2-node p2.8xlarge cluster 0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs 0.241 0.826 1.209 1.781 ClickHouse, 3 x c5d.9xlarge cluster 0.762 2.472 4.131 6.041 BrytlytDB 1.0 & 2-node p2.16xlarge cluster 1.034 3.058 5.354 12.748 ClickHouse, Intel Core i5 4670K 1.56 1.25 2.25 2.97 Redshift, 6-node ds2.8xlarge cluster 2 2 1 3 BigQuery 2.362 3.559 4.019 20.412 Spark 2.4 & 21 x m3.xlarge HDFS cluster 6.41 6.19 6.09 6.63 Amazon Athena 8.1 18.18 n/a n/a Elasticsearch (heavily tuned) 14.389 32.148 33.448 67.312 Vertica, Intel Core i5 4670K 22 25 27 65 Spark 2.3.0 & single i3.8xlarge w/ HDFS 35 39 64 81 Presto, 5-node m3.xlarge cluster w/ HDFS 152 175 235 368 PostgreSQL 9.5 & cstore_fdw “1.1 Billion Taxi Rides Benchmarks” http://tech.marksblogg.com/benchmarks.html “This is the first time a free CPU-based database has managed to out-perform a GPU-based database in my benchmarks.” Mark Litwintschik
  14. 14. Time Series Benchmarks • https://github.com/timescale/tsbs • Benchmark suite to automate testing • Loads 103M rows, 10 metrics per row • Runs 15 queries, 1000 runs each in 8 parallel threads • Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse (Altinity PR is submitted)
  15. 15. 0 100 200 300 400 500 600 700 800 900 ClickHouse TimescaleDB InfluxDB Load time (s)
  16. 16. 1.20 26 0.46 0.00 5.00 10.00 15.00 20.00 25.00 30.00 ClickHouse TimescaleDB InfluxDB Data Size on disk (GB)
  17. 17. 0 10 20 30 40 50 60 70 80 “Light” queries, time in ms ClickHouse TimescaleDB InfluxDB
  18. 18. 0 10 20 30 40 50 60 70 80 90 “Heavy” queries, time in sec ClickHouse TimescaleDB InfluxDB
  19. 19. How Flexible?
  20. 20. ClickHouse runs at • Bare metal (any Linux) • Public clouds: Amazon, Azure, Google, Alibaba • Private clouds • Docker, Kubernetes
  21. 21. ClickHouse solves business problems at: • Mobile App and Web analytics • AdTech • Retail and E-Commerce • Operational Logs analytics • Telecom/Monitoring • Financial Markets analytics • Security Audit • BlockChain transactions analysis
  22. 22. ClickHouse Migrations
  23. 23. Size does not matter Yandex: 500+ servers, 25B rec/day LifeStreet: 60 servers, 100B rec/day CloudFlare: 76 servers, 200B rec/day Bloomberg: 102 servers, 1000B rec/day Toutiao: 1000 servers
  24. 24. How fun ☺ life←{↑1 ω∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂ω}
  25. 25. with (select groupArray(C) from C) as Ca select id, groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da, groupArray(P) Pa, arrayMap(c -> arrayFirstIndex(s -> s > c, Sa)-1, Ca) Ka, arrayMap((c,k) -> Va[k] + (Va[k+1] - Va[k])/(Sa[k+1] - Sa[k])*(c- Sa[k]),Ca,Ka) Ta, arrayMap(s -> arrayFirstIndex(c -> c>s, Ca)>0 ? arrayFirstIndex(c -> c>s, Ca)-1 : toInt32(length(Ca)), Sa) Ja, arrayMap(i -> Ta[i], Ja) Ra, arrayMap((v,r) -> v - r, Va, Ra) ARa, arraySum((x,y,z) -> x*y*z, ARa, Da, Pa) result from T group by id
  26. 26. What’s new in 2019 … Alexey Milovidov will disclose latest coolest features in 15 minutes
  27. 27. More user friendly than ever! • GDPR compliance – thanks to UPDATE/DELETE • Easier BI integration – thanks to SQL compatibility changes and improvements in ODBC driver • Easier cluster operation – thanks to clickhouse-copier, distributed DDL and upcoming Altinity ClickHouse operator for Kubernetes • Easier integration with other systems. Thanks to: • HTTP/TCP protocols • Table functions to access external data • Kafka storage engine • Logs integration with Logstash, ClickTail and other tools • Integration wth MySQL, PostgreSQL
  28. 28. Integrates with MySQL • mysql() table function • MySQL table engine • MySQL external dictionaries • ProxySQL • Binary log replication with clickhouse-mysql
  29. 29. mysql() table function select * from mysql('host:port', database, 'table', 'user', 'password'); https://www.altinity.com/blog/2018/2/12/aggregate-mysql-data-at-high-speed-with-clickhouse • Easiest and fastest way to get data from MySQL • Load to CH table and run queries much faster
  30. 30. MySQL table engine CREATE TABLE … Engine = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']); • SELECTs and INSERTs! • No caching, data is queried from th remote server https://clickhouse.yandex/docs/en/operations/table_engines/mysql/
  31. 31. MySQL external dictionaries • Makes data from mysql database accessible in ClickHouse queries • Stores in memory • Updates when the source data changes SELECT dictGet(‘dim_geo’, ‘country_name’, geo_key) country_name,sum(imps) FROM T GROUP BY country_name;
  32. 32. Binary log replication from MySQL to ClickHouse MySQL clickhouse-mysql Queries Source Data See details at: https://www.altinity.com/blog/2018/6/30/realtime-mysql-clickhouse-replication-in-practice
  33. 33. Integrates with PostgreSQL • odbc() table function • ODBC engine • ODBC dictionaries • PostgreSQL foreign data wrapper – clickhouse_fwd by Percona • pg2ch -- binary log replication, Murat Kibilov will present later today
  34. 34. hello-kubernetes.yaml: apiVersion: "clickhouse.altinity.com/v1" kind: "ClickHouseInstallation" metadata: name: "hello-kubernetes" spec: configuration: clusters: - name: "sharded" layout: type: Standard shardsCount: 3
  35. 35. $ kubectl -n test apply -f docs/examples/hello-kubernetes.yaml clickhouseinstallation.clickhouse.altinity.com/hello-kubernetes created $ kubectl -n test get services NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE chi-a9cffb-347e-0-0 ClusterIP None <none> 8123/TCP,9000/TCP,9009/TCP 7s chi-a9cffb-347e-1-0 ClusterIP None <none> 8123/TCP,9000/TCP,9009/TCP 7s chi-a9cffb-347e-2-0 ClusterIP None <none> 8123/TCP,9000/TCP,9009/TCP 7s clickhouse-example-02 LoadBalancer 10.98.156.78 <pending> 8123:30703/TCP,9000:30348/TCP 7s
  36. 36. $ docker run -it yandex/clickhouse-client clickhouse-client -h 10.98.156.78 ClickHouse client version 19.1.14. Connecting to 10.98.156.78:9000. Connected to ClickHouse server version 19.1.14 revision 54413. chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :)
  37. 37. chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :) create table test_distr as system.one Engine = Distributed('sharded', system, one); CREATE TABLE test_distr AS system.one ENGINE = Distributed('sharded', system, one) Ok. 0 rows in set. Elapsed: 0.016 sec.
  38. 38. chi-a9cffb-347e-1-0-0.chi-a9cffb-347e-1-0.test.svc.cluster.local :) select * from test_distr; SELECT * FROM test_distr ┌─dummy─┐ │ 0 │ └───────┘ ┌─dummy─┐ │ 0 │ └───────┘ ┌─dummy─┐ │ 0 │ └───────┘ 3 rows in set. Elapsed: 0.054 sec.
  39. 39. …coming soon • Managing persistent volumes to be used for ClickHouse data • Configuring pod deployment (pod templates, affinity rules and so on) • Creating replicated tables • Managing users/profiles configuration • Exporting ClickHouse metrics to Prometheus • Handling ClickHouse version upgrades … and more
  40. 40. ClickHouse Today • Mature Analytic DBMS. Proven by many companies • Almost 3+ years in Open Source • Constantly improves • Solid community • Emerging eco-system • Support from Altinity
  41. 41. Q&A Contact me: alz@altinity.com skype: alex.zaitsev telegram: @alexanderzaitsev Altinity
  • PloyPandy

    Jul. 5, 2021
  • fujohnwang

    Apr. 13, 2019

Presented at ClickHouse Meetup in Madrid, April 2, 2019

Views

Total views

1,050

On Slideshare

0

From embeds

0

Number of embeds

164

Actions

Downloads

0

Shares

0

Comments

0

Likes

2

×