SlideShare a Scribd company logo
1 of 28
Download to read offline
June 7th, 2022 - My first 90 days with ClickHouse - Alkin Tezuysal -
EVP Global Services - ChistaDATA Inc.
Let’s get connected with Alkin first
● Alkin Tezuysal - EVP - Global Services @chistadata
○ Linkedin : https://www.linkedin.com/in/askdba/
○ Twitter: https://twitter.com/ask_dba
● Open Source Database Evangelist
○ Previously PlanetScale, Percona and Pythian as Technical Manager, SRE, DBA (MySQL)
○ Previously Enterprise DBA , Informix, Oracle, DB2 , SQL Server
● Author, Speaker, Mentor, and Coach
@ChistaDATA Inc. 2022
@ask_dba
Also…
Someone who is Born to Sail
@ChistaDATA Inc. 2022
Forced to Work
@ask_dba
Trivia Question ?
@ChistaDATA Inc. 2022
The left and right sides of the boat are referred to as what?
@ask_dba
About MySQL Cookbook 4th Edition
● O’reilly Book previously authored by Paul Dubois 3 editions
● Solutions for Database Developers and Administrators
● More than 950 pages of recipes for specific database challenges
● It took two years of authoring, rewriting, reviewing, editing and learning.
● Co-authored with Sveta Smirnova - MySQL Expert / Author , Percona
@ChistaDATA Inc. 2022
@ask_dba
@ChistaDATA Inc. 2022
@ask_dba
@svetsmirnova
About another book…
By Vijay Anand
● Database Fundamentals overview
● Comparison and examples from different data stores
● Techniques, tips and tricks for ClickHouse
● Great overview and summary for beginners
@ChistaDATA Inc. 2022
@ask_dba
About ClickHouse
● Columnar Storage
● SQL Compatible
● Open Source (Apache 2.0)
● Shared Nothing Architecture
● Parallel Execution
● Rich in Aggregate Functions
● Super fast for Analytics workload
○ Compression and Encoding
@ChistaDATA Inc. 2022
@ask_dba
Other ClickHouse features
● Engine types for analytical workloads
● Materialized Views
● External data connectors
● Data types for compatibility with other sources
@ChistaDATA Inc. 2022
@ask_dba
Columnar Storage
orders
@ChistaDATA Inc. 2022
order_id 1 2 3 4
order_code AB-01 AB-02 AB-02 AB-03
order_amount 2.99 1.99 1.50 2.25
order_category stationary stationary stationary gifts
@ask_dba
SQL Compatible
● Full SQL parser (INSERT)
○ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')
● Data format parser
○ SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS
ORDER BY EventDate FORMAT TabSeparated
@ChistaDATA Inc. 2022
@ask_dba
Shared Nothing Architecture
Data distribution refers to splitting the very large dataset into multiple shards
which are stored on different servers. ClickHouse divides the dataset into
shards according to the sharding key. Each shard holds and processes a part
of the data, the query results from multiple shards are then combined together
to give the final result.
@ChistaDATA Inc. 2022
@ask_dba
Zookeeper
Sharding
@ChistaDATA Inc. 2022
@ask_dba
Shard_01 Shard_02 Shard_03 Shard_n
Replication
Data replication refers to keeping a copy of the data on the other server nodes for
ensuring availability in case of server node failure.
This can also improve performance by allowing multiple servers to process the
data queries in parallel.
@ChistaDATA Inc. 2022
@ask_dba
Zookeeper
Replication
@ChistaDATA Inc. 2022
@ask_dba
Shard_01 Shard_02 Shard_03 Shard_n
Replica_01 Replica_02 Replica_03 Shard_n
Replication & Sharding
@ChistaDATA Inc. 2022
@ask_dba
Shard_01
Replica_01
Clickhouse Node 1
Shard_04
Replica_04
Shard_02
Replica_02
Clickhouse Node 2
Shard_03
Replica_03
Shard_05
Replica_05
Clickhouse Node 3
Shard_06
Replica_06
Shard_n
Replica_n
Clickhouse Node n
Shard_n
Replica_n
Zookeeper
Replication & Sharding
@ChistaDATA Inc. 2022
@ask_dba
Parallel Execution
● Large queries are parallelized naturally, taking all the necessary resources available on the current
server.
● Distributed processing on multiple nodes.
@ChistaDATA Inc. 2022
@ask_dba
Rich in Aggregate Functions
● Generic aggregate functions (count, min, max, avg, etc.)
○ Ton of ClickHouse specific aggregate functions
● Parametric aggregate functions (histogram, sequenceMatch, etc.)
● Combinators to change the behavior of the aggregate function (-if sumIf,
avgIf)(-array sumArray, uniqArray)
@ChistaDATA Inc. 2022
@ask_dba
Super fast for Analytics workload
● Cost efficient performance against other solutions
● Improved performance on every release
● Use cases and usage increasing hence
@ChistaDATA Inc. 2022
@ask_dba
Data on Kubernetes?
● Still a controversial subject?
● Community and use cases
● Is it production grade yet?
@ChistaDATA Inc. 2022
@ask_dba
Operators
● Vitess operator by PlanetScale
● Percona XtraDB, MongoDB, PostgreSQL operators by Percona
● ClickHouse Kubernetes Operator by Altinity
● Oracle MySQL InnoDB Cluster
● More to come? Maybe …
● In the meantime other operators
○ RedShift,
@ChistaDATA Inc. 2022
@ask_dba
Altinity Operator for ClickHouse
● Creates ClickHouse clusters defined as custom resources
● Customized storage provisioning (VolumeClaim templates)
● Customized pod templates
● Customized service templates for endpoints
● ClickHouse configuration management
● ClickHouse users management
● ClickHouse cluster scaling including automatic schema propagation
● ClickHouse version upgrades
● Exporting ClickHouse metrics to Prometheus
@ChistaDATA Inc. 2022
@ask_dba
Altinity Operator for ClickHouse
@ChistaDATA Inc. 2022
@ask_dba
Altinity Operator for ClickHouse
@ChistaDATA Inc. 2022
@ask_dba
Altinity Operator for ClickHouse
@ChistaDATA Inc. 2022
@ask_dba
Open Source ClickHouse Community
1. DOK
2. ClickHouse
3. Altinity
4. ChistaDATA
@ChistaDATA Inc. 2022
@ask_dba
About ChistaDATA Inc.
● Founded in 2021 by Shiv Iyer - CEO and Principal
● Has received 3M USD seed investment
● Focusing on ClickHouse infrastructure operations
● Services around dedicated Managed Services, Support and Consulting
● We’re hiring globally DBAs, SREs and DevOps Engineers
@ChistaDATA Inc. 2022
@ask_dba

More Related Content

Similar to Dok Talks #133 - My First 90 days with Clickhouse

Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteCarlos Andrés García
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteVMware Tanzu
 
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...Altinity Ltd
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions Yugabyte
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTORiccardo Zamana
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisKai Sasaki
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
 
Data Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlData Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlHAFIZ Islam
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform DATAVERSITY
 
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDBMongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDBMongoDB
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
How YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLYugabyte
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 

Similar to Dok Talks #133 - My First 90 days with Clickhouse (20)

Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
 
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
OSA Con 2022 - Extract, Transform, and Learn about your developers - Brian Le...
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
Continuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData AnalysisContinuous Optimization for Distributed BigData Analysis
Continuous Optimization for Distributed BigData Analysis
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
Data Warehouse Logical Design using Mysql
Data Warehouse Logical Design using MysqlData Warehouse Logical Design using Mysql
Data Warehouse Logical Design using Mysql
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform 
 
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDBMongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
How YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQL
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 

More from DoKC

Distributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDistributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDoKC
 
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsIs It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsDoKC
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryDoKC
 
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...DoKC
 
The State of Stateful on Kubernetes
The State of Stateful on KubernetesThe State of Stateful on Kubernetes
The State of Stateful on KubernetesDoKC
 
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...DoKC
 
Make Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyMake Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyDoKC
 
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...DoKC
 
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudRun PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudDoKC
 
The Kubernetes Native Database
The Kubernetes Native DatabaseThe Kubernetes Native Database
The Kubernetes Native DatabaseDoKC
 
ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023DoKC
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentDoKC
 
StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154DoKC
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...DoKC
 
Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151DoKC
 
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...DoKC
 
Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147DoKC
 
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...DoKC
 
We will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sWe will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sDoKC
 
Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators DoKC
 

More from DoKC (20)

Distributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDistributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and How
 
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsIs It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
 
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
 
The State of Stateful on Kubernetes
The State of Stateful on KubernetesThe State of Stateful on Kubernetes
The State of Stateful on Kubernetes
 
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
 
Make Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyMake Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-Ready
 
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
 
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudRun PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
 
The Kubernetes Native Database
The Kubernetes Native DatabaseThe Kubernetes Native Database
The Kubernetes Native Database
 
ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
 
StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
 
Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151
 
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
 
Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147
 
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
 
We will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sWe will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8s
 
Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators
 

Recently uploaded

Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Recently uploaded (20)

Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

Dok Talks #133 - My First 90 days with Clickhouse

  • 1. June 7th, 2022 - My first 90 days with ClickHouse - Alkin Tezuysal - EVP Global Services - ChistaDATA Inc.
  • 2. Let’s get connected with Alkin first ● Alkin Tezuysal - EVP - Global Services @chistadata ○ Linkedin : https://www.linkedin.com/in/askdba/ ○ Twitter: https://twitter.com/ask_dba ● Open Source Database Evangelist ○ Previously PlanetScale, Percona and Pythian as Technical Manager, SRE, DBA (MySQL) ○ Previously Enterprise DBA , Informix, Oracle, DB2 , SQL Server ● Author, Speaker, Mentor, and Coach @ChistaDATA Inc. 2022 @ask_dba
  • 3. Also… Someone who is Born to Sail @ChistaDATA Inc. 2022 Forced to Work @ask_dba
  • 4. Trivia Question ? @ChistaDATA Inc. 2022 The left and right sides of the boat are referred to as what? @ask_dba
  • 5. About MySQL Cookbook 4th Edition ● O’reilly Book previously authored by Paul Dubois 3 editions ● Solutions for Database Developers and Administrators ● More than 950 pages of recipes for specific database challenges ● It took two years of authoring, rewriting, reviewing, editing and learning. ● Co-authored with Sveta Smirnova - MySQL Expert / Author , Percona @ChistaDATA Inc. 2022 @ask_dba
  • 7. About another book… By Vijay Anand ● Database Fundamentals overview ● Comparison and examples from different data stores ● Techniques, tips and tricks for ClickHouse ● Great overview and summary for beginners @ChistaDATA Inc. 2022 @ask_dba
  • 8. About ClickHouse ● Columnar Storage ● SQL Compatible ● Open Source (Apache 2.0) ● Shared Nothing Architecture ● Parallel Execution ● Rich in Aggregate Functions ● Super fast for Analytics workload ○ Compression and Encoding @ChistaDATA Inc. 2022 @ask_dba
  • 9. Other ClickHouse features ● Engine types for analytical workloads ● Materialized Views ● External data connectors ● Data types for compatibility with other sources @ChistaDATA Inc. 2022 @ask_dba
  • 10. Columnar Storage orders @ChistaDATA Inc. 2022 order_id 1 2 3 4 order_code AB-01 AB-02 AB-02 AB-03 order_amount 2.99 1.99 1.50 2.25 order_category stationary stationary stationary gifts @ask_dba
  • 11. SQL Compatible ● Full SQL parser (INSERT) ○ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') ● Data format parser ○ SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated @ChistaDATA Inc. 2022 @ask_dba
  • 12. Shared Nothing Architecture Data distribution refers to splitting the very large dataset into multiple shards which are stored on different servers. ClickHouse divides the dataset into shards according to the sharding key. Each shard holds and processes a part of the data, the query results from multiple shards are then combined together to give the final result. @ChistaDATA Inc. 2022 @ask_dba
  • 14. Replication Data replication refers to keeping a copy of the data on the other server nodes for ensuring availability in case of server node failure. This can also improve performance by allowing multiple servers to process the data queries in parallel. @ChistaDATA Inc. 2022 @ask_dba
  • 15. Zookeeper Replication @ChistaDATA Inc. 2022 @ask_dba Shard_01 Shard_02 Shard_03 Shard_n Replica_01 Replica_02 Replica_03 Shard_n
  • 16. Replication & Sharding @ChistaDATA Inc. 2022 @ask_dba Shard_01 Replica_01 Clickhouse Node 1 Shard_04 Replica_04 Shard_02 Replica_02 Clickhouse Node 2 Shard_03 Replica_03 Shard_05 Replica_05 Clickhouse Node 3 Shard_06 Replica_06 Shard_n Replica_n Clickhouse Node n Shard_n Replica_n Zookeeper
  • 18. Parallel Execution ● Large queries are parallelized naturally, taking all the necessary resources available on the current server. ● Distributed processing on multiple nodes. @ChistaDATA Inc. 2022 @ask_dba
  • 19. Rich in Aggregate Functions ● Generic aggregate functions (count, min, max, avg, etc.) ○ Ton of ClickHouse specific aggregate functions ● Parametric aggregate functions (histogram, sequenceMatch, etc.) ● Combinators to change the behavior of the aggregate function (-if sumIf, avgIf)(-array sumArray, uniqArray) @ChistaDATA Inc. 2022 @ask_dba
  • 20. Super fast for Analytics workload ● Cost efficient performance against other solutions ● Improved performance on every release ● Use cases and usage increasing hence @ChistaDATA Inc. 2022 @ask_dba
  • 21. Data on Kubernetes? ● Still a controversial subject? ● Community and use cases ● Is it production grade yet? @ChistaDATA Inc. 2022 @ask_dba
  • 22. Operators ● Vitess operator by PlanetScale ● Percona XtraDB, MongoDB, PostgreSQL operators by Percona ● ClickHouse Kubernetes Operator by Altinity ● Oracle MySQL InnoDB Cluster ● More to come? Maybe … ● In the meantime other operators ○ RedShift, @ChistaDATA Inc. 2022 @ask_dba
  • 23. Altinity Operator for ClickHouse ● Creates ClickHouse clusters defined as custom resources ● Customized storage provisioning (VolumeClaim templates) ● Customized pod templates ● Customized service templates for endpoints ● ClickHouse configuration management ● ClickHouse users management ● ClickHouse cluster scaling including automatic schema propagation ● ClickHouse version upgrades ● Exporting ClickHouse metrics to Prometheus @ChistaDATA Inc. 2022 @ask_dba
  • 24. Altinity Operator for ClickHouse @ChistaDATA Inc. 2022 @ask_dba
  • 25. Altinity Operator for ClickHouse @ChistaDATA Inc. 2022 @ask_dba
  • 26. Altinity Operator for ClickHouse @ChistaDATA Inc. 2022 @ask_dba
  • 27. Open Source ClickHouse Community 1. DOK 2. ClickHouse 3. Altinity 4. ChistaDATA @ChistaDATA Inc. 2022 @ask_dba
  • 28. About ChistaDATA Inc. ● Founded in 2021 by Shiv Iyer - CEO and Principal ● Has received 3M USD seed investment ● Focusing on ClickHouse infrastructure operations ● Services around dedicated Managed Services, Support and Consulting ● We’re hiring globally DBAs, SREs and DevOps Engineers @ChistaDATA Inc. 2022 @ask_dba