SlideShare a Scribd company logo
1 of 30
Download to read offline
1
ClickHouse
at ContentSquare
2
Summary
What are we doing at ContentSquare?
What have we done so far with ClickHouse?
What do we think about ClickHouse?
3
Who are we
Christophe Kalenzaga
Senior Data Engineer
Vianney Foucault
Senior Platform Engineer
4
ContentSquare in a few words
● Web analytic Company
○ 600GB collected per day
○ 13 months retention (~230 TBytes)
○ websites and mobile applications
● 100+ clients Worldwide
○ Automotive | BFSI | Retail | Grocery | Luxury
○ B2B | Energy | Travel | Telco | Gaming
● Collected data doubles every year
5
One of the many challenges of ContentSquare
6
Our workload
● 30k queries per day
● Skewed load
● CPU intensive queries
● 20% of cacheable queries
● Can’t precompute queries
7
current backend infrastructure
● Many ElasticSearch clusters
● Storage cost
● Compute cost
● Can’t handle clients with too much data
Scala Microservices
Data Pipeline Components
(Scala/Spark)
8
to sum up
9
what we’ve done so far
Aug 2018Jun 2018Mar 2018Jan 2018Dec 2017
Selection of a few technologies
to replace ElasticSearch
Start mobile analytics
solution on ClickHouse
One-week POC on ClickHouse on a
specific feature of the web analytics
solution First release of the mobile analytics
solution on clickhouse.
insert data from the web analytics
solution into ClickHouse check if
there are no side effects
Start migrating the web analytic
solution to ClickHouse
Benchmarks to find the
right data model and the
right configuration
the mobile analytics solution is
released
Looking for a new technology
POC of multiple technologies
10
How to benchmark?
11
Benchmark methodology
● Define different types of query
● Get the statistical distribution of each type of query in production
● Create a dataset (10 TBytes)
● Create queries with the same statistical distribution as in production
● Be deterministic
12
Benchmark Constraints for a #Platfomer
Deployment
CPU’s
Memory
Configuration
Management
Deterministic
Automation Cost Optimisation
Monitoring
Data Load
13
Benchmark timeline
{ Time consuming Manual tasks }
Iteration
Setup
infrastructure
Deploy
Middlewares
~ 4 hours ~ 12 hours
Load test Data Benchmark
{ Time consuming }
Cleanup
14
Platformer’s Words
DevOps Motto:
Industrialise As Soon As Possible
15
Fully Automated Benchmark timeline
~ 4 hours ~ 12 hours5 Minutes 1 Minute
Iteration
Setup
infrastructure
Deploy
Middlewares
Load test Data Benchmark Cleanup
DevOps Approved
16
Platformer’s Words
DevOps Leitmotiv:
Monitor As Soon As Possible
17
Benchmark iterations on operating systems
http://openbenchmarking.org/result/1704225-TR-AMAZON38819
18
Benchmark iterations on different operating systems
ResponseTime
Percentiles
+60%
+60%
19
Benchmark iterations on Instances
Instance Type vCPU Mem (GiB) Networking Perf. $ Per Hour Perf Index
i3.large 2 15,25 Up to 10 Gigabit 0.172 1
i3.xlarge 4 30,5 Up to 10 Gigabit 0.344 1
i3.2xlarge 8 61 Up to 10 Gigabit 0.688 1
i3.4xlarge 16 122 Up to 10 Gigabit 1.376 1
i3.8xlarge 32 244 10 Gigabit 2.752 1
i3.16xlarge 64 488 25 Gigabit 5.504 1
i3.metal 72 512 25 Gigabit 5.504 1.3
20
Benchmark iterations on different Amazon servers
All the cluster configurations have the same price!
ResponseTime
Percentiles
Blue To Red +40%
21
Numa Architecture
QPI Bus
ClickHouse Instance 1
MEMORY CPU
I/O Bridge
MEMORYCPU
I/O Bridge
Linux OS
NUMA Node0 NUMA Node1
ClickHouse Instance 2
22
Benchmark iterations on scalability
ResponseTime
Percentiles
+85%
+55%
23
Benchmark iterations on parallelism
ResponseTime
Percentiles
24
New Backend Infrastructure (Zoom in)
1 EC2 (i3.metal) instance hosts 2 clickhouse servers
1
1’
2
2’
3
3’
4
4’
5
5’
6
6’
ClickhouseCluster
AZ-1aAZ-1b
Replication
25
New Backend Infrastructure (Big picture)
ClickHouse Cluster
Data Pipeline
M
assiveInserts
Scala
Microservices
Reads
CHProxy Servers
With local Cache
Monitoring: Prometheus / Grafana
Load Balancers
26
Next Steps
Cluster Extension
Major Disaster
Staging Refresh
Major ClickHouse
upgrade
Full InHouse Automation
27
What do we like with ClickHouse
Native sampling
Stability
Easy to understand
Fast
Active developments
28
What parts of ClickHouse still need to be improved
Difficult to master
Lack of tooling
Random stability of a new version
No query optimizer
29
Do we recommend ClickHouse?
30
Any question?
Thank you
We’re hiring!

More Related Content

What's hot

ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...Amazon Web Services
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Monitoring MySQL with Prometheus and Grafana
Monitoring MySQL with Prometheus and GrafanaMonitoring MySQL with Prometheus and Grafana
Monitoring MySQL with Prometheus and GrafanaJulien Pivotto
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouserpolat
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarDatabricks
 
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo WorkflowsDok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo WorkflowsDoKC
 
Data Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouseData Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouseJack Gao
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERShuyi Chen
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioDevOpsDays Tel Aviv
 
Thanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoringThanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoringBartłomiej Płotka
 

What's hot (20)

ClickHouse Intro
ClickHouse IntroClickHouse Intro
ClickHouse Intro
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
gRPC
gRPCgRPC
gRPC
 
Monitoring MySQL with Prometheus and Grafana
Monitoring MySQL with Prometheus and GrafanaMonitoring MySQL with Prometheus and Grafana
Monitoring MySQL with Prometheus and Grafana
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
 
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo WorkflowsDok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
 
Data Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouseData Analyse Black Horse - ClickHouse
Data Analyse Black Horse - ClickHouse
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
 
Thanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoringThanos: Global, durable Prometheus monitoring
Thanos: Global, durable Prometheus monitoring
 
gRPC Overview
gRPC OverviewgRPC Overview
gRPC Overview
 

Similar to Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing

The Practice of Presto & Alluxio in E-Commerce Big Data Platform
The Practice of Presto & Alluxio in E-Commerce Big Data PlatformThe Practice of Presto & Alluxio in E-Commerce Big Data Platform
The Practice of Presto & Alluxio in E-Commerce Big Data PlatformAlluxio, Inc.
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveWalid Shaari
 
Capacity Planning Infrastructure for Web Applications (Drupal)
Capacity Planning Infrastructure for Web Applications (Drupal)Capacity Planning Infrastructure for Web Applications (Drupal)
Capacity Planning Infrastructure for Web Applications (Drupal)Ricardo Amaro
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DChallenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DDomonkos Tikk
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...Hendrik van Run
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Giridhar Addepalli
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuningYosuke Mizutani
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkKostas Tzoumas
 
Cloud Computing – A CFO Briefing
Cloud Computing – A CFO BriefingCloud Computing – A CFO Briefing
Cloud Computing – A CFO BriefingJoe Nathans
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...Pavel Pratyush
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataDevOps for Enterprise Systems
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalSub Szabolcs Feczak
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovValeriia Maliarenko
 

Similar to Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing (20)

The Practice of Presto & Alluxio in E-Commerce Big Data Platform
The Practice of Presto & Alluxio in E-Commerce Big Data PlatformThe Practice of Presto & Alluxio in E-Commerce Big Data Platform
The Practice of Presto & Alluxio in E-Commerce Big Data Platform
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspective
 
Capacity Planning Infrastructure for Web Applications (Drupal)
Capacity Planning Infrastructure for Web Applications (Drupal)Capacity Planning Infrastructure for Web Applications (Drupal)
Capacity Planning Infrastructure for Web Applications (Drupal)
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DChallenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
PAD-3126 - Evolving the DevOps Organization around IBM PureApplication System...
 
Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01Adtech scala-performance-tuning-150323223738-conversion-gate01
Adtech scala-performance-tuning-150323223738-conversion-gate01
 
Adtech x Scala x Performance tuning
Adtech x Scala x Performance tuningAdtech x Scala x Performance tuning
Adtech x Scala x Performance tuning
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to Cloud
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Cloud Computing – A CFO Briefing
Cloud Computing – A CFO BriefingCloud Computing – A CFO Briefing
Cloud Computing – A CFO Briefing
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
Migration of a high-traffic E-commerce website from Legacy Monolith to Micros...
 
Path to continuous delivery
Path to continuous deliveryPath to continuous delivery
Path to continuous delivery
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live Data
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
 

Recently uploaded

Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactivestartupro
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
full stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdffull stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdfHulkTheDevil
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
A PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxA PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxatharvdev2010
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...stockholm university
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfROWELL MARQUINA
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerAnchore
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfROWELL MARQUINA
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2DianaGray10
 

Recently uploaded (20)

Bitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactiveBitdefender-CSG-Report-creat7534-interactive
Bitdefender-CSG-Report-creat7534-interactive
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
full stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdffull stack practical assignment msc cs.pdf
full stack practical assignment msc cs.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
A PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxA PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptx
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdfHCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
HCI Lesson 1 - Introduction to Human-Computer Interaction.pdf
 
Software Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey HightowerSoftware Security in the Real World w/Kelsey Hightower
Software Security in the Real World w/Kelsey Hightower
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
Efficiencies in RPA with UiPath and CyberArk Technologies - Session 2
 

Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing

  • 2. 2 Summary What are we doing at ContentSquare? What have we done so far with ClickHouse? What do we think about ClickHouse?
  • 3. 3 Who are we Christophe Kalenzaga Senior Data Engineer Vianney Foucault Senior Platform Engineer
  • 4. 4 ContentSquare in a few words ● Web analytic Company ○ 600GB collected per day ○ 13 months retention (~230 TBytes) ○ websites and mobile applications ● 100+ clients Worldwide ○ Automotive | BFSI | Retail | Grocery | Luxury ○ B2B | Energy | Travel | Telco | Gaming ● Collected data doubles every year
  • 5. 5 One of the many challenges of ContentSquare
  • 6. 6 Our workload ● 30k queries per day ● Skewed load ● CPU intensive queries ● 20% of cacheable queries ● Can’t precompute queries
  • 7. 7 current backend infrastructure ● Many ElasticSearch clusters ● Storage cost ● Compute cost ● Can’t handle clients with too much data Scala Microservices Data Pipeline Components (Scala/Spark)
  • 9. 9 what we’ve done so far Aug 2018Jun 2018Mar 2018Jan 2018Dec 2017 Selection of a few technologies to replace ElasticSearch Start mobile analytics solution on ClickHouse One-week POC on ClickHouse on a specific feature of the web analytics solution First release of the mobile analytics solution on clickhouse. insert data from the web analytics solution into ClickHouse check if there are no side effects Start migrating the web analytic solution to ClickHouse Benchmarks to find the right data model and the right configuration the mobile analytics solution is released Looking for a new technology POC of multiple technologies
  • 11. 11 Benchmark methodology ● Define different types of query ● Get the statistical distribution of each type of query in production ● Create a dataset (10 TBytes) ● Create queries with the same statistical distribution as in production ● Be deterministic
  • 12. 12 Benchmark Constraints for a #Platfomer Deployment CPU’s Memory Configuration Management Deterministic Automation Cost Optimisation Monitoring Data Load
  • 13. 13 Benchmark timeline { Time consuming Manual tasks } Iteration Setup infrastructure Deploy Middlewares ~ 4 hours ~ 12 hours Load test Data Benchmark { Time consuming } Cleanup
  • 15. 15 Fully Automated Benchmark timeline ~ 4 hours ~ 12 hours5 Minutes 1 Minute Iteration Setup infrastructure Deploy Middlewares Load test Data Benchmark Cleanup DevOps Approved
  • 17. 17 Benchmark iterations on operating systems http://openbenchmarking.org/result/1704225-TR-AMAZON38819
  • 18. 18 Benchmark iterations on different operating systems ResponseTime Percentiles +60% +60%
  • 19. 19 Benchmark iterations on Instances Instance Type vCPU Mem (GiB) Networking Perf. $ Per Hour Perf Index i3.large 2 15,25 Up to 10 Gigabit 0.172 1 i3.xlarge 4 30,5 Up to 10 Gigabit 0.344 1 i3.2xlarge 8 61 Up to 10 Gigabit 0.688 1 i3.4xlarge 16 122 Up to 10 Gigabit 1.376 1 i3.8xlarge 32 244 10 Gigabit 2.752 1 i3.16xlarge 64 488 25 Gigabit 5.504 1 i3.metal 72 512 25 Gigabit 5.504 1.3
  • 20. 20 Benchmark iterations on different Amazon servers All the cluster configurations have the same price! ResponseTime Percentiles Blue To Red +40%
  • 21. 21 Numa Architecture QPI Bus ClickHouse Instance 1 MEMORY CPU I/O Bridge MEMORYCPU I/O Bridge Linux OS NUMA Node0 NUMA Node1 ClickHouse Instance 2
  • 22. 22 Benchmark iterations on scalability ResponseTime Percentiles +85% +55%
  • 23. 23 Benchmark iterations on parallelism ResponseTime Percentiles
  • 24. 24 New Backend Infrastructure (Zoom in) 1 EC2 (i3.metal) instance hosts 2 clickhouse servers 1 1’ 2 2’ 3 3’ 4 4’ 5 5’ 6 6’ ClickhouseCluster AZ-1aAZ-1b Replication
  • 25. 25 New Backend Infrastructure (Big picture) ClickHouse Cluster Data Pipeline M assiveInserts Scala Microservices Reads CHProxy Servers With local Cache Monitoring: Prometheus / Grafana Load Balancers
  • 26. 26 Next Steps Cluster Extension Major Disaster Staging Refresh Major ClickHouse upgrade Full InHouse Automation
  • 27. 27 What do we like with ClickHouse Native sampling Stability Easy to understand Fast Active developments
  • 28. 28 What parts of ClickHouse still need to be improved Difficult to master Lack of tooling Random stability of a new version No query optimizer
  • 29. 29 Do we recommend ClickHouse?