SlideShare a Scribd company logo
© 2022 Altinity, Inc.
ClickHouse -If
Combinators for Fun and
Profit
Robert Hodges, Altinity
SF Bay Area ClickHouse Meetup
May 4, 2022
1
© 2022 Altinity, Inc.
Let’s make some introductions
ClickHouse support and services including Altinity.Cloud
Authors of Altinity Kubernetes Operator for ClickHouse
and other open source projects
Robert Hodges
Database geek with 30+ years
on DBMS systems. Day job:
Altinity CEO
Altinity Engineering
Database geeks with centuries
of experience in DBMS and
applications
2
© 2022 Altinity, Inc.
What are -If Combinators?
ClickHouse Aggregation Functions
● count()
● min(value)
● max(value)
● sum(value)
● avg(value)
● uniq(value)
● uniqExact(value)
● …
3
-If versions of same functions
● countIf(condition)
● minIf(value, condition)
● maxIf(value, condition)
● sumIf(value, condition)
● avgIf(value, condition)
● uniqIf(value, condition)
● uniqExactIf(value, condition)
● …
© 2022 Altinity, Inc.
Finding dirty data using a histogram
– NYC Taxi data: fare amounts
WITH histogram(5)(fare_amount) AS hist
SELECT
arrayJoin(hist) AS hist1,
bar(hist1.3, 0, 1000, 20) AS bar
FROM tripdata
┌─hist1──────────────────────────────────────────────────┬─bar──────────────────┐
│ (-21474808,-12079576.375,1) │ │
│ (-12079576.375,-1342166.6614073187,163862993.25) │ ████████████████████ │
│ (-1342166.6614073187,177236.82575785994,983177956.125) │ ████████████████████ │
│ (177236.82575785994,502005.5770089285,163863003.875) │ ████████████████████ │
│ (502005.5770089285,825998.625,8.75) │ ▏ │
└────────────────────────────────────────────────────────┴──────────────────────┘
5 rows in set. Elapsed: 6.853 sec. Processed 1.31 billion rows, 5.24 GB
(191.28 million rows/s., 765.11 MB/s.)
4
© 2022 Altinity, Inc.
Finding dirty data with countIf()
– NYC Taxi data: fare amounts
SELECT
countIf(fare_amount < 0) AS fare_less_than_zero,
countIf((fare_amount >= 0) AND (fare_amount < 100)) AS fare_0_to_100,
countIf((fare_amount >= 100) AND (fare_amount < 1000)) AS fare_100_to_1000,
countIf(fare_amount > 1000) AS fare_1000_or_greater
FROM tripdata
┌─fare_less_than_zero─┬─fare_0_to_100─┬─fare_100_to_1000─┬─fare_1000_or_greater─┐
│ 128932 │ 1310237048 │ 537609 │ 373 │
└─────────────────────┴───────────────┴──────────────────┴──────────────────────┘
1 rows in set. Elapsed: 0.898 sec. Processed 1.31 billion rows, 5.24 GB
(1.46 billion rows/s., 5.84 GB/s.)
5
Simpler output and faster too!
© 2022 Altinity, Inc.
Searching for GitHub event counts using GROUP BY
-- Github event data by year.
SELECT
toYear(created_at) AS year, event_type, count()
FROM github_events
WHERE event_type IN ('PullRequestEvent', 'WatchEvent')
GROUP BY year, event_type
ORDER BY year ASC, event_type ASC
┌─year─┬─event_type───────┬──count()─┐
│ 2011 │ PullRequestEvent │ 476818 │
│ 2011 │ WatchEvent │ 1831742 │
│ 2012 │ PullRequestEvent │ 1807044 │
│ 2012 │ WatchEvent │ 4048676 │
6
© 2022 Altinity, Inc.
Pivoting event data using countIf()
-- Github event data by year.
SELECT
toYear(created_at) AS year,
countIf(event_type = 'PullRequestEvent') AS PRs,
countIf(event_type = 'WatchEvent') AS Stars
FROM github_events
WHERE event_type IN ('PullRequestEvent', 'WatchEvent')
GROUP BY year
┌─year─┬──────PRs─┬────Stars─┐
│ 2011 │ 476818 │ 1831742 │
│ 2012 │ 1807044 │ 4048676 │
│ 2013 │ 3103759 │ 7432800 │
│ 2014 │ 5923843 │ 11952935 │
7
Way slower if you
omit the IN clause
© 2022 Altinity, Inc.
Cleanup up multiple inputs on the fly with avgIf()
SELECT
count() AS rides,
avgIf(passenger_count, passenger_count < 10)
AS passengers_clean,
avgIf(fare_amount, (fare_amount >= 0) AND (fare_amount < 500))
AS fares_clean
FROM tripdata
┌──────rides─┬──passengers_clean─┬────────fares_clean─┐
│ 1310903963 │ 1.681498872092554 │ 11.424149328643022 │
└────────────┴───────────────────┴────────────────────┘
1 rows in set. Elapsed: 1.092 sec. Processed 1.31 billion rows, 6.55 GB
(1.20 billion rows/s., 6.00 GB/s.)
8
Multiple conditions
in a single scan!
© 2022 Altinity, Inc.
Finding out more tricks for using -If combinators
9
● ClickHouse docs - Very limited
● Altinity Knowledge Base - Examples using system tables and clever tricks
○ Example: Simple aggregate functions & combinators
● Tests in ClickHouse codebase – Examples of what’s possible but not why
cd ~/git/ClickHouse/src/tests
grep -r countIf .
grep -r avgIf .
grep -r weightedSumIf .
© 2022 Altinity, Inc.
© 2022 Altinity, Inc.
Thank you!
Questions?
https://altinity.com
rhodges at altinity.com
10
Altinity.Cloud
Software and support for
ClickHouse
We’re hiring!

More Related Content

Similar to ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf

PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
predictionio
 
Criteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) MeetupCriteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) Meetup
Ibrahim Abubakari
 
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowHow to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
Lucas Arruda
 
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
tdc-globalcode
 
Full Stack Visualization: Build A React App With A Sankey Diagram
Full Stack Visualization: Build A React App With A Sankey DiagramFull Stack Visualization: Build A React App With A Sankey Diagram
Full Stack Visualization: Build A React App With A Sankey Diagram
Neo4j
 
Operational Intelligence with WSO2 BAM
Operational Intelligence with WSO2 BAM Operational Intelligence with WSO2 BAM
Operational Intelligence with WSO2 BAM WSO2
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
Ido Green
 
Android Jetpack + Coroutines: To infinity and beyond
Android Jetpack + Coroutines: To infinity and beyondAndroid Jetpack + Coroutines: To infinity and beyond
Android Jetpack + Coroutines: To infinity and beyond
Ramon Ribeiro Rabello
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best Pracices
Andy Petrella
 
GraphQL Search
GraphQL SearchGraphQL Search
GraphQL Search
Artem Shtatnov
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And ReportingCloudera, Inc.
 
5/ GitHub Inner Source @ OPEN'16
5/ GitHub Inner Source @ OPEN'165/ GitHub Inner Source @ OPEN'16
5/ GitHub Inner Source @ OPEN'16
Kangaroot
 
Quick and Dirty GUI Applications using GUIDeFATE
Quick and Dirty GUI Applications using GUIDeFATEQuick and Dirty GUI Applications using GUIDeFATE
Quick and Dirty GUI Applications using GUIDeFATE
Connie New
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
Márton Kodok
 
Big data analytics using a custom SQL engine
Big data analytics using a custom SQL engineBig data analytics using a custom SQL engine
Big data analytics using a custom SQL engine
Andrew Tsvelodub
 
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB
 
Final gatsby + wagtail - Inclusive product week
Final gatsby + wagtail - Inclusive product weekFinal gatsby + wagtail - Inclusive product week
Final gatsby + wagtail - Inclusive product week
Dawn Wages
 
Measuring Software development with GrimoireLab
Measuring Software development with GrimoireLabMeasuring Software development with GrimoireLab
Measuring Software development with GrimoireLab
Valerio Cosentino
 
Gerrit Analytics applied to Android source code
Gerrit Analytics applied to Android source codeGerrit Analytics applied to Android source code
Gerrit Analytics applied to Android source code
Luca Milanesio
 

Similar to ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf (20)

PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
Criteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) MeetupCriteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) Meetup
 
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud DataflowHow to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow
 
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha BigData How we figured out we had a SRE team at ...
 
Full Stack Visualization: Build A React App With A Sankey Diagram
Full Stack Visualization: Build A React App With A Sankey DiagramFull Stack Visualization: Build A React App With A Sankey Diagram
Full Stack Visualization: Build A React App With A Sankey Diagram
 
Operational Intelligence with WSO2 BAM
Operational Intelligence with WSO2 BAM Operational Intelligence with WSO2 BAM
Operational Intelligence with WSO2 BAM
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
Android Jetpack + Coroutines: To infinity and beyond
Android Jetpack + Coroutines: To infinity and beyondAndroid Jetpack + Coroutines: To infinity and beyond
Android Jetpack + Coroutines: To infinity and beyond
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best Pracices
 
GraphQL Search
GraphQL SearchGraphQL Search
GraphQL Search
 
Hw09 Analytics And Reporting
Hw09   Analytics And ReportingHw09   Analytics And Reporting
Hw09 Analytics And Reporting
 
5/ GitHub Inner Source @ OPEN'16
5/ GitHub Inner Source @ OPEN'165/ GitHub Inner Source @ OPEN'16
5/ GitHub Inner Source @ OPEN'16
 
Quick and Dirty GUI Applications using GUIDeFATE
Quick and Dirty GUI Applications using GUIDeFATEQuick and Dirty GUI Applications using GUIDeFATE
Quick and Dirty GUI Applications using GUIDeFATE
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
 
Big data analytics using a custom SQL engine
Big data analytics using a custom SQL engineBig data analytics using a custom SQL engine
Big data analytics using a custom SQL engine
 
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
 
Final gatsby + wagtail - Inclusive product week
Final gatsby + wagtail - Inclusive product weekFinal gatsby + wagtail - Inclusive product week
Final gatsby + wagtail - Inclusive product week
 
Measuring Software development with GrimoireLab
Measuring Software development with GrimoireLabMeasuring Software development with GrimoireLab
Measuring Software development with GrimoireLab
 
Gerrit Analytics applied to Android source code
Gerrit Analytics applied to Android source codeGerrit Analytics applied to Android source code
Gerrit Analytics applied to Android source code
 

More from Altinity Ltd

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Altinity Ltd
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Altinity Ltd
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Altinity Ltd
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
Altinity Ltd
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Altinity Ltd
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Altinity Ltd
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Altinity Ltd
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Altinity Ltd
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
Altinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
Altinity Ltd
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Ltd
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Ltd
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
Altinity Ltd
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
Altinity Ltd
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
Altinity Ltd
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
Altinity Ltd
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
Altinity Ltd
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
Altinity Ltd
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Altinity Ltd
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
Altinity Ltd
 

More from Altinity Ltd (20)

Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptxBuilding an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
Building an Analytic Extension to MySQL with ClickHouse and Open Source.pptx
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Building an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open SourceBuilding an Analytic Extension to MySQL with ClickHouse and Open Source
Building an Analytic Extension to MySQL with ClickHouse and Open Source
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdfFun with ClickHouse Window Functions-2021-08-19.pdf
Fun with ClickHouse Window Functions-2021-08-19.pdf
 
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdfCloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
Cloud Native Data Warehouses - Intro to ClickHouse on Kubernetes-2021-07.pdf
 
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
Building High Performance Apps with Altinity Stable Builds for ClickHouse | A...
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdfOwn your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
Own your ClickHouse data with Altinity.Cloud Anywhere-2023-01-17.pdf
 
ClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom AppsClickHouse ReplacingMergeTree in Telecom Apps
ClickHouse ReplacingMergeTree in Telecom Apps
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdfAltinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
Altinity Webinar: Introduction to Altinity.Cloud-Platform for Real-Time Data.pdf
 
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...
 
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdfOSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
OSA Con 2022 - Welcome to OSA CON Version 2022 - Robert Hodges - Altinity.pdf
 
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
OSA Con 2022 - Using ClickHouse Database to Power Analytics and Customer Enga...
 
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
OSA Con 2022 - Tips and Tricks to Keep Your Queries under 100ms with ClickHou...
 
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
OSA Con 2022 - The Open Source Analytic Universe, Version 2022 - Robert Hodge...
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
 
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdfOSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
OSA Con 2022 - State of Open Source Databases - Peter Zaitsev - Percona.pdf
 

Recently uploaded

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 

Recently uploaded (20)

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 

ClickHouse -If Combinators for Fun and Profit-2022-05-04.pdf

  • 1. © 2022 Altinity, Inc. ClickHouse -If Combinators for Fun and Profit Robert Hodges, Altinity SF Bay Area ClickHouse Meetup May 4, 2022 1
  • 2. © 2022 Altinity, Inc. Let’s make some introductions ClickHouse support and services including Altinity.Cloud Authors of Altinity Kubernetes Operator for ClickHouse and other open source projects Robert Hodges Database geek with 30+ years on DBMS systems. Day job: Altinity CEO Altinity Engineering Database geeks with centuries of experience in DBMS and applications 2
  • 3. © 2022 Altinity, Inc. What are -If Combinators? ClickHouse Aggregation Functions ● count() ● min(value) ● max(value) ● sum(value) ● avg(value) ● uniq(value) ● uniqExact(value) ● … 3 -If versions of same functions ● countIf(condition) ● minIf(value, condition) ● maxIf(value, condition) ● sumIf(value, condition) ● avgIf(value, condition) ● uniqIf(value, condition) ● uniqExactIf(value, condition) ● …
  • 4. © 2022 Altinity, Inc. Finding dirty data using a histogram – NYC Taxi data: fare amounts WITH histogram(5)(fare_amount) AS hist SELECT arrayJoin(hist) AS hist1, bar(hist1.3, 0, 1000, 20) AS bar FROM tripdata ┌─hist1──────────────────────────────────────────────────┬─bar──────────────────┐ │ (-21474808,-12079576.375,1) │ │ │ (-12079576.375,-1342166.6614073187,163862993.25) │ ████████████████████ │ │ (-1342166.6614073187,177236.82575785994,983177956.125) │ ████████████████████ │ │ (177236.82575785994,502005.5770089285,163863003.875) │ ████████████████████ │ │ (502005.5770089285,825998.625,8.75) │ ▏ │ └────────────────────────────────────────────────────────┴──────────────────────┘ 5 rows in set. Elapsed: 6.853 sec. Processed 1.31 billion rows, 5.24 GB (191.28 million rows/s., 765.11 MB/s.) 4
  • 5. © 2022 Altinity, Inc. Finding dirty data with countIf() – NYC Taxi data: fare amounts SELECT countIf(fare_amount < 0) AS fare_less_than_zero, countIf((fare_amount >= 0) AND (fare_amount < 100)) AS fare_0_to_100, countIf((fare_amount >= 100) AND (fare_amount < 1000)) AS fare_100_to_1000, countIf(fare_amount > 1000) AS fare_1000_or_greater FROM tripdata ┌─fare_less_than_zero─┬─fare_0_to_100─┬─fare_100_to_1000─┬─fare_1000_or_greater─┐ │ 128932 │ 1310237048 │ 537609 │ 373 │ └─────────────────────┴───────────────┴──────────────────┴──────────────────────┘ 1 rows in set. Elapsed: 0.898 sec. Processed 1.31 billion rows, 5.24 GB (1.46 billion rows/s., 5.84 GB/s.) 5 Simpler output and faster too!
  • 6. © 2022 Altinity, Inc. Searching for GitHub event counts using GROUP BY -- Github event data by year. SELECT toYear(created_at) AS year, event_type, count() FROM github_events WHERE event_type IN ('PullRequestEvent', 'WatchEvent') GROUP BY year, event_type ORDER BY year ASC, event_type ASC ┌─year─┬─event_type───────┬──count()─┐ │ 2011 │ PullRequestEvent │ 476818 │ │ 2011 │ WatchEvent │ 1831742 │ │ 2012 │ PullRequestEvent │ 1807044 │ │ 2012 │ WatchEvent │ 4048676 │ 6
  • 7. © 2022 Altinity, Inc. Pivoting event data using countIf() -- Github event data by year. SELECT toYear(created_at) AS year, countIf(event_type = 'PullRequestEvent') AS PRs, countIf(event_type = 'WatchEvent') AS Stars FROM github_events WHERE event_type IN ('PullRequestEvent', 'WatchEvent') GROUP BY year ┌─year─┬──────PRs─┬────Stars─┐ │ 2011 │ 476818 │ 1831742 │ │ 2012 │ 1807044 │ 4048676 │ │ 2013 │ 3103759 │ 7432800 │ │ 2014 │ 5923843 │ 11952935 │ 7 Way slower if you omit the IN clause
  • 8. © 2022 Altinity, Inc. Cleanup up multiple inputs on the fly with avgIf() SELECT count() AS rides, avgIf(passenger_count, passenger_count < 10) AS passengers_clean, avgIf(fare_amount, (fare_amount >= 0) AND (fare_amount < 500)) AS fares_clean FROM tripdata ┌──────rides─┬──passengers_clean─┬────────fares_clean─┐ │ 1310903963 │ 1.681498872092554 │ 11.424149328643022 │ └────────────┴───────────────────┴────────────────────┘ 1 rows in set. Elapsed: 1.092 sec. Processed 1.31 billion rows, 6.55 GB (1.20 billion rows/s., 6.00 GB/s.) 8 Multiple conditions in a single scan!
  • 9. © 2022 Altinity, Inc. Finding out more tricks for using -If combinators 9 ● ClickHouse docs - Very limited ● Altinity Knowledge Base - Examples using system tables and clever tricks ○ Example: Simple aggregate functions & combinators ● Tests in ClickHouse codebase – Examples of what’s possible but not why cd ~/git/ClickHouse/src/tests grep -r countIf . grep -r avgIf . grep -r weightedSumIf .
  • 10. © 2022 Altinity, Inc. © 2022 Altinity, Inc. Thank you! Questions? https://altinity.com rhodges at altinity.com 10 Altinity.Cloud Software and support for ClickHouse We’re hiring!