Burak Yücesoy | Citus Data | PGConf EU 2018 | October 2018
What is HyperLogLog and
Why You Will Love It
Burak Yücesoy
What is COUNT(DISTINCT)?
• The number of unique elements (the cardinality) in the given data
• Useful for finding things like…
  • The number of unique users who visited your web page
  • The number of unique products in your inventory
logins
 username |    date
----------+------------
 Alice    | 2018-10-02
 Bob      | 2018-10-03
 Alice    | 2018-10-05
 Eve      | 2018-10-07
 Bob      | 2018-10-07
 Bob      | 2018-10-08
• Number of logins: 6
• Number of unique users who log in: 3
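A minimal Python sketch of the two counts above, using a hypothetical in-memory copy of the logins table. The exact distinct count keeps every unique username in a set, so its memory use grows with the number of distinct values.

```python
# Hypothetical in-memory copy of the logins table from the slide.
logins = [
    ("Alice", "2018-10-02"),
    ("Bob", "2018-10-03"),
    ("Alice", "2018-10-05"),
    ("Eve", "2018-10-07"),
    ("Bob", "2018-10-07"),
    ("Bob", "2018-10-08"),
]

total_logins = len(logins)                                # like COUNT(*)
unique_users = len({username for username, _ in logins})  # like COUNT(DISTINCT username)

print(total_logins, unique_users)  # 6 3
```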
Problems with Traditional COUNT(DISTINCT)
• Slow
• High memory footprint
• Cannot work with appended/streaming data
There is a better way!
HyperLogLog (HLL) is a faster alternative to COUNT(DISTINCT) with a low memory footprint:
• Approximation algorithm
• Estimates the cardinality (i.e. COUNT(DISTINCT)) of the given data
• Mathematically provable error bounds
• It can estimate cardinalities well beyond 10^9 with a 1% error rate using only 6 KB of memory
Is it OK to approximate?
It depends...
• Counting the number of unique felonies associated with a person: not OK
• Counting the number of unique visits to my web page: OK
HLL
• Very fast
• Low memory footprint
• Can work with streaming data
• Can merge the estimates of two separate datasets efficiently
How does HLL work?
Steps:
1. Hash all elements
   a. Ensures a uniform data distribution
   b. Lets all data types be treated the same way
2. Observe rare bit patterns
3. Apply stochastic averaging
How does HLL work? - Observing rare bit patterns
Alice → hash: 645403841 → binary: 0010...001
Number of leading zeros: 2
Maximum number of leading zeros: 2
Bob → hash: 1492309842 → binary: 0101...010
Number of leading zeros: 1
Maximum number of leading zeros: 2
...
Maximum number of leading zeros: 7
Cardinality Estimation: 2^7
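A hedged Python sketch of this step. It uses the two hash values shown on the slides and assumes they are 32-bit values (the hash width is not stated in the talk). A long run of leading zeros is rare, so the longest run seen so far hints at how many distinct values have been hashed.

```python
HASH_BITS = 32  # assumption: the slide's hash values are 32 bits wide

def leading_zeros(hash_value: int) -> int:
    """Number of zero bits before the first 1 bit in a HASH_BITS-wide value."""
    return HASH_BITS - hash_value.bit_length()

# Hash values taken from the slides.
hashes = {"Alice": 645403841, "Bob": 1492309842}

max_zeros = 0
for name, h in hashes.items():
    zeros = leading_zeros(h)
    max_zeros = max(max_zeros, zeros)
    print(name, format(h, "032b"), zeros, max_zeros)

# Single-observation estimate: 2 ** (maximum number of leading zeros).
print(2 ** max_zeros)  # 4 after Alice and Bob; a maximum of 7 would give 128
```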
How does HLL work? Stochastic Averaging
Measure the same thing repeatedly and take the average.
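Data → Partition 1, Partition 2, Partition 3
Partition 1: maximum leading zeros 7 → 2^7
Partition 2: maximum leading zeros 5 → 2^5
Partition 3: maximum leading zeros 12 → 2^12
Estimation: 228.968...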
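The estimate on this slide can be reproduced with a short Python sketch. The number 228.968... matches the number of partitions times the harmonic mean of the per-partition values 2^7, 2^5 and 2^12. The function name `estimate` is hypothetical, and the bias-correction constant of the full HyperLogLog algorithm is left out here, as it appears to be in the slide arithmetic.

```python
def estimate(registers):
    """Number of partitions times the harmonic mean of 2 ** register."""
    m = len(registers)
    return m * m / sum(2.0 ** -r for r in registers)

# Maximum leading zeros per partition, from the slide.
print(estimate([7, 5, 12]))  # about 228.97
```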
How does HLL work? Stochastic Averaging
01000101...010
First m bits: decide the partition number
Remaining bits: used to count leading zeros
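A minimal sketch of this split in Python, assuming 32-bit hash values and m = 3 (eight partitions). The helper names `split_hash` and `add` are hypothetical, and partitions are numbered from zero here.

```python
HASH_BITS = 32  # assumption: 32-bit hash values

def split_hash(hash_value, m):
    """Return (partition index, leading zeros of the remaining bits)."""
    rest_bits = HASH_BITS - m
    partition = hash_value >> rest_bits          # first m bits
    rest = hash_value & ((1 << rest_bits) - 1)   # remaining bits
    return partition, rest_bits - rest.bit_length()

def add(registers, hash_value, m):
    """Keep the maximum number of leading zeros seen in each partition."""
    partition, zeros = split_hash(hash_value, m)
    registers[partition] = max(registers[partition], zeros)

m = 3                           # 2 ** 3 = 8 partitions
registers = [0] * (2 ** m)
add(registers, 645403841, m)    # Alice's hash value from the slides
add(registers, 645403841, m)    # adding the same element again changes nothing
print(registers)                # [0, 2, 0, 0, 0, 0, 0, 0]
```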
Error rate of HLL
• Typical Error Rate: 1.04 / sqrt(number of partitions)
• Memory needed: number of partitions * log(log(max. value in hash space)) bits
• Can estimate cardinalities well beyond 10^9 with a 1% error rate while using only 6 kilobytes of memory
• Memory vs accuracy tradeoff
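A rough Python check of these formulas. The choice of 2^13 partitions and 64-bit hash values is an assumption, picked because it lands on the 6 KB figure; the talk does not state which settings it used.

```python
import math

def typical_error(partitions):
    """Typical relative error: 1.04 / sqrt(number of partitions)."""
    return 1.04 / math.sqrt(partitions)

def memory_bytes(partitions, hash_bits):
    """One register per partition, each holding a leading-zero count."""
    register_bits = math.ceil(math.log2(hash_bits))
    return partitions * register_bits / 8

partitions = 2 ** 13   # assumption: 8192 partitions
hash_bits = 64         # assumption: 64-bit hash values

print(typical_error(partitions))            # about 0.0115, i.e. roughly 1%
print(memory_bytes(partitions, hash_bits))  # 6144.0 bytes = 6 KB
```

Doubling the number of partitions doubles the memory but shrinks the error only by a factor of sqrt(2), which is the tradeoff named above.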
Why does HLL work?
It turns out that a combination of lots of bad observations is a good observation.
Some interesting examples
Data: Alice, Alice, Alice, …, Alice
Alice → hash: 645403841 → binary: 00100110...001
Partition 1: 0 → 2^0
Partition 2: 2 → 2^2
...
Partition 8: 0 → 2^0
Harmonic mean: 1.103...
Data: Charlie
Charlie → hash: 0 → binary: 00000000...000
Partition 1: 29 → 2^29
Partition 2: 0 → 2^0
...
Partition 8: 0 → 2^0
Harmonic mean: 1.142...
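The two harmonic means from these examples can be checked with a few lines of Python. The arithmetic mean is included only for contrast; it is not part of the slides. A single extreme partition barely moves the harmonic mean, while it would dominate an arithmetic mean.

```python
def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)

def arithmetic_mean(values):
    return sum(values) / len(values)

# Eight partitions; each entry is 2 ** (maximum leading zeros in that partition).
only_alice = [2 ** 0, 2 ** 2] + [2 ** 0] * 6
only_charlie = [2 ** 29] + [2 ** 0] * 7

print(harmonic_mean(only_alice))      # about 1.103
print(harmonic_mean(only_charlie))    # about 1.1428
print(arithmetic_mean(only_charlie))  # 67108864.875
```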
HLL in PostgreSQL
● https://github.com/citusdata/postgresql-hll
postgresql-hll uses a data structure, also called hll, to keep the maximum number of leading zeros of each partition.
• Use hll_hash_bigint to hash elements.
• There are similar functions for other common data types.
• Use hll_add_agg to aggregate hashed elements into an hll data structure.
• Use hll_cardinality to materialize an hll data structure into a distinct count (an estimate).
Real Time Dashboard with
HyperLogLog
What is Rollup?
Precomputed aggregates for a period of time and a set of dimensions.
CREATE TABLE rollup_events_5min (
    customer_id bigint,
    event_type varchar,
    country varchar,
    browser varchar,
    event_count bigint,
    device_distinct_count bigint,
    session_distinct_count bigint,
    minute timestamp
);
CREATE TABLE events (
    id bigint,
    customer_id bigint,
    event_type varchar,
    country varchar,
    browser varchar,
    device_id bigint,
    session_id bigint,
    timestamp timestamp
);
INSERT INTO rollup_events_5min
SELECT
    customer_id,
    event_type,
    country,
    browser,
    COUNT(*) AS event_count,
    COUNT(DISTINCT device_id) AS device_distinct_count,
    COUNT(DISTINCT session_id) AS session_distinct_count,
    -- round each event timestamp down to the start of its 5-minute (300 s) bucket
    date_trunc('seconds', (timestamp - TIMESTAMP 'epoch') / 300) * 300 + TIMESTAMP 'epoch' AS minute
FROM events
WHERE timestamp >= $1 AND timestamp <= $2
GROUP BY customer_id, event_type, country, browser, minute;
Benefits of Rollup Tables
• Fast, indexed lookups of aggregates
• Avoid expensive repeated computations
• Rollups are compact (they use less space) and can be kept over longer periods
• Rollups can be further aggregated
What if I want to get the aggregation result for a 1-hour period?
SELECT
    customer_id,
    event_type,
    country,
    browser,
    SUM(event_count) AS event_count,
    -- summing distinct counts counts a device or session once per 5-minute bucket it appears in
    SUM(device_distinct_count) AS device_distinct_count,
    SUM(session_distinct_count) AS session_distinct_count,
    date_trunc('hour', minute) AS hour
FROM rollup_events_5min
GROUP BY customer_id, event_type, country, browser, hour;
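Event counts add up across 5-minute buckets, but distinct counts do not: a device seen in several buckets is counted once per bucket. A tiny Python illustration with hypothetical device ids:

```python
# Hypothetical device ids seen in two consecutive 5-minute buckets.
bucket_1 = {101, 102, 103}
bucket_2 = {102, 103, 104}

sum_of_counts = len(bucket_1) + len(bucket_2)  # what summing the stored distinct counts gives
true_distinct = len(bucket_1 | bucket_2)       # what COUNT(DISTINCT device_id) over the hour gives

print(sum_of_counts, true_distinct)  # 6 4
```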
Rollup Table with HLL
CREATE TABLE rollup_events_5min (
    customer_id bigint,
    event_type varchar,
    country varchar,
    browser varchar,
    event_count bigint,
    device_distinct_count hll,
    session_distinct_count hll,
    minute timestamp
);
INSERT INTO rollup_events_5min
SELECT
    customer_id,
    event_type,
    country,
    browser,
    COUNT(*) AS event_count,
    -- hash each id, then aggregate the hashes into one hll value per group
    hll_add_agg(hll_hash_bigint(device_id)) AS device_distinct_count,
    hll_add_agg(hll_hash_bigint(session_id)) AS session_distinct_count,
    date_trunc('seconds', (timestamp - TIMESTAMP 'epoch') / 300) * 300 + TIMESTAMP 'epoch' AS minute
FROM events
WHERE timestamp >= $1 AND timestamp <= $2
GROUP BY customer_id, event_type, country, browser, minute;
What if I want to get the aggregation result for a 1-hour period?
SELECT
    customer_id,
    event_type,
    country,
    browser,
    SUM(event_count) AS event_count,
    hll_union_agg(device_distinct_count) AS device_distinct_count,
    hll_union_agg(session_distinct_count) AS session_distinct_count,
    date_trunc('hour', minute) AS hour
FROM rollup_events_5min
GROUP BY customer_id, event_type, country, browser, hour;
How to Merge COUNT(DISTINCT) with HLL
Interval 1 → Partition 1: 7, Partition 2: 5, Partition 3: 12
Intermediate result: HLL(7, 5, 12)
Interval 2 → Partition 1: 11, Partition 2: 7, Partition 3: 8
Intermediate result: HLL(11, 7, 8)
hll_union_agg: HLL(7, 5, 12) + HLL(11, 7, 8) → HLL(11, 7, 12)
Partition 1: 11 → 2^11
Partition 2: 7 → 2^7
Partition 3: 12 → 2^12
Estimation: 1053.257...
Interval 1 + Interval 2
Interval 1 Partition 1 (7) + Interval 2 Partition 1 (11) → 11
Interval 1 Partition 2 (5) + Interval 2 Partition 2 (7) → 7
Interval 1 Partition 3 (12) + Interval 2 Partition 3 (8) → 12
Estimation: 1053.257...
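A short Python sketch of the merge, using the interval values from the slides. Merging keeps the larger value for each partition. The function names are hypothetical, and `estimate` uses the simplified arithmetic that reproduces the slide numbers (number of partitions times the harmonic mean), not the full bias-corrected formula.

```python
def merge(*sketches):
    """Element-wise maximum of the per-partition values."""
    return [max(values) for values in zip(*sketches)]

def estimate(registers):
    """Number of partitions times the harmonic mean of 2 ** register."""
    m = len(registers)
    return m * m / sum(2.0 ** -r for r in registers)

interval_1 = [7, 5, 12]
interval_2 = [11, 7, 8]

merged = merge(interval_1, interval_2)
print(merged)            # [11, 7, 12]
print(estimate(merged))  # about 1053.26
```

Because the merged structure is the same one that a single pass over both intervals would have produced, merging adds no extra error.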
Rollup Table with HLL
• What if ...
• Without hll, you would have to maintain 2^n - 1 rollup tables to cover all combinations of n columns (multiply this by the number of time intervals).
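To make the 2^n - 1 figure concrete, this Python sketch enumerates every non-empty combination of the four dimension columns from the rollup table. Treating each combination as its own rollup table is the scenario described above; the variable names are illustrative.

```python
from itertools import combinations

columns = ["customer_id", "event_type", "country", "browser"]

# Every non-empty subset of the dimension columns.
subsets = [
    combo
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
]

print(len(subsets))  # 15, which is 2 ** 4 - 1
```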
What Happens in Distributed
Scenario?
postgresql-hll in distributed environment
1. Separate data into shards.
events_001 events_002 events_003
2. Put shards into separate nodes.
Coordinator
Worker Node 1: events_001
Worker Node 2: events_002
Worker Node 3: events_003
3. For each shard, calculate hll (but do not materialize).
Shard 1 → Partition 1: 7, Partition 2: 5, Partition 3: 12
Intermediate result: HLL(7, 5, 12)
4. Pull intermediate results to a single node.
Coordinator
Worker Node 1 (events_001): HLL(6, 4, 11)
Worker Node 2 (events_002): HLL(10, 6, 7)
Worker Node 3 (events_003): HLL(7, 12, 5)
5. Merge separate hll data structures and materialize them.
HLL(11, 7, 8) + HLL(7, 5, 12) + HLL(8, 13, 6) → HLL(11, 13, 12)
Partition 1: 11 → 2^11
Partition 2: 13 → 2^13
Partition 3: 12 → 2^12
Estimation: 10532.571...
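The five steps can be simulated in plain Python. This is a sketch under stated assumptions, not how postgresql-hll or Citus implement it: the hash is the first 8 bytes of SHA-256, there are 16 partitions, and the shard contents are made up. It shows that merging the per-shard structures gives exactly the structure a single node would have built from all the data.

```python
import hashlib

M = 4            # assumption: 2 ** 4 = 16 partitions
HASH_BITS = 64   # assumption: 64-bit hash values taken from SHA-256

def hash64(value):
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def build(values):
    """Per-shard step: maximum leading zeros per partition, not yet materialized."""
    rest_bits = HASH_BITS - M
    registers = [0] * (2 ** M)
    for value in values:
        h = hash64(value)
        partition = h >> rest_bits
        rest = h & ((1 << rest_bits) - 1)
        registers[partition] = max(registers[partition], rest_bits - rest.bit_length())
    return registers

def merge(*sketches):
    """Coordinator step: element-wise maximum of the per-shard structures."""
    return [max(values) for values in zip(*sketches)]

# Hypothetical shards with overlapping device ids.
shards = [range(0, 600), range(400, 1000), range(800, 1200)]
merged = merge(*(build(shard) for shard in shards))

# Same structure as a single node that saw all of the data.
print(merged == build(range(0, 1200)))  # True
```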
Thanks & Questions
Burak Yücesoy
burak@citusdata.com
@byucesoy
www.citusdata.com @citusdata

More Related Content

What's hot

Monitoring Flink with Prometheus
Monitoring Flink with PrometheusMonitoring Flink with Prometheus
Monitoring Flink with PrometheusMaximilian Bode
 
Transactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsTransactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsJ On The Beach
 
The Ceph RGW archive zone feature (Ceph Days 2019)
The Ceph RGW archive zone feature (Ceph Days 2019)The Ceph RGW archive zone feature (Ceph Days 2019)
The Ceph RGW archive zone feature (Ceph Days 2019)Igalia
 
Intro to InfluxDB
Intro to InfluxDBIntro to InfluxDB
Intro to InfluxDBInfluxData
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesScyllaDB
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...InfluxData
 
Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Taro L. Saito
 
Creating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized AggregatesCreating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized AggregatesEDB
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservicesBilgin Ibryam
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling TwitterBlaine
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?SearchStax
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Hazelcast Essentials
Hazelcast EssentialsHazelcast Essentials
Hazelcast EssentialsRahul Gupta
 
パタゴニア日本支社に対する戦略提言
パタゴニア日本支社に対する戦略提言パタゴニア日本支社に対する戦略提言
パタゴニア日本支社に対する戦略提言Tomohiro KIMURA
 
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...confluent
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 

What's hot (20)

Monitoring Flink with Prometheus
Monitoring Flink with PrometheusMonitoring Flink with Prometheus
Monitoring Flink with Prometheus
 
Transactions and Concurrency Control Patterns
Transactions and Concurrency Control PatternsTransactions and Concurrency Control Patterns
Transactions and Concurrency Control Patterns
 
The Ceph RGW archive zone feature (Ceph Days 2019)
The Ceph RGW archive zone feature (Ceph Days 2019)The Ceph RGW archive zone feature (Ceph Days 2019)
The Ceph RGW archive zone feature (Ceph Days 2019)
 
Intro to InfluxDB
Intro to InfluxDBIntro to InfluxDB
Intro to InfluxDB
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Prod...
 
Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編
 
Creating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized AggregatesCreating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized Aggregates
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservices
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
 
How do Solr and Azure Search compare?
How do Solr and Azure Search compare?How do Solr and Azure Search compare?
How do Solr and Azure Search compare?
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Hazelcast Essentials
Hazelcast EssentialsHazelcast Essentials
Hazelcast Essentials
 
Event-sourced architectures with Akka
Event-sourced architectures with AkkaEvent-sourced architectures with Akka
Event-sourced architectures with Akka
 
HDFS Selective Wire Encryption
HDFS Selective Wire EncryptionHDFS Selective Wire Encryption
HDFS Selective Wire Encryption
 
パタゴニア日本支社に対する戦略提言
パタゴニア日本支社に対する戦略提言パタゴニア日本支社に対する戦略提言
パタゴニア日本支社に対する戦略提言
 
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 

Similar to What is HyperLogLog and Why You Will Love It | PostgreSQL Conference Europe 2018 | Burak Yucesoy

Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...
Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...
Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...Citus Data
 
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuThe State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuCitus Data
 
KPIs implementation and decision tree algorithms as support tools in wastewat...
KPIs implementation and decision tree algorithms as support tools in wastewat...KPIs implementation and decision tree algorithms as support tools in wastewat...
KPIs implementation and decision tree algorithms as support tools in wastewat...GiuseppeAntonello
 
RIR Collaboration on RIPEstat
RIR Collaboration on RIPEstatRIR Collaboration on RIPEstat
RIR Collaboration on RIPEstatRIPE NCC
 
Data Gathering and Analysis BoF- RipEstat
Data Gathering and Analysis BoF- RipEstatData Gathering and Analysis BoF- RipEstat
Data Gathering and Analysis BoF- RipEstatAPNIC
 
Large Scale Internet Measurements Infrastructures
Large Scale Internet Measurements InfrastructuresLarge Scale Internet Measurements Infrastructures
Large Scale Internet Measurements InfrastructuresRIPE NCC
 
IAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. Völfinger
IAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. VölfingerIAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. Völfinger
IAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. VölfingerStatsCommunications
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupMárton Kodok
 
Bitcoin Price Prediction for Long, Short and Medium Time Frame
Bitcoin Price Prediction for Long, Short and Medium Time FrameBitcoin Price Prediction for Long, Short and Medium Time Frame
Bitcoin Price Prediction for Long, Short and Medium Time FrameIRJET Journal
 
Move out from your comfort zone!
Move out from your comfort zone!Move out from your comfort zone!
Move out from your comfort zone!Osaka University
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Citus Data
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
IoT: beyond the coffee machine
IoT: beyond the coffee machineIoT: beyond the coffee machine
IoT: beyond the coffee machineEric Favre
 
Introduction to statistics project
Introduction to statistics projectIntroduction to statistics project
Introduction to statistics projectLuciaRavazzi
 
New from BookNet Canada: BNC BiblioShare
New from BookNet Canada: BNC BiblioShareNew from BookNet Canada: BNC BiblioShare
New from BookNet Canada: BNC BiblioShareBookNet Canada
 
Webinar on 4th Industrial Revolution, IoT and RPA
Webinar on 4th Industrial Revolution, IoT and RPAWebinar on 4th Industrial Revolution, IoT and RPA
Webinar on 4th Industrial Revolution, IoT and RPARedwan Ferdous
 
EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...
EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...
EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...IndexBox Marketing
 

Similar to What is HyperLogLog and Why You Will Love It | PostgreSQL Conference Europe 2018 | Burak Yucesoy (20)

Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...
Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...
Distributed count(distinct) with hyper loglog on postgresql | PGConf EU 2017)...
 
2019 GDRR: Blockchain Data Analytics - Cryptocurrency and blockchain analysis...
2019 GDRR: Blockchain Data Analytics - Cryptocurrency and blockchain analysis...2019 GDRR: Blockchain Data Analytics - Cryptocurrency and blockchain analysis...
2019 GDRR: Blockchain Data Analytics - Cryptocurrency and blockchain analysis...
 
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur CubukcuThe State of Postgres | Strata San Jose 2018 | Umur Cubukcu
The State of Postgres | Strata San Jose 2018 | Umur Cubukcu
 
KPIs implementation and decision tree algorithms as support tools in wastewat...
KPIs implementation and decision tree algorithms as support tools in wastewat...KPIs implementation and decision tree algorithms as support tools in wastewat...
KPIs implementation and decision tree algorithms as support tools in wastewat...
 
RIR Collaboration on RIPEstat
RIR Collaboration on RIPEstatRIR Collaboration on RIPEstat
RIR Collaboration on RIPEstat
 
Data Gathering and Analysis BoF- RipEstat
Data Gathering and Analysis BoF- RipEstatData Gathering and Analysis BoF- RipEstat
Data Gathering and Analysis BoF- RipEstat
 
Large Scale Internet Measurements Infrastructures
Large Scale Internet Measurements InfrastructuresLarge Scale Internet Measurements Infrastructures
Large Scale Internet Measurements Infrastructures
 
IAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. Völfinger
IAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. VölfingerIAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. Völfinger
IAOS2018 - The EuroGroups Register, A. Bikauskaite, A. Götzfried, Z. Völfinger
 
X18136931 dwbi report
X18136931 dwbi reportX18136931 dwbi report
X18136931 dwbi report
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
 
Bitcoin Price Prediction for Long, Short and Medium Time Frame
Bitcoin Price Prediction for Long, Short and Medium Time FrameBitcoin Price Prediction for Long, Short and Medium Time Frame
Bitcoin Price Prediction for Long, Short and Medium Time Frame
 
Move out from your comfort zone!
Move out from your comfort zone!Move out from your comfort zone!
Move out from your comfort zone!
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
 
Blockchain for Marketing & Insights
Blockchain for Marketing & InsightsBlockchain for Marketing & Insights
Blockchain for Marketing & Insights
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
IoT: beyond the coffee machine
IoT: beyond the coffee machineIoT: beyond the coffee machine
IoT: beyond the coffee machine
 
Introduction to statistics project
Introduction to statistics projectIntroduction to statistics project
Introduction to statistics project
 
New from BookNet Canada: BNC BiblioShare
New from BookNet Canada: BNC BiblioShareNew from BookNet Canada: BNC BiblioShare
New from BookNet Canada: BNC BiblioShare
 
Webinar on 4th Industrial Revolution, IoT and RPA
Webinar on 4th Industrial Revolution, IoT and RPAWebinar on 4th Industrial Revolution, IoT and RPA
Webinar on 4th Industrial Revolution, IoT and RPA
 
EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...
EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...
EU: Electronic Calculators and Pocket-Size Data Recording, Reproducing and Di...
 

What is HyperLogLog and Why You Will Love It | PostgreSQL Conference Europe 2018 | Burak Yucesoy

  • 1. Burak Yücesoy | Citus Data | PGConf EU 2018 | October 2018 What is HyperLogLog and Why You Will Love It Burak Yücesoy
  • 2. What is COUNT(DISTINCT)?
      • Number of unique elements (cardinality) in the given data
      • Useful for finding things like:
          • Number of unique users who visited your web page
          • Number of unique products in your inventory
  • 3. What is COUNT(DISTINCT)?
      logins
       username |    date
      ----------+------------
       Alice    | 2018-10-02
       Bob      | 2018-10-03
       Alice    | 2018-10-05
       Eve      | 2018-10-07
       Bob      | 2018-10-07
       Bob      | 2018-10-08
      • Number of logins: 6
      • Number of unique users who logged in: 3
  • 4. Problems with Traditional COUNT(DISTINCT)
      • Slow
      • High memory footprint
      • Cannot work with appended/streaming data
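The memory problem is easiest to see in a direct implementation. The sketch below is an illustration in Python, not how PostgreSQL computes COUNT(DISTINCT) internally; the function name `exact_distinct` is made up for this example. It reproduces the numbers from the logins table and shows that an exact count has to remember every distinct value it has seen.

```python
# Login rows from the example table: (username, date).
logins = [
    ("Alice", "2018-10-02"),
    ("Bob", "2018-10-03"),
    ("Alice", "2018-10-05"),
    ("Eve", "2018-10-07"),
    ("Bob", "2018-10-07"),
    ("Bob", "2018-10-08"),
]


def exact_distinct(values):
    """Exact distinct count: the set grows with the number of distinct values."""
    seen = set()
    for value in values:
        seen.add(value)
    return len(seen), seen


total_logins = len(logins)
unique_users, seen = exact_distinct(username for username, _ in logins)

print(total_logins)  # 6
print(unique_users)  # 3
```

The `seen` set is the cost: its size is proportional to the cardinality being measured, which is what HLL avoids.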
  • 5. There is a better way!
      HyperLogLog (HLL) is a faster alternative to COUNT(DISTINCT) with a low memory footprint:
      • Approximation algorithm
      • Estimates the cardinality (i.e. COUNT(DISTINCT)) of the given data
      • Mathematically provable error bounds
      • It can estimate cardinalities well beyond 10^9 with a 1% error rate using only 6 KB of memory
  • 6. Is it OK to approximate?
      It depends...
  • 7. Is it OK to approximate?
      • Count # of unique felonies associated with a person: Not OK
      • Count # of unique visits to my web page: OK
  • 8. HLL
      • Very fast
      • Low memory footprint
      • Can work with streaming data
      • Can merge estimations of two separate datasets efficiently
  • 9. How does HLL work?
      Steps:
      1. Hash all elements
          a. Ensures uniform data distribution
          b. Can treat all data types the same
      2. Observe rare bit patterns
      3. Stochastic averaging
  • 10. How does HLL work? - Observing rare bit patterns
      Alice -> hash 645403841 -> binary 0010...001
      Number of leading zeros: 2
      Maximum number of leading zeros: 2
  • 11. How does HLL work? - Observing rare bit patterns
      Bob -> hash 1492309842 -> binary 0101...010
      Number of leading zeros: 1
      Maximum number of leading zeros: 2
  • 12. How does HLL work? - Observing rare bit patterns
      ...
      Maximum number of leading zeros: 7
      Cardinality Estimation: 2^7
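The leading-zero observation can be sketched in a few lines of Python. This assumes the hash values on the slides are 32-bit integers (an assumption, but one that reproduces the leading-zero counts shown for Alice and Bob); the function names are illustrative only.

```python
HASH_BITS = 32  # assumption: the slide's hash values are 32-bit integers


def leading_zeros(hash_value, bits=HASH_BITS):
    """Count the zero bits before the first 1 bit in a fixed-width hash."""
    return bits - hash_value.bit_length()


def naive_estimate(max_leading_zeros):
    """Single-observation estimate: seeing k leading zeros suggests about 2^k elements."""
    return 2 ** max_leading_zeros


hashes = {"Alice": 645403841, "Bob": 1492309842}

max_zeros = 0
for name, hash_value in hashes.items():
    zeros = leading_zeros(hash_value)
    max_zeros = max(max_zeros, zeros)
    print(name, zeros, max_zeros)  # Alice 2 2, then Bob 1 2

print(naive_estimate(7))  # 128
```

Only the running maximum is stored, which is why the memory footprint stays tiny, and also why a single observation is a poor estimate on its own.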
  • 13. How does HLL work? Stochastic Averaging
      Measuring the same thing repeatedly and taking the average.
  • 16. How does HLL work? Stochastic Averaging
      Data is split into partitions:
      Partition 1: maximum leading zeros 7,  estimate 2^7
      Partition 2: maximum leading zeros 5,  estimate 2^5
      Partition 3: maximum leading zeros 12, estimate 2^12
      Estimation: 228.968...
  • 17. How does HLL work? Stochastic Averaging
      01000101...010
      • First m bits decide the partition number
      • Remaining bits are used to count leading zeros
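Partitioning and averaging can be sketched as follows. This is a simplified illustration, not the postgresql-hll implementation: it assumes 32-bit hashes, stores plain leading-zero counts, and estimates with the number of partitions times the harmonic mean of the per-partition estimates, which is the formula that reproduces the 228.968... figure above. Production HyperLogLog also applies a bias-correction constant and small-range corrections, which are omitted here.

```python
HASH_BITS = 32  # assumption: 32-bit hash values, as in the Alice/Bob examples


def split_hash(hash_value, index_bits, bits=HASH_BITS):
    """Use the first index_bits bits as the partition number and
    count leading zeros in the remaining bits."""
    rest_bits = bits - index_bits
    partition = hash_value >> rest_bits
    rest = hash_value & ((1 << rest_bits) - 1)
    zeros = rest_bits - rest.bit_length()
    return partition, zeros


def add(registers, hash_value, index_bits):
    """Keep only the maximum leading-zero count seen in each partition."""
    partition, zeros = split_hash(hash_value, index_bits)
    registers[partition] = max(registers[partition], zeros)


def estimate(registers):
    """Number of partitions times the harmonic mean of the 2^register values."""
    m = len(registers)
    harmonic_mean = m / sum(2.0 ** -r for r in registers)
    return m * harmonic_mean


registers = [0] * 8            # 8 partitions -> 3 index bits
add(registers, 645403841, 3)   # Alice lands in partition index 1 with 2 leading zeros
print(registers)               # [0, 2, 0, 0, 0, 0, 0, 0]

print(estimate([7, 5, 12]))    # about 228.97, matching the three-partition example
```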
  • 18. Error rate of HLL
      • Typical error rate: 1.04 / sqrt(number of partitions)
      • Memory need is number of partitions * log(log(max. value in hash space)) bits
      • Can estimate cardinalities well beyond 10^9 with a 1% error rate while using only 6 kilobytes of memory
      • Memory vs accuracy tradeoff
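The two formulas can be turned into a small calculator. The configuration below, 2^13 partitions with a 64-bit hash space, is an assumption chosen because it lands on the 6 KB figure; the slides do not state which configuration they used. Logarithms are taken base 2 and the register width is rounded up to whole bits.

```python
import math


def typical_error(partitions):
    """Typical relative error: 1.04 / sqrt(number of partitions)."""
    return 1.04 / math.sqrt(partitions)


def memory_bytes(partitions, hash_bits):
    """partitions * log2(log2(hash space size)) bits, converted to bytes.
    hash_bits is log2 of the hash space size (e.g. 64 for a 64-bit hash)."""
    bits_per_register = math.ceil(math.log2(hash_bits))
    return partitions * bits_per_register / 8


m = 2 ** 13  # assumed partition count, not taken from the slides

print(round(typical_error(m), 4))  # 0.0115, i.e. roughly 1%
print(memory_bytes(m, 64))         # 6144.0 bytes, i.e. 6 KB
```

Quadrupling the number of partitions halves the typical error, which is the memory vs accuracy tradeoff in concrete terms.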
  • 19. Why does HLL work?
      It turns out that a combination of lots of bad observations is a good observation.
  • 20. Some interesting examples
      Input: Alice, Alice, Alice, ..., Alice
      Alice -> hash 645403841 -> binary 00100110...001
      Partition 1: 0, estimate 2^0
      Partition 2: 2, estimate 2^2
      ...
      Partition 8: 0, estimate 2^0
      Harmonic Mean: 1.103...
  • 21. Some interesting examples
      Input: Charlie
      Charlie -> hash 0 -> binary 00000000...000
      Partition 1: 29, estimate 2^29
      Partition 2: 0,  estimate 2^0
      ...
      Partition 8: 0,  estimate 2^0
      Harmonic Mean: 1.142...
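Both examples can be checked numerically. The sketch below assumes 8 partitions and that every partition elided with "..." holds 0, which is the reading that reproduces the harmonic means shown. The arithmetic mean is included only for contrast; it is not part of HLL.

```python
def harmonic_mean_of_estimates(registers):
    """Harmonic mean of the per-partition estimates 2^register."""
    return len(registers) / sum(2.0 ** -r for r in registers)


def arithmetic_mean_of_estimates(registers):
    """Plain average of the same estimates, shown only for comparison."""
    return sum(2.0 ** r for r in registers) / len(registers)


# Many copies of Alice: duplicates hash identically, so only one partition changes.
alice = [0, 2, 0, 0, 0, 0, 0, 0]
# A single element whose hash is all zeros: one extreme outlier partition.
charlie = [29, 0, 0, 0, 0, 0, 0, 0]

print(harmonic_mean_of_estimates(alice))      # about 1.103
print(harmonic_mean_of_estimates(charlie))    # about 1.143
print(arithmetic_mean_of_estimates(charlie))  # about 67 million
```

The harmonic mean stays close to 1 in both cases, so repeated elements and a freak hash value barely move the estimate, whereas a plain average would be dominated by the single outlier.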
  • 22. HLL in PostgreSQL
      • https://github.com/citusdata/postgresql-hll
  • 23. HLL in PostgreSQL
      postgresql-hll uses a data structure, also called hll, to keep the maximum number of leading zeros of each partition.
      • Use hll_hash_bigint to hash elements.
          • There are similar functions for other common data types.
      • Use hll_add_agg to aggregate hashed elements into the hll data structure.
      • Use hll_cardinality to materialize the hll data structure into an actual distinct count.
  • 24. Real Time Dashboard with HyperLogLog
  • 25. What is Rollup?
      Precomputed aggregates for a period of time and a set of dimensions.
  • 26. What is Rollup?
      CREATE TABLE rollup_events_5min (
          customer_id bigint,
          event_type varchar,
          country varchar,
          browser varchar,
          event_count bigint,
          device_distinct_count bigint,
          session_distinct_count bigint,
          minute timestamp
      );

      CREATE TABLE events (
          id bigint,
          customer_id bigint,
          event_type varchar,
          country varchar,
          browser varchar,
          device_id bigint,
          session_id bigint,
          timestamp timestamp
      );
  • 30. What is Rollup?
      INSERT INTO rollup_events_5min
      SELECT
          customer_id,
          event_type,
          country,
          browser,
          COUNT(*) AS event_count,
          COUNT(DISTINCT device_id) AS device_distinct_count,
          COUNT(DISTINCT session_id) AS session_distinct_count,
          date_trunc('seconds', (timestamp - TIMESTAMP 'epoch') / 300) * 300 + TIMESTAMP 'epoch' AS minute
      FROM events
      WHERE timestamp >= $1 AND timestamp <= $2
      GROUP BY customer_id, event_type, country, browser, minute
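The `date_trunc(...) / 300 ... * 300` expression rounds each event timestamp down to the start of its 5-minute bucket. A minimal Python sketch of the same idea, using integer arithmetic on epoch seconds rather than interval arithmetic (the function name is illustrative and UTC timestamps are assumed):

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5 minutes


def five_minute_bucket(ts):
    """Round a timezone-aware timestamp down to the start of its 5-minute bucket."""
    epoch_seconds = int(ts.timestamp())
    bucket_start = epoch_seconds - epoch_seconds % BUCKET_SECONDS
    return datetime.fromtimestamp(bucket_start, tz=timezone.utc)


event_time = datetime(2018, 10, 7, 12, 34, 56, tzinfo=timezone.utc)
bucket = five_minute_bucket(event_time)
print(bucket.isoformat())  # 2018-10-07T12:30:00+00:00
```

Every event between 12:30:00 and 12:34:59 maps to the same bucket value, which is what lets GROUP BY collapse them into one rollup row.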
  • 34. Benefits of Rollup Tables
      • Fast & indexed lookups of aggregates
      • Avoid expensive repeated computations
      • Rollups are compact (use less space) and can be kept over longer periods
      • Rollups can be further aggregated
  • 35. What if I want to get the aggregation result for a 1 hour period?
      SELECT
          customer_id,
          event_type,
          country,
          browser,
          SUM(event_count) AS event_count,
          SUM(device_distinct_count) AS device_distinct_count,
          SUM(session_distinct_count) AS session_distinct_count,
          date_trunc('hour', minute) AS hour
      FROM rollup_events_5min
      GROUP BY customer_id, event_type, country, browser, hour
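Summing event counts across buckets is fine, but summing distinct counts is not: a device seen in several 5-minute buckets is counted once per bucket. A small Python illustration with made-up device ids (the data is hypothetical, not from the talk):

```python
# Hypothetical device ids seen in three consecutive 5-minute buckets.
bucket_devices = {
    "12:00": {1, 2, 3},
    "12:05": {2, 3, 4},
    "12:10": {3, 4, 5},
}

# What the rollup table stores: one distinct count per bucket.
per_bucket_counts = {bucket: len(devices) for bucket, devices in bucket_devices.items()}

# What SUM(device_distinct_count) would report for the combined period.
summed = sum(per_bucket_counts.values())

# The real number of distinct devices over the combined period.
true_distinct = len(set().union(*bucket_devices.values()))

print(summed)         # 9
print(true_distinct)  # 5
```

Once only the per-bucket numbers are stored, the overlap between buckets is gone and the true figure cannot be recovered from them.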
  • 38. Rollup Table with HLL
      -- With HLL: the distinct-count columns use the hll type
      CREATE TABLE rollup_events_5min (
          customer_id bigint,
          event_type varchar,
          country varchar,
          browser varchar,
          event_count bigint,
          device_distinct_count hll,
          session_distinct_count hll,
          minute timestamp
      );

      -- Original version: the distinct-count columns are bigint
      CREATE TABLE rollup_events_5min (
          customer_id bigint,
          event_type varchar,
          country varchar,
          browser varchar,
          event_count bigint,
          device_distinct_count bigint,
          session_distinct_count bigint,
          minute timestamp
      );
  • 39. Rollup Table with HLL
      INSERT INTO rollup_events_5min
      SELECT
          customer_id,
          event_type,
          country,
          browser,
          COUNT(*) AS event_count,
          hll_add_agg(hll_hash_bigint(device_id)) AS device_distinct_count,
          hll_add_agg(hll_hash_bigint(session_id)) AS session_distinct_count,
          date_trunc('seconds', (timestamp - TIMESTAMP 'epoch') / 300) * 300 + TIMESTAMP 'epoch' AS minute
      FROM events
      WHERE timestamp >= $1 AND timestamp <= $2
      GROUP BY customer_id, event_type, country, browser, minute
  • 40. What if I want to get the aggregation result for a 1 hour period?
      SELECT
          customer_id,
          event_type,
          country,
          browser,
          SUM(event_count) AS event_count,
          hll_union_agg(device_distinct_count) AS device_distinct_count,
          hll_union_agg(session_distinct_count) AS session_distinct_count,
          date_trunc('hour', minute) AS hour
      FROM rollup_events_5min
      GROUP BY customer_id, event_type, country, browser, hour
  • 41. How to Merge COUNT(DISTINCT) with HLL
        [Diagram] Interval 1: Partition 1 = 7, Partition 2 = 5, Partition 3 = 12.
        Intermediate result: HLL(7, 5, 12)
  • 42. How to Merge COUNT(DISTINCT) with HLL
        [Diagram] Interval 2: Partition 1 = 11, Partition 2 = 7, Partition 3 = 8.
        Intermediate result: HLL(11, 7, 8)
  • 43. How to Merge COUNT(DISTINCT) with HLL
        [Diagram] hll_union_agg: HLL(11, 7, 8) and HLL(7, 5, 12) merge into HLL(11, 7, 12).
        Values 11, 7, 12 correspond to 2^11, 2^7, 2^12.
        Estimation: 1053.255
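The merge on this slide can be sketched as taking the larger value at each position of the two sketches. The estimate below uses m² divided by the sum of 2^(-value), which is an assumption about how the slide's figure was derived; the bias-correction constant of the full HyperLogLog algorithm is left out. Both function names are hypothetical.

```python
def merge(a, b):
    """Combine two sketches by keeping the larger value per position."""
    return [max(x, y) for x, y in zip(a, b)]

def raw_estimate(registers):
    """m^2 / sum(2^-r): harmonic-mean style estimate, without bias correction (assumption)."""
    m = len(registers)
    return m * m / sum(2.0 ** -r for r in registers)

interval_1 = [7, 5, 12]    # HLL(7, 5, 12) from the slides
interval_2 = [11, 7, 8]    # HLL(11, 7, 8) from the slides

merged = merge(interval_1, interval_2)
estimate = raw_estimate(merged)

print(merged, estimate)
```

The merged sketch is HLL(11, 7, 12), and the raw estimate comes out at roughly 1053.26, in line with the figure on the slide.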
  • 44. How to Merge COUNT(DISTINCT) with HLL
        [Diagram] Interval 1 + Interval 2:
        Interval 1 Partition 1 (7) + Interval 2 Partition 1 (11): 11
        Interval 1 Partition 2 (5) + Interval 2 Partition 2 (7): 7
        Interval 1 Partition 3 (12) + Interval 2 Partition 3 (8): 12
        Estimation: 1053.255
  • 45. Rollup Table with HLL
        • What if ...
        • Without hll, you would have to maintain 2^n - 1 rollup tables to cover all combinations of n columns (multiplied by the number of time intervals).
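The 2^n - 1 figure is the number of non-empty subsets of n columns. A short sketch that enumerates them for the four dimension columns of the rollup table used in these slides:

```python
from itertools import combinations

# Dimension columns of rollup_events_5min from the slides.
columns = ["customer_id", "event_type", "country", "browser"]

# Every non-empty combination of columns would need its own rollup table
# if distinct counts were stored as plain numbers.
subsets = [
    combo
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
]

print(len(subsets), 2 ** len(columns) - 1)  # 15 15
```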
  • 46. What Happens in Distributed Scenario?
  • 47. postgresql-hll in distributed environment
        1. Separate data into shards.
        [Diagram] events_001, events_002, events_003
  • 48. postgresql-hll in distributed environment
        2. Put shards into separate nodes.
        [Diagram] Coordinator; Worker Node 1: events_001; Worker Node 2: events_002; Worker Node 3: events_003
  • 49. postgresql-hll in distributed environment
        3. For each shard, calculate hll (but do not materialize).
        [Diagram] Shard 1: Partition 1 = 7, Partition 2 = 5, Partition 3 = 12.
        Intermediate result: HLL(7, 5, 12)
  • 50. postgresql-hll in distributed environment
        4. Pull intermediate results to a single node.
        [Diagram] Worker Node 1 (events_001), Worker Node 2 (events_002) and Worker Node 3 (events_003) send their intermediate results to the Coordinator: HLL(6, 4, 11), HLL(10, 6, 7), HLL(7, 12, 5)
  • 51. postgresql-hll in distributed environment
        5. Merge separate hll data structures and materialize them.
        [Diagram] HLL(11, 7, 8), HLL(7, 5, 12) and HLL(8, 13, 6) merge into HLL(11, 13, 12).
        Values 11, 13, 12 correspond to 2^11, 2^13, 2^12.
        Result: 10532.571...
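Since the merge on this slide keeps the largest value per position, the order in which worker results reach the coordinator should not change the outcome. A small sketch under that assumption; `merge` and `raw_estimate` are hypothetical helpers, and the estimate omits the bias-correction constant of the full algorithm.

```python
from functools import reduce
from itertools import permutations

def merge(a, b):
    """Combine two sketches by keeping the larger value per position."""
    return [max(x, y) for x, y in zip(a, b)]

def raw_estimate(registers):
    """m^2 / sum(2^-r), without bias correction (assumption)."""
    m = len(registers)
    return m * m / sum(2.0 ** -r for r in registers)

# The three sketches shown on the slide.
worker_results = [[11, 7, 8], [7, 5, 12], [8, 13, 6]]

# Merge the sketches in every possible arrival order and collect the outcomes.
merged_variants = {
    tuple(reduce(merge, order)) for order in permutations(worker_results)
}

merged = list(next(iter(merged_variants)))
estimate = raw_estimate(merged)

print(merged_variants, estimate)
```

All six arrival orders give the same sketch, HLL(11, 13, 12), and the raw estimate is roughly 10532.57, matching the slide.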
  • 52. Thanks & Questions
        Burak Yücesoy | burak@citusdata.com | @byucesoy
        www.citusdata.com | @citusdata