1/44
PostgreSQL
Optimisation of queries with grouping
Alexey Bashtanov, Brandwatch
28 Jan 2016
2/44
What is it all about?
This talk will cover optimisation of
Grouping
Aggregation
Unfortunately it will not cover optimisation of
Getting the data
Filtering
Joins
Window functions
Other data transformations
3/44
Outline
1 What is a grouping?
2 How does it work?
Aggregation functions under the hood
Grouping algorithms
3 Optimisation
Avoiding sorts
Summation
Denormalized data aggregation
Arg-maximum
4 Still slow?
4/44
What is a grouping?
5/44
What is a grouping?
What do we call a grouping/aggregation operation?
An operation that splits the input data into several classes and
then compiles each class into one row.
[Diagram: input rows are split into classes, and each class is compiled into one output row.]
6/44
Examples
SELECT department_id,
avg(salary)
FROM employees
GROUP BY department_id
SELECT DISTINCT department_id
FROM employees
7/44
Examples
SELECT DISTINCT ON (department_id)
department_id,
employee_id,
salary
FROM employees
ORDER BY department_id,
salary DESC
8/44
Examples
SELECT max(salary)
FROM employees
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1
9/44
How does it work?
10/44
Aggregation functions under the hood
INITCOND -> SFUNC <- input data
   state -> SFUNC <- input data
   state -> SFUNC <- input data
   state -> FINALFUNC -> Result
An aggregate function is defined by:
State, input and output types
Initial state (INITCOND)
Transition function (SFUNC)
Final function (FINALFUNC)
sum: INITCOND state = 0; SFUNC state += input; the result is the final state
  input 2 -> state = 2
  input 3 -> state = 5
  input 7 -> state = 12
  result: sum = 12
avg: INITCOND cnt = 0, sum = 0; SFUNC cnt++, sum += input; FINALFUNC sum / cnt
  input 2 -> cnt = 1, sum = 2
  input 3 -> cnt = 2, sum = 5
  input 7 -> cnt = 3, sum = 12
  result: avg = 12 / 3 = 4
SELECT sum(column1),
avg(column1)
FROM (VALUES (2), (3), (7)) _
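The INITCOND / SFUNC / FINALFUNC model can be sketched outside the database. The following is only an illustration in Python, not PostgreSQL's implementation; run_aggregate and its parameter names are hypothetical.

```python
def run_aggregate(rows, initcond, sfunc, finalfunc=lambda state: state):
    """Hypothetical model of an aggregate: fold the rows into a state, then finalise."""
    state = initcond
    for value in rows:
        state = sfunc(state, value)  # one SFUNC call per input row
    return finalfunc(state)


rows = [2, 3, 7]

# sum: the state is the running total and is returned unchanged
total = run_aggregate(rows, 0, lambda state, x: state + x)

# avg: the state is a (cnt, sum) pair and FINALFUNC divides sum by cnt
average = run_aggregate(
    rows,
    (0, 0),
    lambda state, x: (state[0] + 1, state[1] + x),
    lambda state: state[1] / state[0] if state[0] else None,
)

print(total, average)  # 12 4.0
```

Only the state survives between rows, which is why the same input can feed several aggregates in one pass.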
11/44
Aggregation functions under the hood
SFUNC and FINALFUNC functions can be written in
C — fast (SFUNC may modify input state and return it)
SQL
PL/pgSQL — SLOW!
any other language
SFUNC and FINALFUNC functions can be declared STRICT
(i.e. not called on null input)
12/44
Grouping algorithms
PostgreSQL uses two algorithms to feed grouped data to aggregate
functions:
GroupAggregate: get the data sorted and apply the aggregate
function to the groups one by one
HashAggregate: store the state for each key in a hash table
13/44
GroupAgg
Animation frames (the input is sorted by group key; the numbers are the values being summed, consumed from the right):
1 3 1 2 2 3 1 3 2 1   state: 0
1 3 1 2 2 3 1 3       state: 3
1 3 1 2 2             state: 4   results: 6
1 3 1                 state: 0   results: 8 6
                                 results: 5 8 6
14/44
HashAggregate
Animation frames (the input is unsorted; one state per key is kept in the hash table; values are consumed from the right):
1 2 3 2 3 1 2 1 3 1   state: 0
1 2 3 2 3 1 2 1 3     state: 1
1 2 3 2 3 1 2 1       states: 1, 3
1 2 3                 states: 6, 6, 1
                      states: 6, 8, 5
                      results: 6 8 5
15/44
GroupAggregate vs. HashAggregate
GroupAggregate
− Requires sorted data
+ Needs less memory
+ Returns sorted data
+ Returns data on the fly
+ Can perform
count(distinct ...),
array_agg(... order by ...)
etc.
HashAggregate
+ Accepts unsorted data
− Needs more memory
− Returns unsorted data
− Returns data at the end
− Can perform only basic
aggregation
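The trade-offs above follow from how the two strategies consume their input. Here is a rough Python sketch of both, assuming rows are (key, value) pairs; the function names are hypothetical and this is not how the executor is written.

```python
from itertools import groupby
from operator import itemgetter


def group_aggregate(sorted_rows, initcond, sfunc):
    """GroupAggregate sketch: input must be sorted by key.

    Keeps a single state and yields each group as soon as the key changes.
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        state = initcond
        for _, value in group:
            state = sfunc(state, value)
        yield key, state


def hash_aggregate(rows, initcond, sfunc):
    """HashAggregate sketch: any input order, one state per key in a dict.

    Nothing can be returned before the whole input has been read.
    """
    states = {}
    for key, value in rows:
        states[key] = sfunc(states.get(key, initcond), value)
    return states


def add(state, value):
    return state + value


rows = [("c", 1), ("a", 1), ("b", 3), ("a", 3), ("c", 2),
        ("a", 1), ("b", 2), ("c", 3), ("b", 1), ("b", 2)]

grouped = list(group_aggregate(sorted(rows, key=itemgetter(0)), 0, add))
hashed = hash_aggregate(rows, 0, add)
```

The sketch pays for the sort in sorted(...) before group_aggregate can start, while hash_aggregate pays in memory: one dict entry per distinct key.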
16/44
Optimisation
17/44
Avoiding sorts
Sorts are really slow. Prefer HashAggregation if possible.
What to do if you get something like this?
EXPLAIN
SELECT region_id,
avg(age)
FROM people
GROUP BY region_id
GroupAggregate (cost=149244.84..156869.46 rows=9969 width=10)
-> Sort (cost=149244.84..151744.84 rows=1000000 width=10)
Sort Key: region_id
-> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10)
1504.474 ms
set enable_sort to off? No! The planner still sorts, only with a penalised cost estimate:
GroupAggregate (cost=10000149244.84..10000156869.46 rows=9969 width=10)
-> Sort (cost=10000149244.84..10000151744.84 rows=1000000 width=10)
Sort Key: region_id
-> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10)
1497.167 ms
Increase work_mem: set work_mem to '100MB'
HashAggregate (cost=20406.00..20530.61 rows=9969 width=10)
-> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10)
685.689 ms
Increase sanely to avoid OOM
18/44
Avoiding sorts
How to spend less memory to allow HashAggregation?
Don’t aggregate the joined rows
SELECT p.region_id,
r.region_description,
avg(age)
FROM people p
JOIN regions r using (region_id)
GROUP BY region_id,
region_description
Join the aggregated rows instead
SELECT a.region_id,
r.region_description,
a.avg_age
FROM (
SELECT region_id,
avg(age) avg_age
FROM people p
GROUP BY region_id
) a
JOIN regions r using (region_id)
19/44
Avoiding sorts
How to avoid sorts for count(DISTINCT ...)?
SELECT date_trunc('month', visit_date),
count(DISTINCT visitor_id)
FROM visits
GROUP BY date_trunc('month', visit_date)
GroupAggregate (actual time=7685.972..10564.358 rows=329 loops=1)
-> Sort (actual time=7680.426..9423.331 rows=4999067 loops=1)
Sort Key: (date_trunc('month'::text, visit_date))
Sort Method: external merge Disk: 107496kB
-> Seq Scan on visits (actual time=10.941..2966.460 rows=4999067 loops=1)
20/44
Avoiding sorts
Two levels of HashAggregate could be faster!
SELECT visit_month,
count(*)
FROM (
SELECT DISTINCT
date_trunc('month', visit_date)
as visit_month,
visitor_id
FROM visits
) _
GROUP BY visit_month
HashAggregate (actual time=2632.322..2632.354 rows=329 loops=1)
-> HashAggregate (actual time=2496.010..2578.779 rows=329000 loops=1)
-> Seq Scan on visits (actual time=0.060..1569.906 rows=4999067 loops=1)
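Why two hash levels can replace the sort: the inner DISTINCT only needs a set of (month, visitor) pairs, and the outer count only needs one counter per month. A minimal Python sketch of that idea, with hypothetical names and sample data:

```python
from collections import Counter
from datetime import date


def distinct_visitors_per_month(visits):
    """Two hash passes instead of a sort.

    Level 1 mirrors SELECT DISTINCT month, visitor_id: a set of pairs.
    Level 2 mirrors GROUP BY month with count(*): a counter over those pairs.
    """
    pairs = {(visit_date.replace(day=1), visitor_id)
             for visit_date, visitor_id in visits}
    return Counter(month for month, _ in pairs)


visits = [  # (visit_date, visitor_id), made-up sample rows
    (date(2016, 1, 5), 1),
    (date(2016, 1, 9), 1),
    (date(2016, 1, 9), 2),
    (date(2016, 2, 1), 1),
]
counts = distinct_visitors_per_month(visits)
```

The set holds one entry per distinct pair (329000 in the plan above), so it only fits the hash strategy when that intermediate result is small enough for memory.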
21/44
Avoiding sorts
How to avoid sorts for array_agg(...ORDER BY ...)?
SELECT
visit_date,
array_agg(visitor_id ORDER BY visitor_id)
FROM visits
GROUP BY visit_date
GroupAggregate (actual time=5433.658..8010.309 rows=10000 loops=1)
-> Sort (actual time=5433.416..6769.872 rows=4999067 loops=1)
Sort Key: visit_date
Sort Method: external merge Disk: 107504kB
-> Seq Scan on visits (actual time=0.046..581.672 rows=4999067 loops=1)
22/44
Avoiding sorts
It might be better to sort each row's array separately
SELECT
visit_date,
(
select array_agg(i ORDER BY i)
from unnest(visitors_u) i
)
FROM (
SELECT visit_date,
array_agg(visitor_id) visitors_u
FROM visits
GROUP BY visit_date
) _
Subquery Scan on _ (actual time=2504.915..3767.300 rows=10000 loops=1)
-> HashAggregate (actual time=2504.757..2555.038 rows=10000 loops=1)
-> Seq Scan on visits (actual time=0.056..397.859 rows=4999067 loops=1)
SubPlan 1
-> Aggregate (actual time=0.120..0.121 rows=1 loops=10000)
-> Function Scan on unnest i (actual time=0.033..0.055 rows=500 loops=10000)
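The same idea outside SQL, as a small Python sketch with hypothetical names: collect unsorted arrays per key with a hash table, then sort each short array on its own instead of sorting the whole input once.

```python
from collections import defaultdict


def sorted_arrays_per_key(rows):
    """Hash-group first, then sort each (small) array separately."""
    arrays = defaultdict(list)  # plays the role of array_agg(visitor_id) per key
    for key, value in rows:
        arrays[key].append(value)
    # many short sorts instead of one sort over all rows
    return {key: sorted(values) for key, values in arrays.items()}


rows = [("d2", 5), ("d1", 9), ("d2", 1), ("d1", 3), ("d1", 7)]
result = sorted_arrays_per_key(rows)
```

Each per-key sort works on a short in-memory list (about 500 elements in the plan above), so nothing spills to disk the way the single large sort did.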
23/44
Summation
Three of PostgreSQL's sum functions matter here:
sum(int) returns bigint
sum(bigint) returns numeric — SLOW
(needs to convert every input value)
sum(numeric) returns numeric
Do not use bigint as the datatype of a value to be summed;
prefer numeric. By the way, small numeric values take fewer
bytes on disk than bigint.
It might be worth writing a custom aggregate function
sum(bigint) returns bigint . . .
24/44
Summation
Straightforward solution, to be used if there are few zero values:
SELECT sum(cat_cnt)
FROM cities
Can be up to 7 times faster. Worth considering if more than 50% of the values are zeroes:
SELECT coalesce(sum(tiger_cnt), 0)
FROM cities
WHERE tiger_cnt <> 0
Helps only if the type is numeric and the zero rows cannot be filtered out:
SELECT coalesce(sum(nullif(tiger_cnt, 0)), 0),
sum(cat_cnt)
FROM cities
25/44
Summation
In any case, it is better to replace all zeroes with nulls:
UPDATE cities
SET cat_cnt = nullif(cat_cnt, 0),
tiger_cnt = nullif(tiger_cnt, 0);
VACUUM FULL cities;
This also dramatically reduces the space occupied.
26/44
Denormalized data aggregation
Sometimes we need to aggregate denormalized data
The most common solution is
SELECT account_id,
account_name,
sum(payment_amount)
FROM payments
GROUP BY account_id,
account_name
The planner does not know that account_id and account_name
are correlated. This can lead to wrong estimates and a suboptimal plan.
27/44
Denormalized data aggregation
A lesser-known approach is
SELECT account_id,
min(account_name),
sum(payment_amount)
FROM payments
GROUP BY account_id
Works only if the type of the "denormalized payload" supports
comparison operators.
28/44
Denormalized data aggregation
Also we can write a custom aggregate function
CREATE FUNCTION frst (text, text)
RETURNS text IMMUTABLE STRICT LANGUAGE sql AS
$$ select $1; $$; -- STRICT: the first non-null input initialises the state
CREATE AGGREGATE a (text) (
SFUNC=frst,
STYPE=text
);
SELECT account_id,
a(account_name),
sum(payment_amount)
FROM payments
GROUP BY account_id
29/44
Denormalized data aggregation
Or even write it in C
SELECT account_id,
anyold(account_name),
sum(payment_amount)
FROM payments
GROUP BY account_id
Sorry, no source code for anyold
30/44
Denormalized data aggregation
And what is the fastest?
It depends on the width of "denormalized payload":
Payload width      1       10      100     1000    10000
dumb             366ms   374ms   459ms   1238ms  53236ms
min              375ms   377ms   409ms    716ms  16747ms
SQL             1970ms  1975ms  2031ms   2446ms   2036ms*
C                385ms   385ms   408ms    659ms    436ms*
* The more data, the faster we proceed?
That is because we do not need to extract TOASTed values.
31/44
Arg-maximum
Max
Population of the largest
city in each country
Date of last tweet by each
author
The highest salary in each
department
Arg-max
What is the largest city in
each country
What is the last tweet by
each author
Who gets the highest
salary in each department
32/44
Arg-maximum
Max is built-in. How to perform Arg-max?
Self-joins?
Window-functions?
Use DISTINCT ON() (PG-specific, not in SQL standard)
SELECT DISTINCT ON (author_id)
author_id,
twit_id
FROM twits
ORDER BY author_id,
twit_date DESC
But it still can be performed only by sorting, not by hashing :(
33/44
Arg-maximum
We can emulate Arg-max by ordinary max and dirty hacks
SELECT author_id,
(max(array[
twit_date,
date 'epoch' + twit_id
]))[2] - date 'epoch'
FROM twits
GROUP BY author_id;
But such type tweaking is not always possible.
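The trick relies on element-by-element comparison: the maximum of (date, id) pairs is the pair with the latest date, and its second element is the wanted id. A Python sketch of the same comparison, with hypothetical names and sample data:

```python
from datetime import date


def last_twit_per_author(twits):
    """twits: iterable of (author_id, twit_date, twit_id) tuples.

    Keeps the maximum (twit_date, twit_id) pair per author, like max(array[...]),
    then returns only the second element of each pair.
    """
    best = {}
    for author_id, twit_date, twit_id in twits:
        pair = (twit_date, twit_id)
        if author_id not in best or pair > best[author_id]:
            best[author_id] = pair
    return {author_id: pair[1] for author_id, pair in best.items()}


twits = [  # made-up sample rows
    (7, date(2016, 1, 1), 10),
    (7, date(2016, 1, 3), 11),
    (9, date(2016, 1, 2), 12),
    (7, date(2016, 1, 2), 13),
]
latest = last_twit_per_author(twits)
```

Python tuples may mix types, which is exactly what SQL arrays cannot do; hence the date 'epoch' arithmetic in the query above.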
34/44
Arg-maximum
It’s time to write more custom aggregate functions
CREATE TYPE amax_ty AS (key_date date, payload int);
CREATE FUNCTION amax_t (p_state amax_ty, p_key_date date, p_payload int)
RETURNS amax_ty IMMUTABLE LANGUAGE sql AS
$$
SELECT CASE WHEN p_state.key_date < p_key_date
OR (p_key_date IS NOT NULL AND p_state.key_date IS NULL)
THEN (p_key_date, p_payload)::amax_ty
ELSE p_state END
$$;
CREATE FUNCTION amax_f (p_state amax_ty) RETURNS int IMMUTABLE LANGUAGE sql AS
$$ SELECT p_state.payload $$;
CREATE AGGREGATE amax (date, int) (
SFUNC = amax_t,
STYPE = amax_ty,
FINALFUNC = amax_f,
INITCOND = '(,)'
);
SELECT author_id,
amax(twit_date, twit_id)
FROM twits
GROUP BY author_id;
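The transition logic of amax_t is easier to check outside SQL. This is a Python transcription for illustration only, with None standing in for NULL; it is not the aggregate itself.

```python
from datetime import date


def amax_t(state, key, payload):
    """Transition: keep the (key, payload) pair with the largest non-null key."""
    state_key, _ = state
    if key is not None and (state_key is None or state_key < key):
        return (key, payload)
    return state


def amax_f(state):
    """Final function: return only the payload."""
    return state[1]


def amax(rows):
    state = (None, None)  # mirrors INITCOND = '(,)'
    for key, payload in rows:
        state = amax_t(state, key, payload)
    return amax_f(state)


rows = [  # (twit_date, twit_id), made-up sample rows
    (date(2016, 1, 2), 10),
    (None, 99),
    (date(2016, 1, 5), 20),
    (date(2016, 1, 1), 30),
]
winner = amax(rows)
```

Rows with a null key never replace the state, and on equal keys the earlier row wins because the comparison is strict.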
35/44
Arg-maximum
Argmax is similar to amax, but written in C
SELECT author_id,
argmax(twit_date, twit_id)
FROM twits
GROUP BY author_id;
36/44
Arg-maximum
Who wins now?
Rows           100^2    333^2   1000^2   3333^2   5000^2
DISTINCT ON      6ms     42ms    342ms  10555ms  30421ms
Max(array)       5ms     47ms    399ms   4464ms  10025ms
SQL amax        38ms    393ms   3541ms  39539ms  90164ms
C argmax         5ms     37ms    288ms   3183ms   7176ms
SQL amax finally outperforms DISTINCT ON at 10^9-ish rows
37/44
Still slow?
38/44
Still slow?
Slow max, arg-max or distinct query?
Sometimes we can fetch the rows one-by-one using index:
CREATE INDEX ON twits(author_id, twit_date DESC);
-- for the very first author_id fetch the row with latest date
SELECT twit_id,
twit_date,
author_id
FROM twits
ORDER BY author_id,
twit_date DESC
LIMIT 1;
-- find the next author_id and fetch the row with latest date
SELECT twit_id,
twit_date,
author_id
FROM twits
WHERE author_id > ?
ORDER BY author_id,
twit_date DESC
LIMIT 1;
...
38/44
Still slow?
CREATE FUNCTION f1by1() RETURNS TABLE (o_twit_id int, o_twit_date date) AS $$
DECLARE l_author_id int := -1; -- starts below every author_id (assumes ids are non-negative) to keep the code simpler
BEGIN
LOOP
SELECT twit_id,
twit_date,
author_id
INTO o_twit_id,
o_twit_date,
l_author_id
FROM twits
WHERE author_id > l_author_id
ORDER BY author_id,
twit_date DESC
LIMIT 1;
EXIT WHEN NOT FOUND;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM f1by1();
39/44
Still slow?
Let us use pure SQL instead; as usual, it is a bit faster
WITH RECURSIVE d AS (
(
SELECT array[author_id, twit_id] ids
FROM twits
ORDER BY author_id,
twit_date DESC
LIMIT 1
)
UNION
SELECT (
SELECT array[t.author_id, t.twit_id]
FROM twits t
WHERE t.author_id > d.ids[1]
ORDER BY t.author_id,
t.twit_date DESC
LIMIT 1
) q
FROM d
)
SELECT d.ids[1] author_id,
d.ids[2] twit_id
FROM d
WHERE d.ids IS NOT NULL; -- drop the all-NULL row that ends the recursion
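What the index buys us: one ordered lookup per author instead of a pass over every row. A Python sketch of the same skip-ahead, assuming a sorted list stands in for the index on (author_id, twit_date DESC); the names and the sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def latest_twit_per_author(index):
    """index: sorted list of (author_id, -date_ordinal, twit_id) tuples.

    Negating the date ordinal emulates the DESC part of the index, so the
    first entry of each author is that author's latest twit.
    """
    result = []
    pos = 0
    while pos < len(index):
        author_id, _, twit_id = index[pos]
        result.append((author_id, twit_id))
        # binary search past the remaining entries of this author
        pos = bisect_right(index, (author_id, float("inf")), lo=pos)
    return result


twits = [  # (twit_id, author_id, twit_date), made-up sample rows
    (1, 7, date(2016, 1, 1)),
    (2, 7, date(2016, 1, 3)),
    (3, 9, date(2016, 1, 2)),
    (4, 7, date(2016, 1, 2)),
    (5, 9, date(2016, 1, 1)),
]
index = sorted((author, -day.toordinal(), twit) for twit, author, day in twits)
latest = latest_twit_per_author(index)
```

The loop runs once per author, so the cost grows with the number of authors rather than the number of twits.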
40/44
Still slow?
One-by-one retrieval by index
+ Incredibly fast unless it returns too many rows
− Needs an index
− SQL version needs tricks if the data types differ
Authors × Twits-per-author:
               10^6 × 10^1   10^5 × 10^2   10^4 × 10^3   10^2 × 10^5
C argmax          3679ms        3081ms        2881ms        2859ms
1-by-1 proc      12750ms        1445ms         152ms           2ms
1-by-1 SQL        6250ms         906ms         137ms           2ms
Rows           100^2    333^2   1000^2   3333^2   5000^2
DISTINCT ON      6ms     42ms    342ms  10555ms  30421ms
Max(array)       5ms     47ms    399ms   4464ms  10025ms
SQL amax        38ms    393ms   3541ms  39539ms  90164ms
C argmax         5ms     37ms    288ms   3183ms   7176ms
1-by-1 proc      2ms      6ms     12ms     42ms     63ms
1-by-1 SQL       1ms      4ms     11ms     29ms     37ms
41/44
Still slow?
Slow HashAggregate?
Use parallel aggregation extension:
http://www.cybertec.at/en/products/agg-parallel-aggregations-postgresql/
+ Up to 30 times faster
+ Speeds up SeqScan as well
− Mostly useful for complex row operations
− Requires PG 9.5+
− No magic: it loads up several of your cores
42/44
Still slow?
Slow count(DISTINCT ...)?
Use HyperLogLog: a reliable and efficient approximation algorithm
https://en.wikipedia.org/wiki/HyperLogLog
https://github.com/aggregateknowledge/postgresql-hll
Or fetch approximate values from pg_stats
43/44
Still slow?
Slow in typing? ;)
SELECT department_id,
avg(salary)
FROM employees
GROUP BY 1 -- same as GROUP BY department_id
SELECT count(*)
FROM employees
GROUP BY true -- same as HAVING count(*) > 0
-- or use MySQL
SELECT account_id,
account_name,
sum(payment_amount)
FROM payments
GROUP BY 1
44/44
Questions?

More Related Content

What's hot

ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOAltinity Ltd
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymenthyeongchae lee
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsYingjun Wu
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOAltinity Ltd
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsFederico Razzoli
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Databricks
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
 
InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)I Goo Lee.
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界Yoshinori Nakanishi
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides Altinity Ltd
 

What's hot (20)

ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
Designing and Implementing a Real-time Data Lake with Dynamically Changing Sc...
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 

Viewers also liked

PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuningFederico Campoli
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLanandology
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
Modern SQL in Open Source and Commercial Databases
Modern SQL in Open Source and Commercial DatabasesModern SQL in Open Source and Commercial Databases
Modern SQL in Open Source and Commercial DatabasesMarkus Winand
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidFederico Campoli
 
Pg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaPg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaFederico Campoli
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 

Viewers also liked (12)

PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuning
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Modern SQL in Open Source and Commercial Databases
Modern SQL in Open Source and Commercial DatabasesModern SQL in Open Source and Commercial Databases
Modern SQL in Open Source and Commercial Databases
 
Pg big fast ugly acid
Pg big fast ugly acidPg big fast ugly acid
Pg big fast ugly acid
 
Life on a_rollercoaster
Life on a_rollercoasterLife on a_rollercoaster
Life on a_rollercoaster
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) Acid
 
Pg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaPg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replica
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 

PostgreSQL, performance for queries with grouping

  • 1. 1/44 PostgreSQL Optimisation of queries with grouping Alexey Bashtanov, Brandwatch 28 Jan 2016
  • 2. 2/44 What is it all about? This talk will cover optimisation of Grouping Aggregation Unfortunately it will not cover optimisation of Getting the data Filtering Joins Window functions Other data transformations
  • 3. 3/44 Outline 1 What is a grouping? 2 How does it work? Aggregation functions under the hood Grouping algorithms 3 Optimisation Avoiding sorts Summation Denormalized data aggregation Arg-maximum 4 Still slow?
  • 4. 4/44 What is a grouping?
  • 5. 5/44 What is a grouping? What do we call a grouping/aggregation operation? An operation that splits input data into several classes and then compiles each class into one row. [Diagram: input rows are split into classes and each class is collapsed into one row.]
  • 6. 6/44 Examples SELECT department_id, avg(salary) FROM employees GROUP BY department_id SELECT DISTINCT department_id FROM employees
  • 7. 7/44 Examples SELECT DISTINCT ON (department_id) department_id, employee_id, salary FROM employees ORDER BY department_id, salary DESC
  • 8. 8/44 Examples SELECT max(salary) FROM employees SELECT salary FROM employees ORDER BY salary DESC LIMIT 1
  • 10. 10/44 Aggregation functions under the hood INITCOND SFUNC Input data state SFUNC Input data state SFUNC Input data state FINALFUNC Result An aggregate function is defined by: State, input and output types Initial state (INITCOND) Transition function (SFUNC) Final function (FINALFUNC)
  • 11. 10/44 Aggregation functions under the hood state = 0 state += input 2 2 state += input 3 5 state += input 7 12 = sum=12 SELECT sum(column1), avg(column1) FROM (VALUES (2), (3), (7)) _
  • 12. 10/44 Aggregation functions under the hood cnt = 0 sum = 0 cnt++ sum+=input 2 cnt=1 sum=2 cnt++ sum+=input 3 cnt=2 sum=5 cnt++ sum+=input 7 cnt=3 sum=12 sum / cnt avg=4 SELECT sum(column1), avg(column1) FROM (VALUES (2), (3), (7)) _
  • 13. 11/44 Aggregation functions under the hood SFUNC and FINALFUNC functions can be written in C — fast (SFUNC may modify input state and return it) SQL PL/pgSQL — SLOW! any other language SFUNC and FINALFUNC functions can be declared STRICT (i.e. not called on null input)
  • 14. 12/44 Grouping algorithms PostgreSQL uses 2 algorithms to feed aggregate functions with grouped data: GroupAggregate: get the data sorted and apply the aggregation function to the groups one by one. HashAggregate: store a state for each key in a hash table.
  • 15-19. 13/44 GroupAgg [Animation: the input, sorted by grouping key, is consumed one group at a time with a single running state; the three groups yield 6, 8 and 5 in turn.]
  • 20-25. 14/44 HashAggregate [Animation: the unsorted input is consumed row by row while a hash table keeps one state per key; the final states are 6, 8 and 5.]
  • 26. 15/44 GroupAggregate vs. HashAggregate GroupAggregate − Requires sorted data + Needs less memory + Returns sorted data + Returns data on the fly + Can perform count(distinct ...), array_agg(... order by ...) etc. HashAggregate + Accepts unsorted data − Needs more memory − Returns unsorted data − Returns data at the end − Can perform only basic aggregation
  • 29. 17/44 Avoiding sorts Sorts are really slow. Prefer HashAggregation if possible. What to do if you get something like this? EXPLAIN SELECT region_id, avg(age) FROM people GROUP BY region_id GroupAggregate (cost=149244.84..156869.46 rows=9969 width=10) -> Sort (cost=149244.84..151744.84 rows=1000000 width=10) Sort Key: region_id -> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10) 1504.474 ms
  • 31. 17/44 Avoiding sorts Sorts are really slow. Prefer HashAggregation if possible. What to do if you get something like this? EXPLAIN SELECT region_id, avg(age) FROM people GROUP BY region_id set enable_sort to off? No! GroupAggregate (cost=10000149244.84..10000156869.46 rows=9969 width=10) -> Sort (cost=10000149244.84..10000151744.84 rows=1000000 width=10) Sort Key: region_id -> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10) 1497.167 ms
  • 33. 17/44 Avoiding sorts Sorts are really slow. Prefer HashAggregation if possible. What to do if you get something like this? EXPLAIN SELECT region_id, avg(age) FROM people GROUP BY region_id Increase work_mem: set work_mem to '100MB' HashAggregate (cost=20406.00..20530.61 rows=9969 width=10) -> Seq Scan on people (cost=0.00..15406.00 rows=1000000 width=10) 685.689 ms Increase sanely to avoid OOM
  • 34. 18/44 Avoiding sorts How to use less memory to allow HashAggregation? Don’t aggregate joined SELECT p.region_id, r.region_description, avg(age) FROM people p JOIN regions r using (region_id) GROUP BY region_id, region_description Join aggregated instead SELECT a.region_id, r.region_description, a.avg_age FROM ( SELECT region_id, avg(age) avg_age FROM people p GROUP BY region_id ) a JOIN regions r using (region_id)
  • 35. 19/44 Avoiding sorts How to avoid sorts for count(DISTINCT ...)? SELECT date_trunc('month', visit_date), count(DISTINCT visitor_id) FROM visits GROUP BY date_trunc('month', visit_date) GroupAggregate (actual time=7685.972..10564.358 rows=329 loops=1) -> Sort (actual time=7680.426..9423.331 rows=4999067 loops=1) Sort Key: (date_trunc('month'::text, visit_date)) Sort Method: external merge Disk: 107496kB -> Seq Scan on visits (actual time=10.941..2966.460 rows=4999067 loops=1)
  • 36. 20/44 Avoiding sorts Two levels of HashAggregate could be faster! SELECT visit_month, count(*) FROM ( SELECT DISTINCT date_trunc('month', visit_date) as visit_month, visitor_id FROM visits ) _ GROUP BY visit_month HashAggregate (actual time=2632.322..2632.354 rows=329 loops=1) -> HashAggregate (actual time=2496.010..2578.779 rows=329000 loops=1) -> Seq Scan on visits (actual time=0.060..1569.906 rows=4999067 loops=1)
  • 37. 21/44 Avoiding sorts How to avoid sorts for array_agg(...ORDER BY ...)? SELECT visit_date, array_agg(visitor_id ORDER BY visitor_id) FROM visits GROUP BY visit_date GroupAggregate (actual time=5433.658..8010.309 rows=10000 loops=1) -> Sort (actual time=5433.416..6769.872 rows=4999067 loops=1) Sort Key: visit_date Sort Method: external merge Disk: 107504kB -> Seq Scan on visits (actual time=0.046..581.672 rows=4999067 loops=1)
  • 38. 22/44 Avoiding sorts Might be better to sort each line separately SELECT visit_date, ( select array_agg(i ORDER BY i) from unnest(visitors_u) i ) FROM ( SELECT visit_date, array_agg(visitor_id) visitors_u FROM visits GROUP BY visit_date ) _ Subquery Scan on _ (actual time=2504.915..3767.300 rows=10000 loops=1) -> HashAggregate (actual time=2504.757..2555.038 rows=10000 loops=1) -> Seq Scan on visits (actual time=0.056..397.859 rows=4999067 loops=1) SubPlan 1 -> Aggregate (actual time=0.120..0.121 rows=1 loops=10000) -> Function Scan on unnest i (actual time=0.033..0.055 rows=500 loops=10000)
  • 39. 23/44 Summation Among the sum functions in PostgreSQL, three matter here: sum(int) returns bigint sum(bigint) returns numeric: SLOW (needs to convert every input value) sum(numeric) returns numeric Do not use bigint as a datatype for a value to be summed, prefer numeric. BTW small numeric values take fewer bytes on disk than bigint. It might be worth writing a custom aggregate function sum(bigint) returns bigint . . .
  • 40. 24/44 Summation Straightforward solution, to be used if there are few zero values: SELECT sum(cat_cnt) FROM cities Can be up to 7 times faster. Worth considering if >50% zeroes: SELECT coalesce(sum(tiger_cnt), 0) FROM cities WHERE tiger_cnt <> 0 Can help only if the type is numeric and we cannot filter out: SELECT coalesce(sum(nullif(tiger_cnt, 0)), 0), sum(cat_cnt) FROM cities
  • 41. 25/44 Summation Better in any case to replace all zeroes by nulls: UPDATE cities SET cat_cnt = nullif(cat_cnt, 0), tiger_cnt = nullif(tiger_cnt, 0); VACUUM FULL cities; Additionally this will dramatically reduce space occupied.
  • 42. 26/44 Denormalized data aggregation Sometimes we need to aggregate denormalized data. The most common solution is SELECT account_id, account_name, sum(payment_amount) FROM payments GROUP BY account_id, account_name The planner does not know that account_id and account_name correlate. This can lead to wrong estimates and a suboptimal plan.
  • 43. 27/44 Denormalized data aggregation A slightly less well-known approach is SELECT account_id, min(account_name), sum(payment_amount) FROM payments GROUP BY account_id Works only if the type of the "denormalized payload" supports a comparison operator.
  • 44. 28/44 Denormalized data aggregation Also we can write a custom aggregate function CREATE FUNCTION frst (text, text) RETURNS text IMMUTABLE LANGUAGE sql AS $$ select $1; $$; CREATE AGGREGATE a (text) ( SFUNC=frst, STYPE=text ); SELECT account_id, a(account_name), sum(payment_amount) FROM payments GROUP BY account_id
  • 45. 29/44 Denormalized data aggregation Or even write it in C SELECT account_id, anyold(account_name), sum(payment_amount) FROM payments GROUP BY account_id Sorry, no source code for anyold
  • 47. 30/44 Denormalized data aggregation And what is the fastest? It depends on the width of "denormalized payload":
        Payload width   1        10       100      1000     10000
        dumb            366ms    374ms    459ms    1238ms   53236ms
        min             375ms    377ms    409ms    716ms    16747ms
        SQL             1970ms   1975ms   2031ms   2446ms   2036ms*
        C               385ms    385ms    408ms    659ms    436ms*
    * The more data the faster we proceed? It is because we do not need to extract TOASTed values.
  • 49. 31/44 Arg-maximum Max Population of the largest city in each country Date of last tweet by each author The highest salary in each department Arg-max What is the largest city in each country What is the last tweet by each author Who gets the highest salary in each department
  • 52. 32/44 Arg-maximum Max is built-in. How to perform Arg-max? Self-joins? Window-functions? Use DISTINCT ON() (PG-specific, not in SQL standard) SELECT DISTINCT ON (author_id) author_id, twit_id FROM twits ORDER BY author_id, twit_date DESC But it still can be performed only by sorting, not by hashing :(
  • 53. 33/44 Arg-maximum We can emulate Arg-max by ordinary max and dirty hacks SELECT author_id, (max(array[ twit_date, date'epoch' + twit_id ]))[2] - date'epoch' FROM twits GROUP BY author_id; But such type tweaking is not always possible.
  • 54. 34/44 Arg-maximum It’s time to write more custom aggregate functions CREATE TYPE amax_ty AS (key_date date, payload int); CREATE FUNCTION amax_t (p_state amax_ty, p_key_date date, p_payload int) RETURNS amax_ty IMMUTABLE LANGUAGE sql AS $$ SELECT CASE WHEN p_state.key_date < p_key_date OR (p_key_date IS NOT NULL AND p_state.key_date IS NULL) THEN (p_key_date, p_payload)::amax_ty ELSE p_state END $$; CREATE FUNCTION amax_f (p_state amax_ty) RETURNS int IMMUTABLE LANGUAGE sql AS $$ SELECT p_state.payload $$; CREATE AGGREGATE amax (date, int) ( SFUNC = amax_t, STYPE = amax_ty, FINALFUNC = amax_f, INITCOND = '(,)' ); SELECT author_id, amax(twit_date, twit_id) FROM twits GROUP BY author_id;
  • 55. 35/44 Arg-maximum Argmax is similar to amax, but written in C SELECT author_id, argmax(twit_date, twit_id) FROM twits GROUP BY author_id;
  • 57. 36/44 Arg-maximum Who wins now?
        Rows          100^2   333^2   1000^2   3333^2    5000^2
        DISTINCT ON   6ms     42ms    342ms    10555ms   30421ms
        Max(array)    5ms     47ms    399ms    4464ms    10025ms
        SQL amax      38ms    393ms   3541ms   39539ms   90164ms
        C argmax      5ms     37ms    288ms    3183ms    7176ms
    SQL amax finally outperforms DISTINCT ON on 10^9-ish rows
  • 59. 38/44 Still slow? Slow max, arg-max or distinct query? Sometimes we can fetch the rows one-by-one using index: CREATE INDEX ON twits(author_id, twit_date DESC); -- for the very first author_id fetch the row with latest date SELECT twit_id, twit_date, author_id FROM twits ORDER BY author_id, twit_date DESC LIMIT 1; -- find the next author_id and fetch the row with latest date SELECT twit_id, twit_date, author_id FROM twits WHERE author_id > ? ORDER BY author_id, twit_date DESC LIMIT 1; ...
  • 60. 38/44 Still slow? Slow max, arg-max or distinct query? Sometimes we can fetch the rows one-by-one using index: CREATE INDEX ON twits(author_id, twit_date DESC); CREATE FUNCTION f1by1() RETURNS TABLE (o_twit_id int, o_twit_date date) AS $$ DECLARE l_author_id int := -1; -- to make the code a bit simpler BEGIN LOOP SELECT twit_id, twit_date, author_id INTO o_twit_id, o_twit_date, l_author_id FROM twits WHERE author_id > l_author_id ORDER BY author_id, twit_date DESC LIMIT 1; EXIT WHEN NOT FOUND; RETURN NEXT; END LOOP; END; $$ LANGUAGE plpgsql; SELECT * FROM f1by1();
  • 61. 39/44 Still slow? Let us use pure SQL instead, it is a bit faster as usual WITH RECURSIVE d AS ( ( SELECT array[author_id, twit_id] ids FROM twits ORDER BY author_id, twit_date DESC LIMIT 1 ) UNION SELECT ( SELECT array[t.author_id, t.twit_id] FROM twits t WHERE t.author_id > d.ids[1] ORDER BY t.author_id, t.twit_date DESC LIMIT 1 ) q FROM d ) SELECT d.ids[1] author_id, d.ids[2] twit_id FROM d;
  • 62. 40/44 Still slow? One-by-one retrieval by index + Incredibly fast unless it returns too many rows − Needs an index − SQL version needs tricks if the data types differ
        Authors × Twits-per-author   10^6 × 10^1   10^5 × 10^2   10^4 × 10^3   10^2 × 10^5
        C argmax                     3679ms        3081ms        2881ms        2859ms
        1-by-1 proc                  12750ms       1445ms        152ms         2ms
        1-by-1 SQL                   6250ms        906ms         137ms         2ms
  • 63. 40/44 Still slow? One-by-one retrieval by index + Incredibly fast unless it returns too many rows − Needs an index − SQL version needs tricks if the data types differ
        Rows          100^2   333^2   1000^2   3333^2    5000^2
        DISTINCT ON   6ms     42ms    342ms    10555ms   30421ms
        Max(array)    5ms     47ms    399ms    4464ms    10025ms
        SQL amax      38ms    393ms   3541ms   39539ms   90164ms
        C argmax      5ms     37ms    288ms    3183ms    7176ms
        1-by-1 proc   2ms     6ms     12ms     42ms      63ms
        1-by-1 SQL    1ms     4ms     11ms     29ms      37ms
  • 64. 41/44 Still slow? Slow HashAggregate? Use parallel aggregation extension: http://www.cybertec.at/en/products/agg-parallel-aggregations-postgresql/ + Up to 30 times faster + Speeds up SeqScan as well − Mostly useful for complex row operations − Requires PG 9.5+ − No magic: it loads up several of your cores
  • 65. 42/44 Still slow? Slow count(DISTINCT ...)? Use HyperLogLog: reliable and efficient approximate algorithm https://en.wikipedia.org/wiki/HyperLogLog https://github.com/aggregateknowledge/postgresql-hll Or fetch approximate values from pg_stats
  • 66. 43/44 Still slow? Slow in typing? ;) SELECT department_id, avg(salary) FROM employees GROUP BY 1 -- same as GROUP BY department_id SELECT count(*) FROM employees GROUP BY true -- same as HAVING count(*) > 0 -- or use MySQL SELECT account_id, account_name, sum(payment_amount) FROM payments GROUP BY 1
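
As a companion to the grouping-algorithm slides (12/44 to 15/44), here is a minimal Python sketch of the two strategies. It is only an illustration under stated assumptions, not PostgreSQL's implementation: the `AVG` dictionary and the functions `group_aggregate` and `hash_aggregate` are hypothetical names that mimic the INITCOND / SFUNC / FINALFUNC protocol described in the slides.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical aggregate definition mirroring INITCOND / SFUNC / FINALFUNC.
AVG = {
    "initcond": lambda: (0, 0),  # state is (count, sum)
    "sfunc": lambda state, value: (state[0] + 1, state[1] + value),
    "finalfunc": lambda state: state[1] / state[0],
}


def group_aggregate(rows, agg):
    """Sort by key, then fold one group at a time with a single live state."""
    ordered = sorted(rows, key=itemgetter(0))  # the expensive sort step
    for key, group in groupby(ordered, key=itemgetter(0)):
        state = agg["initcond"]()
        for _, value in group:
            state = agg["sfunc"](state, value)
        # A result can be emitted as soon as its group ends, in key order.
        yield key, agg["finalfunc"](state)


def hash_aggregate(rows, agg):
    """Keep one state per key in a dict; results exist only after all input."""
    states = {}
    for key, value in rows:
        state = states.get(key)
        if state is None:
            state = agg["initcond"]()
        states[key] = agg["sfunc"](state, value)
    return {key: agg["finalfunc"](state) for key, state in states.items()}


rows = [(1, 10), (3, 30), (1, 20), (2, 5), (3, 10)]  # (key, value) pairs
grouped = list(group_aggregate(rows, AVG))
hashed = hash_aggregate(rows, AVG)
```

The sketch reflects the trade-offs listed on slide 15/44: `group_aggregate` holds one state at a time and returns keys in sorted order but pays for the sort, while `hash_aggregate` reads unsorted input and holds a state for every key until the input is exhausted.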