Grouping sets sfpug_20141118

GROUPING SETS
CUBE, ROLLUP, and Friends
SFPUG 2014/11/18
Copyright© 2014
David Fetter
Tuesday, November 18, 14

Thanks,

Why?!?

Analyzing

Reporting

• CUBE (Power set/Ring the changes)
• ROLLUP (Hierarchy)
• GROUPING SETS (Precision)

Shhh. A little code.

Tables
CREATE TABLE employee (
    id SERIAL PRIMARY KEY,
    first_name TEXT,
    last_name TEXT
);
CREATE TABLE sales (
    employee_id INTEGER NOT NULL,
    sale_closed TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sale_amount MONEY, /* We need to fix this */
    FOREIGN KEY(employee_id) REFERENCES employee(id)
);

Data
INSERT INTO employee (first_name, last_name)
VALUES ('Larry', 'Ellison'),
('Bill', 'Gates'),
('Vladimir', 'Yulianov');

Moar Data
INSERT INTO sales
SELECT
floor(random()*3)+1, /* Who */
'2014-01-01 00:00:00+00'::timestamptz +
random() * interval '1 year', /* When */
(random() * 1000)::numeric(8,2)::MONEY /* How much? */
FROM generate_series(1,1000);

How much did each sell each quarter?

SIMPLE!

SELECT
employee_id,
date_trunc('Quarter', sale_closed) AS "Quarter",
SUM(sale_amount)
FROM sales
GROUP BY
employee_id,
date_trunc('Quarter', sale_closed)
ORDER BY
employee_id,
date_trunc('Quarter', sale_closed);
* I left out some formatting.

Results:
┌─────────────┬─────────┬────────────┐
│ employee_id │ Quarter │    sum     │
├─────────────┼─────────┼────────────┤
│           1 │ 2014-Q1 │ $42,227.43 │
│           1 │ 2014-Q2 │ $42,974.71 │
│           1 │ 2014-Q3 │ $41,364.66 │
│           1 │ 2014-Q4 │ $33,910.34 │
│           2 │ 2014-Q1 │ $38,733.24 │
│           2 │ 2014-Q2 │ $40,480.96 │
│           2 │ 2014-Q3 │ $43,875.72 │
│           2 │ 2014-Q4 │ $42,041.28 │
│           3 │ 2014-Q1 │ $45,351.14 │
│           3 │ 2014-Q2 │ $36,672.08 │
│           3 │ 2014-Q3 │ $33,468.46 │
│           3 │ 2014-Q4 │ $42,793.36 │
└─────────────┴─────────┴────────────┘
(12 rows)

That's nice, BUT
(We all grimace when we hear that)

How about annual totals?

Old way:
UNION ALL

(
SELECT employee_id, to_char(date_trunc('Quarter', sale_closed),
'YYYY-"Q"Q') AS "Quarter", sum(sale_amount)
FROM sales
GROUP BY employee_id, date_trunc('Quarter', sale_closed)
ORDER BY employee_id, date_trunc('Quarter', sale_closed)
)
UNION ALL
(
SELECT employee_id, to_char(date_trunc('Year', sale_closed),
'YYYY') AS "Year", sum(sale_amount)
FROM sales
GROUP BY employee_id, date_trunc('Year', sale_closed)
ORDER BY employee_id, date_trunc('Year', sale_closed)
);
Still Doable...Kinda

Results
┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           1 │ 2014    │ $160,477.14 │
│           2 │ 2014    │ $165,131.20 │
│           3 │ 2014    │ $158,285.04 │
└─────────────┴─────────┴─────────────┘
(15 rows)

That's nice, BUT

Can't we look at each sales rep
with each of their quarterly
totals?

ARGHH!!!!!!

These requests are reasonable!

But the code...not so much.

Take it from the top!

CUBE...ring the changes...

Quick stare
SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY CUBE (
    employee_id,
    date_trunc('Quarter', sale_closed)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

Results:
┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │ 2014-Q1 │ $126,311.81 │
│             │ 2014-Q2 │ $120,127.75 │
│             │ 2014-Q3 │ $118,708.84 │
│             │ 2014-Q4 │ $118,744.98 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(20 rows)
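
For reference, a sketch of what CUBE is shorthand for, assuming PostgreSQL 9.5 or later and the sales table defined earlier. CUBE over two expressions stands for every subset of them (the power set), so the query can be spelled out with GROUPING SETS and should return the same rows.

```sql
-- CUBE (a, b) written out by hand: every subset of {a, b}.
SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS (
    (employee_id, date_trunc('Quarter', sale_closed)), -- per rep, per quarter
    (employee_id),                                     -- per rep, all quarters
    (date_trunc('Quarter', sale_closed)),              -- per quarter, all reps
    ()                                                 -- grand total
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
```

With n expressions, CUBE expands to 2^n grouping sets, which is why it quickly produces rows nobody asked for.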

We don't care
about undifferentiated
quarterly totals.

ROLLUP...hierarchy...

Let's try that!

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY ROLLUP(
    employee_id,
    date_trunc('Quarter', sale_closed)
)
ORDER BY
    employee_id,
    date_trunc('Quarter', sale_closed);

Hmmm...
┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(16 rows)

There was an extra line.

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │         │ $483,893.38 │  <- the extra line
└─────────────┴─────────┴─────────────┘
(16 rows)

Hierarchies:
Top to Bottom
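
A sketch of the hierarchy ROLLUP walks, assuming PostgreSQL 9.5 or later and the sales table defined earlier. ROLLUP over two expressions stands for each leading prefix of the list, all the way down to the empty set, so it always includes the grand total.

```sql
-- ROLLUP (a, b) written out by hand: (a, b), then (a), then ().
SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS (
    (employee_id, date_trunc('Quarter', sale_closed)), -- bottom: rep and quarter
    (employee_id),                                     -- middle: rep
    ()                                                 -- top: grand total
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
```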

We didn't want the top.

GROUPING SETS...
Precision

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS(
    (employee_id, date_trunc('Quarter', sale_closed)),
    (employee_id)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

Results:
┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
└─────────────┴─────────┴─────────────┘
(15 rows)

There we go!
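
One possible refinement, sketched under the assumption of PostgreSQL 9.5 or later: the blank Quarter cells in the subtotal rows are NULLs, which could be confused with NULLs in the data. The GROUPING() function reports whether a grouping expression was rolled away in a given row, so the subtotal rows can be labeled. The label text 'All year' is purely illustrative.

```sql
SELECT
    employee_id,
    CASE GROUPING(date_trunc('Quarter', sale_closed))
        WHEN 1 THEN 'All year'  -- quarter is not part of this row's grouping set
        ELSE to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q')
    END AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS(
    (employee_id, date_trunc('Quarter', sale_closed)),
    (employee_id)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
```

The argument to GROUPING() has to match one of the grouping expressions exactly, which is why the date_trunc call is repeated verbatim.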

HOW?!?

Extant Planner/Executor

• HashAgg
• GroupAgg

HashAgg
[Diagram: hash table mapping each Result Group to its Intermediate State]

HashAgg
• One pass:
• Update hash value for each row
• Output final value at the end

HashAgg
• Not yet in GROUPING SETS
• Algorithmic speedup opportunity:
• O(n) vs. O(n log n)

HashAgg-- :-(
• Non-hashable data types
• Aggregate functions with LOTS of state
• Ordered aggs
• Distinct aggs
• No spill-to-disk

GroupAgg
• Sorts all input to the agg node in order to:
  • Detect each group boundary
  • Output that group
• Results before end-of-scan
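
A quick way to see the two strategies, sketched against the sales table defined earlier. Which one the planner picks depends on statistics, settings, and server version, so no particular plan is promised here. In EXPLAIN output the nodes appear as HashAggregate and GroupAggregate.

```sql
-- Let the planner choose between hashing and sorting.
EXPLAIN
SELECT employee_id, sum(sale_amount)
FROM sales
GROUP BY employee_id;

-- Discourage hashing for this session to look at the sort-based plan.
SET enable_hashagg = off;

EXPLAIN
SELECT employee_id, sum(sale_amount)
FROM sales
GROUP BY employee_id;

-- Put the setting back to its default.
RESET enable_hashagg;
```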

Phase I

GroupAgg for ROLLUP
• Sort for the hierarchy
• Output results at each boundary
• k for the price of one!

Phase II

GroupAgg !ROLLUP
• Re-plan input to sort with >1 order
• Plan keeps tons of global state
• Does NOT like to be called >1x/plan

GROUPING SETS ~
WINDOW

WINDOW
implementation

Shuffle a deck of WindowAgg and Sort nodes.

WindowAgg → Sort → WindowAgg → Sort ...

Similar pattern

• Expand all GROUPING SETS
• Arrange into fewest ROLLUPs
• Shuffle Sort and ChainAgg
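
To make "fewest ROLLUPs" concrete, here is a hand-written illustration, assuming PostgreSQL 9.5 or later. It is not a claim about the exact arrangement the planner chooses. The four grouping sets of the earlier CUBE query can be covered by one ROLLUP, served by a single sort on (employee_id, quarter), plus one leftover set that needs a second sort on quarter alone.

```sql
-- Same grouping sets as CUBE (employee_id, quarter), arranged as
-- one ROLLUP chain plus the single set that does not fit the chain.
SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS (
    ROLLUP (employee_id, date_trunc('Quarter', sale_closed)), -- (a, b), (a), ()
    (date_trunc('Quarter', sale_closed))                      -- (b)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);
```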

GroupAgg →
Sort →
ChainAgg →
Sort →
(input data)

ChainAgg?!?

ChainAgg Nodes
• Pass input state through unchanged
• Update aggregate state
• Put rows into a chain-wide shared
tuplestore when they hit a group boundary

The Last GroupAgg
• Produces its normal output until end-of-data
• Outputs the shared tuplestore

Phase III

Future

• HashAgg
• Alone?
• With ChainAggs?
• Agg Associativity (A + B) + C = A + (B + C)
• Make CUBE a reserved word?

Questions?
Comments?
