Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Window functions in MySQL 8
Dag H. Wanvik
Senior database engineer
MySQL optimizer team
Sep. 2017
Safe Harbor Statement
The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions. The development, release, and timing of any
features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
Program Agenda
1. Introduction: what & why
2. What's supported
3. Ranking and analytical window functions
4. Implementation & performance
PART I
Gentle intro in which we meet the SUM aggregate used as a
window function and get introduced to window partitions and
window frames
Why window functions?
● Part of SQL standard since 2003, with later additions
● Frequently requested feature(s) for data analysis
● Improves readability and often performance
● Most vendors support it, but to varying degrees (YMMV)
Why window functions?
With a correlated subquery:
SELECT name o_name, department_id,
salary AS o_salary,
(SELECT SUM(salary) AS sum
FROM employee
WHERE salary <= o_salary AND NOT
(salary = o_salary AND
name > o_name)) AS sum
FROM employee
ORDER BY sum, name;
With a window function:
SELECT name, department_id, salary,
SUM(salary) OVER w AS sum
FROM employee
WINDOW w AS (ORDER BY salary, name
ROWS UNBOUNDED PRECEDING)
ORDER BY sum, name;
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Dag | 10 | NULL | NULL |
| Frederik | 10 | 60000 | 60000 |
| Jon | 10 | 60000 | 120000 |
| Lena | 20 | 65000 | 185000 |
| Paula | 20 | 65000 | 250000 |
| Michael | 10 | 70000 | 320000 |
| William | 30 | 70000 | 390000 |
| Nils | NULL | 75000 | 465000 |
| Nils | 20 | 80000 | 545000 |
| Erik | 10 | 100000 | 645000 |
| Rose | 30 | 300000 | 945000 |
+----------+---------------+--------+--------+
● Readability
● Performance (on my laptop, 50,000 rows): 16 min with the subquery vs. 0.14 s with the window function
● A self join is another option, but it is tricky to get right
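To see why the two queries above agree and why their cost differs, here is a minimal Python model of both (a sketch of the semantics, not of MySQL's implementation). It assumes NULL sorts first in ascending order, as in MySQL; the function names are hypothetical. The subquery model rescans every row for each outer row, while the window model sorts once and keeps a running total.

```python
# Minimal model of the two queries (not MySQL's implementation).
# Assumption: NULL sorts first in ascending order, as in MySQL.
EMPLOYEES = [  # (name, department_id, salary); None stands for SQL NULL
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_sum(values):
    """SQL SUM: NULLs are ignored; no non-NULL input gives NULL."""
    vals = [v for v in values if v is not None]
    return sum(vals) if vals else None


def correlated_subquery(rows):
    """One full scan of the table per outer row: O(n^2) comparisons."""
    result = {}
    for o_name, _, o_salary in rows:
        if o_salary is None:  # salary <= NULL is never true: empty SUM is NULL
            result[(o_name, o_salary)] = None
            continue
        result[(o_name, o_salary)] = sql_sum(
            salary for name, _, salary in rows
            if salary is not None
            and (salary < o_salary or (salary == o_salary and name <= o_name)))
    return result


def window_running_sum(rows):
    """Sort once by (salary, name), then keep a running total
    (ROWS UNBOUNDED PRECEDING)."""
    def key(row):
        name, _, salary = row
        return (salary is not None, salary or 0, name)  # NULL salary first

    result, total = {}, None
    for name, _, salary in sorted(rows, key=key):
        if salary is not None:
            total = salary if total is None else total + salary
        result[(name, salary)] = total
    return result


print(window_running_sum(EMPLOYEES)[("Rose", 300000)])  # 945000
```

Rows are keyed by (name, salary) because the table contains two employees named Nils.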
What's a SQL window function?
Short answer: a function that gets its arguments from a set of rows, the window,
which is defined by a partition and a frame.
OK, but
● what is partitioned data?
● what is a frame?
● what does a window function look like? Hint: OVER keyword
Running example: employees
+----------+---------------+--------+
| name | department_id | salary |
+----------+---------------+--------+
| Nils | NULL | 75000 |
| Dag | 10 | NULL |
| Erik | 10 | 100000 |
| Frederik | 10 | 60000 |
| Jon | 10 | 60000 |
| Michael | 10 | 70000 |
| Lena | 20 | 65000 |
| Nils | 20 | 80000 |
| Paula | 20 | 65000 |
| Rose | 30 | 300000 |
| William | 30 | 70000 |
+----------+---------------+--------+
SELECT name, department_id, salary
FROM employee
ORDER BY department_id, name;
Salaries per dept
Query: find sums of salaries per department
SELECT department_id, SUM(salary)
FROM employee GROUP BY department_id
ORDER BY department_id;
+---------------+-------------+
| department_id | SUM(salary) |
+---------------+-------------+
| NULL | 75000 |
| 10 | 290000 |
| 20 | 210000 |
| 30 | 370000 |
+---------------+-------------+
Grouping loss
SELECT department_id, SUM(salary)
FROM employee GROUP BY department_id;
[Figure: the department 20 rows Lena, Nils and Paula collapse into a single ∑ row]
The identity of names and salaries is lost.
Windowing, rows kept
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Nils | NULL | 75000 | 945000 |
| Dag | 10 | NULL | 945000 |
| Erik | 10 | 100000 | 945000 |
| Frederik | 10 | 60000 | 945000 |
| Jon | 10 | 60000 | 945000 |
| Michael | 10 | 70000 | 945000 |
| Lena | 20 | 65000 | 945000 |
| Nils | 20 | 80000 | 945000 |
| Paula | 20 | 65000 | 945000 |
| Rose | 30 | 300000 | 945000 |
| William | 30 | 70000 | 945000 |
+----------+---------------+--------+--------+
SELECT name, department_id, salary,
SUM(salary) OVER () sum
FROM employee
ORDER BY department_id, name;
Windowing, «grouped»
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Nils | NULL | 75000 | 75000 |
| Dag | 10 | NULL | 290000 |
| Erik | 10 | 100000 | 290000 |
| Frederik | 10 | 60000 | 290000 |
| Jon | 10 | 60000 | 290000 |
| Michael | 10 | 70000 | 290000 |
| Lena | 20 | 65000 | 210000 |
| Nils | 20 | 80000 | 210000 |
| Paula | 20 | 65000 | 210000 |
| Rose | 30 | 300000 | 370000 |
| William | 30 | 70000 | 370000 |
+----------+---------------+--------+--------+
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id) sum
FROM employee
ORDER BY department_id, name;
Partition == frame
All salaries in the partition are added: with no ORDER BY and no frame clause,
the window frame is the entire partition.
Windowing, rows kept
SELECT name, salary, department_id,
SUM(salary) OVER (PARTITION BY department_id) sum
FROM employee
ORDER BY department_id;
[Figure: each row, e.g. Lena, Nils and Paula in department 20, keeps its identity and gets its partition's ∑ as an extra column]
Identity of names and salaries kept: no rows are lost
=> A window function is similar to a scalar function: it adds a result column
=> BUT: it can read data from rows other than its own, within its window
partition or frame
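The contrast between grouping and windowing can be sketched in a few lines of Python. This is a model of the semantics only, with hypothetical function names: `group_by_sum` collapses each department to one row, while `window_partition_sum` keeps every input row and appends its partition's sum as an extra column.

```python
from collections import defaultdict

EMPLOYEES = [  # (name, department_id, salary); None stands for SQL NULL
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_sum(values):
    """SQL SUM: NULLs are ignored; no non-NULL input gives NULL."""
    vals = [v for v in values if v is not None]
    return sum(vals) if vals else None


def group_by_sum(rows):
    """GROUP BY department_id: one output row per department (NULL forms its own group)."""
    groups = defaultdict(list)
    for _, dept, salary in rows:
        groups[dept].append(salary)
    return {dept: sql_sum(salaries) for dept, salaries in groups.items()}


def window_partition_sum(rows):
    """SUM(salary) OVER (PARTITION BY department_id): all rows kept, one column added."""
    totals = group_by_sum(rows)
    return [(name, dept, salary, totals[dept]) for name, dept, salary in rows]


print(len(group_by_sum(EMPLOYEES)), len(window_partition_sum(EMPLOYEES)))  # 4 11
```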
Default partition
SELECT name, department_id, salary,
SUM(salary) OVER () sum
FROM employee
ORDER BY department_id, name;
No partition specified: the
window frame is the entire
result set
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Nils | NULL | 75000 | 945000 |
| Dag | 10 | NULL | 945000 |
| Erik | 10 | 100000 | 945000 |
| Frederik | 10 | 60000 | 945000 |
| Jon | 10 | 60000 | 945000 |
| Michael | 10 | 70000 | 945000 |
| Lena | 20 | 65000 | 945000 |
| Nils | 20 | 80000 | 945000 |
| Paula | 20 | 65000 | 945000 |
| Rose | 30 | 300000 | 945000 |
| William | 30 | 70000 | 945000 |
+----------+---------------+--------+--------+
Cumulative sum
SELECT name, department_id, salary,
SUM(salary) OVER (ORDER BY department_id, name
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
No partition specified: the
window frame grows
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Nils | NULL | 75000 | 75000 |
| Dag | 10 | NULL | 75000 |
| Erik | 10 | 100000 | 175000 |
| Frederik | 10 | 60000 | 235000 |
| Jon | 10 | 60000 | 295000 |
| Michael | 10 | 70000 | 365000 |
| Lena | 20 | 65000 | 430000 |
| Nils | 20 | 80000 | 510000 |
| Paula | 20 | 65000 | 575000 |
| Rose | 30 | 300000 | 875000 |
| William | 30 | 70000 | 945000 |
+----------+---------------+--------+--------+
Cumulative sum, partitioned
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Nils | NULL | 75000 | 75000 |
| Dag | 10 | NULL | NULL |
| Erik | 10 | 100000 | 100000 |
| Frederik | 10 | 60000 | 160000 |
| Jon | 10 | 60000 | 220000 |
| Michael | 10 | 70000 | 290000 |
| Lena | 20 | 65000 | 65000 |
| Nils | 20 | 80000 | 145000 |
| Paula | 20 | 65000 | 210000 |
| Rose | 30 | 300000 | 300000 |
| William | 30 | 70000 | 370000 |
+----------+---------------+--------+--------+
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
Partition specified: the window frame grows
within each partition
New partition: the sum is reset
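Both cumulative sums shown so far (unpartitioned, and partitioned by department) can be modelled by one hypothetical helper: sort by partition key, then by ordering key, walk each partition and reset the total when a new partition starts. NULL is assumed to sort first, as in MySQL, and NULL salaries are skipped, as SUM does.

```python
from itertools import groupby

EMPLOYEES = [  # (name, department_id, salary); None stands for SQL NULL
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def null_first(value):
    """Sort key putting NULL (None) before every non-NULL value, as MySQL does."""
    return (value is not None, value if value is not None else 0)


def running_sum(rows, partition_key, order_key):
    """SUM(salary) OVER (PARTITION BY ... ORDER BY ...
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    result = []
    for _, partition in groupby(ordered, key=partition_key):
        total = None  # new partition: reset the sum
        for name, dept, salary in partition:
            if salary is not None:  # SUM skips NULLs
                total = salary if total is None else total + salary
            result.append((name, dept, salary, total))
    return result


# PARTITION BY department_id ORDER BY name
partitioned = running_sum(EMPLOYEES, lambda r: null_first(r[1]), lambda r: r[0])
# No partition (a single one), ORDER BY department_id, name
unpartitioned = running_sum(EMPLOYEES, lambda r: 0, lambda r: (null_first(r[1]), r[0]))
```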
Parts of window function
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
Function call + OVER keyword signals a window function
An optional partition specification:
PARTITION BY <expression> {, <expression>}*
● A partition divides the result set into disjoint sets
● A window function does not see rows in partitions other than that of
the current row for which it is being evaluated
An optional ordering specification:
ORDER BY <expression> {, <expression>}*
● Orders the rows within the partition
● Not the same as the query's final ORDER BY: it makes no guarantee
about the order of the final result rows
ORDER BY: growing frame
● Some window functions need row ordering to be useful, e.g. RANK
● Peers: rows with the same value for the ORDER BY expression(s)
SELECT name, department_id, salary,
SUM(salary) OVER (ORDER BY salary DESC) AS sum
FROM employee;
+----------+---------------+--------+--------+
| name | department_id | salary | sum |
+----------+---------------+--------+--------+
| Rose | 30 | 300000 | 300000 |
| Erik | 10 | 100000 | 400000 |
| Nils | 20 | 80000 | 480000 |
| Nils | NULL | 75000 | 555000 |
| Michael | 10 | 70000 | 695000 |
| William | 30 | 70000 | 695000 |
| Lena | 20 | 65000 | 825000 |
| Paula | 20 | 65000 | 825000 |
| Frederik | 10 | 60000 | 945000 |
| Jon | 10 | 60000 | 945000 |
| Dag | 10 | NULL | 945000 |
+----------+---------------+--------+--------+
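What happened here? Michael and William (70000) are peers with respect to
salary, and so are Lena and Paula (65000) and Frederik and Jon (60000):
peers always get the same sum.
This is an example of an (implicit) RANGE frame: with ORDER BY but no frame
clause, the frame runs from the start of the partition to the current row
and all its peers.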
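The difference between the implicit RANGE frame and an explicit ROWS frame can be modelled in Python. This sketch (hypothetical helper names) assumes MySQL's placement of NULL last for descending order. Under ROWS the frame stops at the current row; under RANGE it is extended to the current row's last peer, which is why Michael and William both get 695000.

```python
EMPLOYEES = [  # (name, department_id, salary); None stands for SQL NULL
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_sum(values):
    """SQL SUM: NULLs are ignored; no non-NULL input gives NULL."""
    vals = [v for v in values if v is not None]
    return sum(vals) if vals else None


def salary_desc(row):
    """ORDER BY salary DESC; MySQL puts NULL last in descending order."""
    salary = row[2]
    return (salary is None, -(salary or 0))


def cumulative_sum(rows, frame):
    """SUM(salary) OVER (ORDER BY salary DESC <frame> BETWEEN UNBOUNDED
    PRECEDING AND CURRENT ROW), for frame "ROWS" or "RANGE"."""
    ordered = sorted(rows, key=salary_desc)  # stable: ties keep input order
    result = []
    for i, row in enumerate(ordered):
        last = i
        if frame == "RANGE":  # extend the frame to the last peer of the current row
            while last + 1 < len(ordered) and salary_desc(ordered[last + 1]) == salary_desc(row):
                last += 1
        result.append((row[0], sql_sum(r[2] for r in ordered[: last + 1])))
    return result


rows_sums = [s for _, s in cumulative_sum(EMPLOYEES, "ROWS")]
range_sums = [s for _, s in cumulative_sum(EMPLOYEES, "RANGE")]  # implicit default
```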
Parts of window function
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
An optional frame specification
● A subset of rows within a partition
● Extent can depend on the current row
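● Default frame: the entire partition without ORDER BY; with ORDER BY, from the
partition start through the current row and its peers
● Not all window functions heed the frame (e.g. RANK ignores it)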
Frame anatomy
[Figure: a partition with the CURRENT ROW; frame bounds can be UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, m FOLLOWING or UNBOUNDED FOLLOWING; n, m: numeric or temporal]
Examples:
● ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
● RANGE CURRENT ROW (the current row and its peers)
● ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
● ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
● RANGE INTERVAL 6 DAY PRECEDING
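For ROWS frames, the bounds translate directly into index arithmetic on the ordered partition. The helper below is a hypothetical sketch: `None` stands for UNBOUNDED, 0 for CURRENT ROW, and the frame is clipped at the partition edges. It only models frames that contain the current row (it does not handle e.g. BETWEEN 3 PRECEDING AND 1 PRECEDING).

```python
def rows_frame(partition, i, preceding, following):
    """Rows in ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING
    for the current row partition[i]. None means UNBOUNDED; 0 means CURRENT ROW.
    The frame never extends past the partition's edges."""
    start = 0 if preceding is None else max(0, i - preceding)
    end = len(partition) if following is None else min(len(partition), i + following + 1)
    return partition[start:end]


partition = ["a", "b", "c", "d", "e", "f", "g"]
print(rows_frame(partition, 3, 2, 2))  # ['b', 'c', 'd', 'e', 'f']
```

Near the edges the frame simply shrinks: for the first row, ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING holds only three rows.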
ROWS vs. RANGE
● Frame boundaries: physical (ROWS) or logical (RANGE)
● ROWS: bound N counts N rows before (PRECEDING) or after (FOLLOWING) the
current row. Peers get no special treatment.
● RANGE with N PRECEDING or M FOLLOWING requires ORDER BY on a single numeric
or temporal expression
● RANGE: the frame holds the rows whose (ascending) ORDER BY value lies between
N below (PRECEDING) and M above (FOLLOWING) the current row's value.
Peers are always included in the frame.
Ex: ORDER BY date
RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
specifies all rows within the last week.
● Default (SQL standard):
OVER (ORDER BY n) ==
OVER (ORDER BY n RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW)
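The "last week" RANGE frame can be modelled as a value test on the ordering column rather than a row count. The sketch below uses hypothetical daily sales data (not from the talk) and a hypothetical `last_week_sum` helper; note that the two rows dated 2017-09-03 are peers and therefore always land in each other's frame.

```python
from datetime import date, timedelta


def last_week_sum(rows):
    """SUM(amount) OVER (ORDER BY day RANGE BETWEEN INTERVAL 6 DAY PRECEDING
    AND CURRENT ROW): the frame holds every row whose day lies in
    [current day - 6 days, current day], so peers (same day) are always included.
    Quadratic scan for clarity, not efficiency."""
    result = []
    for day, amount in rows:
        first_day = day - timedelta(days=6)
        result.append((day, sum(a for d, a in rows if first_day <= d <= day)))
    return result


# Hypothetical daily sales; two rows share 2017-09-03 and are peers.
SALES = [
    (date(2017, 9, 1), 10),
    (date(2017, 9, 3), 20),
    (date(2017, 9, 3), 5),
    (date(2017, 9, 7), 30),
    (date(2017, 9, 8), 40),
]
print([total for _, total in last_week_sum(SALES)])  # [10, 35, 35, 65, 95]
```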
Determinacy
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
In general, window queries are not deterministic unless the window ORDER BY
has enough expressions to identify each row uniquely.
Minimum guarantee from the SQL standard: within a single query, several windows
with equivalent non-deterministic orderings order the rows the same way.
A final ORDER BY is still required if result ordering is desired: the window
gives no guarantee.
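A small Python model shows how ties make a ROWS frame non-deterministic. Here a stable sort stands in for the freedom an engine has to order tied rows any way it likes (an assumption of the sketch, not a description of MySQL internals); the helper and key names are hypothetical. Frederik's running sum depends on whether he is placed before or after Jon, until `name` is added to the ordering.

```python
def running_sum(rows, key):
    """SUM(salary) OVER (ORDER BY <key> ROWS UNBOUNDED PRECEDING), by name.
    sorted() is stable, so rows that tie on key keep their input order: this
    stands in for the freedom an engine has when the ORDER BY is not unique."""
    result, total = {}, 0
    for name, salary in sorted(rows, key=key):
        total += salary
        result[name] = total
    return result


def by_salary(row):  # ties possible: not deterministic
    return row[1]


def by_salary_name(row):  # name breaks the ties: deterministic
    return (row[1], row[0])


one_order = [("Frederik", 60000), ("Jon", 60000), ("Lena", 65000)]
other_order = [("Jon", 60000), ("Frederik", 60000), ("Lena", 65000)]
print(running_sum(one_order, by_salary)["Frederik"],
      running_sum(other_order, by_salary)["Frederik"])  # 60000 120000
```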
Example: salary analysis
Question: find the employees with the largest difference between their wage and
that of the department average
SELECT name, department_id, salary,
AVG(salary) OVER (PARTITION BY department_id) AS avg,
salary - AVG(salary) OVER (PARTITION BY department_id)
AS diff
FROM employee ORDER BY diff desc;
+----------+---------------+--------+-------------+--------------+
| name | department_id | salary | avg | diff |
+----------+---------------+--------+-------------+--------------+
| Rose | 30 | 300000 | 185000.0000 | 115000.0000 |
| Erik | 10 | 100000 | 72500.0000 | 27500.0000 |
| Nils | 20 | 80000 | 70000.0000 | 10000.0000 |
| Nils | NULL | 75000 | 75000.0000 | 0.0000 |
| Michael | 10 | 70000 | 72500.0000 | -2500.0000 |
| Lena | 20 | 65000 | 70000.0000 | -5000.0000 |
| Paula | 20 | 65000 | 70000.0000 | -5000.0000 |
| Frederik | 10 | 60000 | 72500.0000 | -12500.0000 |
| Jon | 10 | 60000 | 72500.0000 | -12500.0000 |
| William | 30 | 70000 | 185000.0000 | -115000.0000 |
| Dag | 10 | NULL | 72500.0000 | NULL |
+----------+---------------+--------+-------------+--------------+
● Here: two separate (though identical) windows
● A query can use any number of windows
● Windows are logically evaluated in multiple phases
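The numbers in the salary analysis can be checked with a short Python model (hypothetical helper name, not MySQL's implementation). AVG ignores NULL salaries, so department 10 averages four salaries, not five; Dag's diff is NULL because his salary is NULL. The final sort mimics ORDER BY diff DESC with NULL last, as MySQL does.

```python
from collections import defaultdict

EMPLOYEES = [  # (name, department_id, salary); None stands for SQL NULL
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def salary_vs_dept_avg(rows):
    """salary - AVG(salary) OVER (PARTITION BY department_id), ORDER BY diff DESC."""
    salaries = defaultdict(list)
    for _, dept, salary in rows:
        if salary is not None:  # AVG ignores NULLs
            salaries[dept].append(salary)
    avg = {dept: sum(s) / len(s) for dept, s in salaries.items()}
    result = []
    for name, dept, salary in rows:
        dept_avg = avg.get(dept)  # None if the partition has no non-NULL salary
        diff = None if salary is None or dept_avg is None else salary - dept_avg
        result.append((name, dept, salary, dept_avg, diff))
    # ORDER BY diff DESC with NULL last; the stable sort keeps input order for ties
    result.sort(key=lambda r: (r[4] is None, -(r[4] or 0)))
    return result


for row in salary_vs_dept_avg(EMPLOYEES)[:2]:
    print(row)
```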
Example: named window
Question: find the employees with the largest difference between their wage and
that of the department average
SELECT name, department_id, salary,
AVG(salary) OVER w AS avg,
salary - AVG(salary) OVER w AS diff FROM employee
WINDOW w AS (PARTITION BY department_id)
ORDER BY diff DESC;
● WINDOW w AS (...) defines the named window w; each OVER w refers to it
● Multiple window functions can share one window
● They are evaluated in the same phase (efficiency)
● Better readability
Ex: moving AVG, smoothing
CREATE TABLE sales(id INT AUTO_INCREMENT PRIMARY KEY,
date DATE, sale INT); ...;
SELECT * FROM sales;
+----+------------+------+
| id | date | sale |
+----+------------+------+
| 1 | 2017-03-01 | 200 |
| 2 | 2017-04-01 | 300 |
| 3 | 2017-05-01 | 400 |
| 4 | 2017-06-01 | 200 |
| 5 | 2017-07-01 | 600 |
| 6 | 2017-08-01 | 100 |
| 7 | 2017-03-01 | 400 |
| 8 | 2017-04-01 | 300 |
| 9 | 2017-05-01 | 500 |
| 10 | 2017-06-01 | 400 |
| 11 | 2017-07-01 | 600 |
| 12 | 2017-08-01 | 150 |
+----+------------+------+
Ex: moving AVG, smoothing
+-------------+-----------+
| MONTH(date) | SUM(sale) |
+-------------+-----------+
| 3 | 600 |
| 4 | 600 |
| 5 | 900 |
| 6 | 600 |
| 7 | 1200 |
| 8 | 250 |
+-------------+-----------+
● Sum up sales per month
SELECT MONTH(date), SUM(sale)
FROM sales
GROUP BY MONTH(date);
Ex: moving AVG, smoothing
● Moving AVG over 3 months: the frame moves along with the current row
SELECT MONTH(date), SUM(sale),
AVG(SUM(sale)) OVER w AS sliding_avg
FROM sales
GROUP BY MONTH(date)
WINDOW w AS (ORDER BY MONTH(date)
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING);
+-------------+-----------+-------------+
| MONTH(date) | SUM(sale) | sliding_avg |
+-------------+-----------+-------------+
| 3 | 600 | 600.0000 |
| 4 | 600 | 700.0000 |
| 5 | 900 | 700.0000 |
| 6 | 600 | 900.0000 |
| 7 | 1200 | 683.3333 |
| 8 | 250 | 725.0000 |
+-------------+-----------+-------------+
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Ex: moving AVG, smoothing
● Moving AVG over 3 months
SELECT MONTH(date), SUM(sale),
AVG(SUM(sale)) OVER w AS sliding_avg
FROM sales
GROUP BY MONTH(date)
WINDOW w AS (ORDER BY MONTH(date)
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING);
+-------------+-----------+-------------+
| MONTH(date) | SUM(sale) | sliding_avg |
+-------------+-----------+-------------+
| 3 | 600 | 600.0000 |
| 4 | 600 | 700.0000 |
| 5 | 900 | 700.0000 |
| 6 | 600 | 900.0000 |
| 7 | 1200 | 683.3333 |
| 8 | 250 | 725.0000 |
+-------------+-----------+-------------+
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
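A minimal Python sketch of the RANGE frame used above, assuming the monthly totals from the result table as input; `range_moving_avg` is an illustrative name, not a MySQL API. Because the frame is RANGE-based, a row's membership depends on its month value rather than its row position, which is why the first and last months average over only two values.

```python
def range_moving_avg(rows, preceding=1, following=1):
    """AVG over RANGE BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    rows: list of (order_key, value) pairs, one per group (here: month, SUM(sale)).
    A row is in the current row's frame if its key lies within
    [current key - preceding, current key + following].
    """
    result = []
    for key, _ in rows:
        frame = [v for k, v in rows if key - preceding <= k <= key + following]
        result.append(sum(frame) / len(frame))
    return result


# Monthly totals taken from the result table above
monthly_sales = [(3, 600), (4, 600), (5, 900), (6, 600), (7, 1200), (8, 250)]

for (month, total), avg in zip(monthly_sales, range_moving_avg(monthly_sales)):
    print(month, total, f"{avg:.4f}")
```

If a month were missing (say May), June's RANGE frame would hold only June and July, whereas a ROWS frame would still take one row on each side.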
Windowing in an SQL query
[Figure: evaluation order: JOIN -> GROUP BY, HAVING -> WINDOW 1 -> ... -> WINDOW n -> ORDER BY/DISTINCT/LIMIT]
● Window functions see the query result set after grouping/HAVING
- filtering on wf results requires a subquery
● The order of the windowing steps is not semantically significant
● Window functions can't use window functions in the same query (without
using subqueries)
● In practice, ordering matters. The optimizer is allowed to
- reorder windowing steps to minimize the sorting required
- merge window phases if they are equivalent
PART II
in which we learn which window functions are supported
by MySQL
MySQL 8.0
● Most aggregate functions in MySQL can be used as window functions:
COUNT, SUM, AVG, MAX, MIN, STDDEV_POP (& synonyms),
STDDEV_SAMP, VAR_POP (& synonym), VAR_SAMP
Limitation: No DISTINCT in aggregates yet
● All SQL standard specialized window functions
ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST,
NTILE, LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE
● Next phase (probably post-GA), more aggregates:
BIT_OR, BIT_XOR, BIT_AND, JSON_ARRAYAGG, JSON_OBJECTAGG
[ GROUP_CONCAT ]
Std compliance: extensions
Target SQL standard semantics, but
● Expression (not only column) allowed in PARTITION BY
Benefit: more flexible
● A missing ORDER BY is tolerated even where it makes the result less useful
(all rows are peers), except for RANGE <value>, which requires exactly one
ORDER BY expression
● A frame clause is tolerated even for window functions that operate on the
entire partition (the standard is stricter)
Benefit: many wfs can use the same window definition
Std compliance: restrictions
Target SQL standard semantics, but
● Valued frame bounds must be static in query
● No GROUPS in frame clause (Feature T620)
● No EXCLUDE in frame clause
● No DISTINCT in aggregates with windowing
● IGNORE NULLS not supported
● FROM LAST not supported (NTH_VALUE)
● No nested window functions (Feature T619)
● No row pattern recognition in window clause (Feature R020)
● Operator subqueries with windowing only if materializable (WL#10431)
PART III
in which we learn about the specialized window functions
Ranking: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK,
CUME_DIST, NTILE
Analytical: LEAD, LAG, NTH_VALUE, FIRST_VALUE,
LAST_VALUE
Of these, FIRST_VALUE, LAST_VALUE and NTH_VALUE use frames; the others work on the entire partition.
ROW_NUMBER
● Assigns a sequential number to each row within its partition, following the window's ORDER BY
Example: number the employees in each department by descending salary
SELECT name, department_id AS dept, salary,
ROW_NUMBER() OVER w AS `row#`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC, name ASC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+
| name | dept | salary | row# |
+----------+------+--------+------+
| Nils | NULL | 75000 | 1 |
| Erik | 10 | 100000 | 1 |
| Michael | 10 | 70000 | 2 |
| Frederik | 10 | 60000 | 3 |
| Jon | 10 | 60000 | 4 |
| Dag | 10 | NULL | 5 |
| Nils | 20 | 80000 | 1 |
| Lena | 20 | 65000 | 2 |
| Paula | 20 | 65000 | 3 |
| Rose | 30 | 300000 | 1 |
| William | 30 | 70000 | 2 |
+----------+------+--------+------+
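A Python sketch of what the query above computes, assuming the running-example employee table (with `None` standing for SQL NULL); `EMPLOYEES`, `window_order` and `row_numbers` are illustrative names. It sorts on the partition key followed by the ordering key and restarts the counter in each partition.

```python
from itertools import groupby

# Employee rows from the running example: (name, department_id, salary); None = SQL NULL
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def window_order(row):
    """ORDER BY salary DESC, name ASC. MySQL treats NULL as lower than any
    value, so a NULL salary sorts last in descending order."""
    name, _, salary = row
    return (salary is None, -(salary or 0), name)


def row_numbers(rows):
    """ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC, name)."""
    # Partition key first (NULL department sorts first), then the window ordering
    ordered = sorted(rows, key=lambda r: ((r[1] is not None, r[1] or 0), window_order(r)))
    result = []
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        for number, row in enumerate(partition, start=1):  # restart at 1 per partition
            result.append((row[0], row[1], row[2], number))
    return result


for name, dept, salary, number in row_numbers(EMPLOYEES):
    print(name, dept, salary, number)
```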
RANK
● Rows that are peers w.r.t. the window ordering get the same rank, leaving a gap after the tie
Example: rank employees within each department according to their
salary
SELECT name, department_id AS dept, salary, .. ,
RANK() OVER w AS `rank`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+
| name | dept | salary | row# | rank |
+----------+------+--------+------+------+
| Nils | NULL | 75000 | 1 | 1 |
| Erik | 10 | 100000 | 1 | 1 |
| Michael | 10 | 70000 | 2 | 2 |
| Frederik | 10 | 60000 | 3 | 3 |
| Jon | 10 | 60000 | 4 | 3 |
| Dag | 10 | NULL | 5 | 5 |
| Nils | 20 | 80000 | 1 | 1 |
| Lena | 20 | 65000 | 2 | 2 |
| Paula | 20 | 65000 | 3 | 2 |
| Rose | 30 | 300000 | 1 | 1 |
| William | 30 | 70000 | 2 | 2 |
+----------+------+--------+------+------+
Frederik and Jon are peers w.r.t. the ordering: both get rank 3, and rank 4 is skipped.
DENSE_RANK
● Rows that are peers w.r.t. the window ordering get the same rank, but no gap is left after the tie
Example: rank employees within each department according to their
salary
SELECT name, department_id AS dept, salary, .. ,
DENSE_RANK() OVER w AS dense
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+
| name | dept | salary | row# | rank | dense |
+----------+------+--------+------+------+-------+
| Nils | NULL | 75000 | 1 | 1 | 1 |
| Erik | 10 | 100000 | 1 | 1 | 1 |
| Michael | 10 | 70000 | 2 | 2 | 2 |
| Frederik | 10 | 60000 | 3 | 3 | 3 |
| Jon | 10 | 60000 | 4 | 3 | 3 |
| Dag | 10 | NULL | 5 | 5 | 4 |
| Nils | 20 | 80000 | 1 | 1 | 1 |
| Lena | 20 | 65000 | 2 | 2 | 2 |
| Paula | 20 | 65000 | 3 | 2 | 2 |
| Rose | 30 | 300000 | 1 | 1 | 1 |
| William | 30 | 70000 | 2 | 2 | 2 |
+----------+------+--------+------+------+-------+
Frederik and Jon are peers w.r.t. the ordering; rank 4 is not skipped, so Dag gets dense rank 4.
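The difference between RANK and DENSE_RANK can be sketched in Python for a single partition, assuming its ORDER BY values are already sorted (department 10 in salary DESC order, NULL as `None` sorted last); `rank_and_dense_rank` is an illustrative name.

```python
def rank_and_dense_rank(sorted_keys):
    """RANK() and DENSE_RANK() for one partition.

    sorted_keys: the ORDER BY values of the partition, already sorted.
    Rows with equal keys are peers and share a rank (two NULLs are peers too,
    as in SQL sorting, since None == None in Python).
    """
    ranks, dense_ranks = [], []
    for position, key in enumerate(sorted_keys, start=1):
        if position > 1 and key == sorted_keys[position - 2]:
            # Peer of the previous row: repeat both ranks
            ranks.append(ranks[-1])
            dense_ranks.append(dense_ranks[-1])
        else:
            # RANK jumps to the row position, leaving a gap after ties;
            # DENSE_RANK just moves on to the next number
            ranks.append(position)
            dense_ranks.append(dense_ranks[-1] + 1 if dense_ranks else 1)
    return ranks, dense_ranks


dept_10 = [100000, 70000, 60000, 60000, None]  # ORDER BY salary DESC
print(rank_and_dense_rank(dept_10))  # ([1, 2, 3, 3, 5], [1, 2, 3, 3, 4])
```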
PERCENT_RANK
● Relative rank: (rank - 1) / (rows in partition - 1), or 0 if the partition has only one row
Example: rank employees within each department according to their
salary
SELECT name, department_id AS dept, salary, .. ,
PERCENT_RANK() OVER w AS `%rank`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+-------+
| name | dept | salary | row# | rank | dense | %rank |
+----------+------+--------+------+------+-------+-------+
| Nils | NULL | 75000 | 1 | 1 | 1 | 0 |
| Erik | 10 | 100000 | 1 | 1 | 1 | 0 |
| Michael | 10 | 70000 | 2 | 2 | 2 | 0.25 |
| Frederik | 10 | 60000 | 3 | 3 | 3 | 0.5 |
| Jon | 10 | 60000 | 4 | 3 | 3 | 0.5 |
| Dag | 10 | NULL | 5 | 5 | 4 | 1 |
| Nils | 20 | 80000 | 1 | 1 | 1 | 0 |
| Lena | 20 | 65000 | 2 | 2 | 2 | 0.5 |
| Paula | 20 | 65000 | 3 | 2 | 2 | 0.5 |
| Rose | 30 | 300000 | 1 | 1 | 1 | 0 |
| William | 30 | 70000 | 2 | 2 | 2 | 1 |
+----------+------+--------+------+------+-------+-------+
Frederik in dept 10: (3-1)/(5-1) = 0.5
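The PERCENT_RANK formula can be checked with a short Python sketch, assuming one partition whose ORDER BY values are already sorted; `percent_rank` is an illustrative name. A row's RANK is one more than the position of its first peer, so rank - 1 is simply that position.

```python
def percent_rank(sorted_keys):
    """PERCENT_RANK() for one partition whose ORDER BY values are already sorted:
    (rank - 1) / (rows in partition - 1), or 0 for a one-row partition."""
    n = len(sorted_keys)
    if n == 1:
        return [0.0]
    # rank - 1 == position of the first row with the same key (its first peer)
    return [sorted_keys.index(key) / (n - 1) for key in sorted_keys]


dept_10 = [100000, 70000, 60000, 60000, None]  # ORDER BY salary DESC
print(percent_rank(dept_10))  # [0.0, 0.25, 0.5, 0.5, 1.0]
```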
CUME_DIST
● Cumulative distribution: (rows preceding or peer with the current row) / (rows in partition)
● Example: cumulative distribution of employees within each department according
to their salary
SELECT name, department_id AS dept, salary, .. ,
CUME_DIST() OVER w AS cume
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+-------+---------+
| name | dept | salary | row# | rank | dense | %rank | cume |
+----------+------+--------+------+------+-------+-------+---------+
| Nils | NULL | 75000 | 1 | 1 | 1 | 0 | 1 |
| Erik | 10 | 100000 | 1 | 1 | 1 | 0 | 0.2 |
| Michael | 10 | 70000 | 2 | 2 | 2 | 0.25 | 0.4 |
| Frederik | 10 | 60000 | 3 | 3 | 3 | 0.5 | 0.8 |
| Jon | 10 | 60000 | 4 | 3 | 3 | 0.5 | 0.8 |
| Dag | 10 | NULL | 5 | 5 | 4 | 1 | 1 |
| Nils | 20 | 80000 | 1 | 1 | 1 | 0 | 0.33333 |
| Lena | 20 | 65000 | 2 | 2 | 2 | 0.5 | 1 |
| Paula | 20 | 65000 | 3 | 2 | 2 | 0.5 | 1 |
| Rose | 30 | 300000 | 1 | 1 | 1 | 0 | 0.5 |
| William | 30 | 70000 | 2 | 2 | 2 | 1 | 1 |
+----------+------+--------+------+------+-------+-------+---------+
Frederik in dept 10: 4 rows (including his peer Jon) out of 5: 4/5 = 0.8
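A Python sketch of CUME_DIST for one partition, assuming its ORDER BY values are already sorted; `cume_dist` is an illustrative name. Unlike PERCENT_RANK, it counts up to the current row's last peer, which is why Frederik and Jon both get 0.8.

```python
def cume_dist(sorted_keys):
    """CUME_DIST() for one partition whose ORDER BY values are already sorted:
    (rows preceding or peer with the current row) / (rows in partition)."""
    n = len(sorted_keys)
    result = []
    for key in sorted_keys:
        # Position of the current row's last peer, found by searching from the end
        last_peer = n - 1 - sorted_keys[::-1].index(key)
        result.append((last_peer + 1) / n)
    return result


dept_10 = [100000, 70000, 60000, 60000, None]  # ORDER BY salary DESC
print(cume_dist(dept_10))  # [0.2, 0.4, 0.8, 0.8, 1.0]
```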
NTILE
● Divides an ordered partition into a specified number of groups (buckets)
as evenly as possible and assigns a bucket number to each row in the
partition. Despite the name, this is not the same as a percentile!
SELECT name, department_id AS dept, salary, .. ,
NTILE(3) OVER w AS `3-tile`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+-------+---------+--------+
| name | dept | salary | row# | rank | dense | %rank | cume | 3-tile |
+----------+------+--------+------+------+-------+-------+---------+--------+
| Nils | NULL | 75000 | 1 | 1 | 1 | 0 | 1 | 1 |
| Erik | 10 | 100000 | 1 | 1 | 1 | 0 | 0.2 | 1 |
| Michael | 10 | 70000 | 2 | 2 | 2 | 0.25 | 0.4 | 1 |
| Frederik | 10 | 60000 | 3 | 3 | 3 | 0.5 | 0.8 | 2 |
| Jon | 10 | 60000 | 4 | 3 | 3 | 0.5 | 0.8 | 2 |
| Dag | 10 | NULL | 5 | 5 | 4 | 1 | 1 | 3 |
| Nils | 20 | 80000 | 1 | 1 | 1 | 0 | 0.33333 | 1 |
| Lena | 20 | 65000 | 2 | 2 | 2 | 0.5 | 1 | 2 |
| Paula | 20 | 65000 | 3 | 2 | 2 | 0.5 | 1 | 3 |
| Rose | 30 | 300000 | 1 | 1 | 1 | 0 | 0.5 | 1 |
| William | 30 | 70000 | 2 | 2 | 2 | 1 | 1 | 2 |
+----------+------+--------+------+------+-------+-------+---------+--------+
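The bucket assignment can be sketched in Python, assuming (consistent with the table above) that bucket sizes differ by at most one and the earlier buckets get the extra rows; `ntile` is an illustrative name. Only row position matters, not peers.

```python
def ntile(num_buckets, num_rows):
    """NTILE(num_buckets) bucket numbers for a partition of num_rows ordered rows.

    Buckets differ in size by at most one; the first (num_rows % num_buckets)
    buckets get one extra row. Peers are not considered: only position counts.
    """
    base_size, extra = divmod(num_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        size = base_size + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(3, 5))  # department 10: [1, 1, 2, 2, 3]
print(ntile(3, 3))  # department 20: [1, 2, 3]
print(ntile(3, 2))  # department 30: [1, 2]
```

Lena and Paula are peers, yet they land in different buckets (2 and 3), as the table above shows.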
LEAD, LAG
● Returns the value of <expr> evaluated at the row that is <offset> rows after (LEAD)
or before (LAG) the current row within the partition; if there is no such row, the
optional default expression is returned instead (it must be of the same type as <expr>).
● Both offset and default expr are evaluated with respect to the current row. If
omitted, offset defaults to 1 and default expr to NULL
Syntax: LEAD( <expr> [, <offset> [, <default expr> ] ] ) [ <RESPECT NULLS> ]
Example: LEAD(date) OVER (..)
Note: “IGNORE NULLS” not supported, RESPECT NULLS is default but can be
specified.
● Any window frame is ignored
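A Python sketch of LEAD and LAG over one partition already in window order, assuming a non-negative offset and `None` for SQL NULL; `lead` and `lag` here are illustrative helpers, not a library API. Note the distinction the next slides show: the default is used only when the target row does not exist, not when it exists with a NULL value.

```python
def lead(values, offset=1, default=None):
    """LEAD(expr, offset, default) over one partition already in window order.
    A row with nothing `offset` rows ahead gets `default`; an existing row whose
    value is NULL (None) yields NULL, not the default. offset is assumed >= 0."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def lag(values, offset=1, default=None):
    """LAG(expr, offset, default): the same, looking `offset` rows back."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


# All salaries in ORDER BY salary DESC order (a single partition); None = NULL
salaries = [300000, 100000, 80000, 75000, 70000, 70000,
            65000, 65000, 60000, 60000, None]
print(lead(salaries))            # ends with ..., 60000, None, None
print(lead(salaries, 1, 77000))  # ends with ..., 60000, None, 77000
```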
LEAD
+----------+------+--------+--------+
| name | dept | salary | lead |
+----------+------+--------+--------+
| Rose | 30 | 300000 | 100000 |
| Erik | 10 | 100000 | 80000 |
| Nils | 20 | 80000 | 75000 |
| Nils | NULL | 75000 | 70000 |
| Michael | 10 | 70000 | 70000 |
| William | 30 | 70000 | 65000 |
| Lena | 20 | 65000 | 65000 |
| Paula | 20 | 65000 | 60000 |
| Frederik | 10 | 60000 | 60000 |
| Jon | 10 | 60000 | NULL |
| Dag | 10 | NULL | NULL |
+----------+------+--------+--------+
SELECT name, department_id AS dept, salary,
LEAD(salary, 1) OVER w AS `lead`
FROM employee
WINDOW w AS (ORDER BY salary DESC);
LEAD
+----------+------+--------+--------+
| name | dept | salary | lead |
+----------+------+--------+--------+
| Rose | 30 | 300000 | 100000 |
| Erik | 10 | 100000 | 80000 |
| Nils | 20 | 80000 | 75000 |
| Nils | NULL | 75000 | 70000 |
| Michael | 10 | 70000 | 70000 |
| William | 30 | 70000 | 65000 |
| Lena | 20 | 65000 | 65000 |
| Paula | 20 | 65000 | 60000 |
| Frederik | 10 | 60000 | 60000 |
| Jon | 10 | 60000 | NULL |
| Dag | 10 | NULL | 77000 |
+----------+------+--------+--------+
SELECT name, department_id AS dept, salary,
LEAD(salary, 1, 77000) OVER w AS `lead`
FROM employee
WINDOW w AS (ORDER BY salary DESC);
Only Dag's row, which has no following row, gets the default 77000; Jon's lead stays NULL because the following row (Dag's) exists but has a NULL salary.
LEAD - gap detection
● Classic example:
CREATE TABLE t(i INT);
INSERT INTO t VALUES (1), (2), (4), (5), (6), (8),
(9), (10);
SELECT i, l FROM
(SELECT i, LEAD(i) OVER (ORDER BY i) AS l FROM t) d
WHERE i + 1 <> l;
+------+------+
| i | l |
+------+------+
| 2 | 4 |
| 6 | 8 |
+------+------+
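The same gap detection in Python, as a sketch assuming the values fit in memory; `find_gaps` is an illustrative name. Pairing each sorted value with its successor plays the role of LEAD(i) OVER (ORDER BY i), and filtering the pairs afterwards mirrors the outer query that filters on the window function's result.

```python
def find_gaps(values):
    """Pair each value with its LEAD (the next value in sorted order) and keep
    the pairs where the next value is not i + 1."""
    ordered = sorted(values)
    # zip stops at the shorter list, so the last row (whose LEAD is NULL) is
    # dropped, just as `i + 1 <> NULL` is never true in the WHERE clause
    return [(i, nxt) for i, nxt in zip(ordered, ordered[1:]) if i + 1 != nxt]


print(find_gaps([1, 2, 4, 5, 6, 8, 9, 10]))  # [(2, 4), (6, 8)]
```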
FIRST_VALUE, LAST_VALUE
Returns the value of the expression evaluated at the first (FIRST_VALUE) or last
(LAST_VALUE) row in the frame of the current row within the partition.
Note: “IGNORE NULLS” is not supported, RESPECT NULLS is default but can be
specified.
FIRST_VALUE
● Difference between each employee's salary and the best-paid salary in the department
SELECT name, department_id AS dept, salary,
FIRST_VALUE(salary) OVER w - salary AS diff
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC);
+----------+------+--------+--------+
| name | dept | salary | diff |
+----------+------+--------+--------+
| Nils | NULL | 75000 | 0 |
| Erik | 10 | 100000 | 0 |
| Michael | 10 | 70000 | 30000 |
| Frederik | 10 | 60000 | 40000 |
| Jon | 10 | 60000 | 40000 |
| Dag | 10 | NULL | NULL |
| Nils | 20 | 80000 | 0 |
| Lena | 20 | 65000 | 15000 |
| Paula | 20 | 65000 | 15000 |
| Rose | 30 | 300000 | 0 |
| William | 30 | 70000 | 230000 |
+----------+------+--------+--------+
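A Python sketch of the `diff` column for one partition, assuming its salaries are already in ORDER BY salary DESC order; `diff_to_best_paid` is an illustrative name. With ORDER BY and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so every frame starts at the partition's first row.

```python
def diff_to_best_paid(sorted_salaries):
    """FIRST_VALUE(salary) OVER w - salary for one partition in salary DESC order.
    Every default frame starts at the partition's first row, so FIRST_VALUE is
    the top salary for all rows."""
    first = sorted_salaries[0]
    # SQL arithmetic involving NULL (None) yields NULL
    return [None if first is None or s is None else first - s for s in sorted_salaries]


print(diff_to_best_paid([100000, 70000, 60000, 60000, None]))  # [0, 30000, 40000, 40000, None]
print(diff_to_best_paid([300000, 70000]))                      # [0, 230000]
```

The same default frame ends at the current row and its peers, not at the end of the partition, so LAST_VALUE needs an explicit frame such as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to return the partition's last row.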
NTH_VALUE
Returns the value of the expression evaluated at the nth row in the frame of the current
row within the partition; if there is no nth row (frame is too small), NTH_VALUE returns NULL.
Note: “IGNORE NULLS” is not supported, RESPECT NULLS is default but can be
specified.
Note: For NTH_VALUE, “FROM LAST” is not supported, FROM FIRST is default but
can be specified
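A Python sketch of when NTH_VALUE returns NULL, assuming one partition already in ORDER BY order and the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; `nth_value` is an illustrative name. The frame grows row by row, and it always reaches the current row's last peer.

```python
def nth_value(sorted_keys, n):
    """NTH_VALUE(key, n) FROM FIRST over one partition already in ORDER BY order,
    with the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    If the frame holds fewer than n rows, the result is NULL (None)."""
    size = len(sorted_keys)
    result = []
    for key in sorted_keys:
        frame_end = size - 1 - sorted_keys[::-1].index(key)  # last peer of the current row
        result.append(sorted_keys[n - 1] if n - 1 <= frame_end else None)
    return result


dept_10 = [100000, 70000, 60000, 60000, None]  # ORDER BY salary DESC
print(nth_value(dept_10, 2))  # [None, 70000, 70000, 70000, 70000]
print(nth_value(dept_10, 4))  # [None, None, 60000, 60000, 60000]
```

In the second call, Frederik's row already sees the 4th value because his peer Jon belongs to the same frame.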
PART IV
is where we look at implementation and performance
aspects
Processing window functions
[Figure: JOIN -> GROUP BY -> WINDOW 1 -> ... -> WINDOW n -> ORDER BY/DISTINCT/LIMIT; before each windowing step, rows are sorted by the concatenation of its PARTITION BY and ORDER BY expressions]
● Tmp table between each windowing step
● Optimization: re-order windows to eliminate sorting steps when their
PARTITION BY/ORDER BY expressions are equal
SELECT name, SUM(salary) OVER () FROM employee LIMIT 3;
+----------+---------------------+
| name | SUM(salary) OVER () |
+----------+---------------------+
| Dag | 945000 |
| Erik | 945000 |
| Frederik | 945000 |
+----------+---------------------+
We need to read all rows to get the SUM before we can output row 1,
so rows must be buffered to merge each original row with the result
of the window function.
Processing window functions
[Figure: the same pipeline; each WINDOW step reads its rows through a row-addressable buffer, aka frame buffer]
● The frame buffer is an in-memory tmp table that overflows automatically to disk if needed
● It permits re-reading rows when the frame moves
Streaming window functions
The frame buffer is not needed for streaming window functions:
● ROW_NUMBER, RANK, DENSE_RANK
● Aggregates with a ROWS frame whose upper bound grows with the current row, e.g.
SELECT SUM(salary) OVER
(ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW)
FROM employee;
● Non-streaming (need buffer), e.g.
CUME_DIST, SUM(salary) OVER ()
=> Whether buffering is needed depends on both the window function and the frame
specification.
If in doubt, check with EXPLAIN FORMAT=JSON
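The streaming/buffered distinction can be sketched in Python; this is a model of the idea, not MySQL's implementation, and `running_sum` and `whole_partition_sum` are illustrative names. The streaming version is a generator that emits each result as soon as its row arrives, while SUM over the whole partition must keep every row until the last one has been read.

```python
def running_sum(salaries):
    """Streaming: SUM(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    Each result is emitted as soon as its row arrives; only the total is kept."""
    total = None  # SUM over only NULLs is NULL
    for salary in salaries:
        if salary is not None:  # SUM ignores NULL
            total = salary if total is None else total + salary
        yield salary, total


def whole_partition_sum(salaries):
    """Buffered: SUM(salary) OVER (). The total is only known after the last row,
    so all rows are kept (the frame buffer) before the first one is emitted."""
    frame_buffer = list(salaries)
    non_null = [s for s in frame_buffer if s is not None]
    total = sum(non_null) if non_null else None
    return [(salary, total) for salary in frame_buffer]


stream = running_sum(iter([None, 100000, 60000]))
print(next(stream))  # (None, None): emitted before the later rows are read
```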
EXPLAIN: streaming
EXPLAIN format=json SELECT SUM(salary) OVER
(ROWS UNBOUNDED PRECEDING) FROM employee;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.35"
    },
    "windowing": {
      "windows": [
        {
          "name": "<unnamed window>",
          "functions": [
            "sum"
          ]
        }
      ],
      "table": {
        "table_name": "employee",
        "access_type": "ALL",
        "rows_examined_per_scan": 11,
        "rows_produced_per_join": 11,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "0.25",
          "eval_cost": "1.10",
          "prefix_cost": "1.35",
          "data_read_per_join": "1K"
        },
        "used_columns": [
          "salary"
        ]
      }
    }
  }
}
EXPLAIN: frame buffer
EXPLAIN format=json SELECT COUNT(salary) OVER () FROM employee;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.35"
    },
    "windowing": {
      "windows": [
        {
          "name": "<unnamed window>",
          "frame_buffer": {
            "using_temporary_table": true,
            "optimized_frame_evaluation": true
          },
          "functions": [
            "count"
          ]
        }
      ],
      "table": {
        "table_name": "employee",
        "access_type": "ALL",
        "rows_examined_per_scan": 11,
        "rows_produced_per_join": 11,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "0.25",
          "eval_cost": "1.10",
          "prefix_cost": "1.35",
          "data_read_per_join": "1K"
        },
        "used_columns": [
          "salary"
        ]
      }
    }
  }
}
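Comparing the two plans, only the buffered one has a "frame_buffer" entry for its window. A hypothetical Python helper can flag such windows, assuming the plan shape shown above (query_block, then windowing, then windows); in other query shapes the windowing node may sit elsewhere in the plan, which this sketch does not handle.

```python
import json


def windows_with_frame_buffer(explain_json):
    """Return the names of windows that EXPLAIN FORMAT=JSON reports with a
    "frame_buffer" entry, i.e. those not evaluated in streaming fashion.
    Only looks at query_block -> windowing -> windows, as in the outputs above."""
    plan = json.loads(explain_json)
    windows = plan["query_block"].get("windowing", {}).get("windows", [])
    return [w.get("name") for w in windows if "frame_buffer" in w]


# Trimmed versions of the two EXPLAIN outputs above
streaming_plan = """{"query_block": {"windowing": {"windows": [
    {"name": "<unnamed window>", "functions": ["sum"]}]}}}"""
buffered_plan = """{"query_block": {"windowing": {"windows": [
    {"name": "<unnamed window>",
     "frame_buffer": {"using_temporary_table": true, "optimized_frame_evaluation": true},
     "functions": ["count"]}]}}}"""

print(windows_with_frame_buffer(streaming_plan))  # []
print(windows_with_frame_buffer(buffered_plan))   # ['<unnamed window>']
```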
Frame buffer processing
[Figure: the windowing pipeline with the frame buffer: in-memory, overflows automatically to disk if needed; permits re-reading rows when the frame moves]
SELECT name, SUM(salary) OVER () FROM employee LIMIT 3;
Optimization 1: compute the SUM only once (static frame).
But what if the frame changes?
+----------+---------------------+
| name | SUM(salary) OVER () |
+----------+---------------------+
| Dag | 945000 |
| Erik | 945000 |
| Frederik | 945000 |
+----------+---------------------+
Frame buffer processing
● Case: expanding frame
SELECT name, salary, SUM(salary) OVER (ORDER BY name) AS `sum`
FROM employee LIMIT 3;
Optimization 2: remember the SUM and adjust it: here add the next row's
contribution: 100k + 60k = 160k (SUM ignores Dag's NULL)
But what if we have a moving frame?
+----------+--------+--------+
| name | salary | sum |
+----------+--------+--------+
| Dag | NULL | NULL |
| Erik | 100000 | 100000 |
| Frederik | 60000 | 160000 |
+----------+--------+--------+
Inversion
● Case: moving frame: sales for this month and the previous month
SELECT MONTH(date) AS month, SUM(sale) AS monthly,
SUM(SUM(sale)) OVER (ORDER BY MONTH(date)
RANGE 1 PRECEDING) AS `this&last`
FROM sales
GROUP BY MONTH(date);
Optimization 3: remember the SUM and adjust it: here remove the
contribution of the row that has left the frame (now 2 PRECEDING), then
add the current row's contribution: this is inversion
NOTE: OK only if a + b - a + c = b + c.
This isn't always the case for aggregates
Penalty if we can't do this for large window frames:
O(partition size * frame size) instead of O(partition size)
+-------+---------+-----------+
| month | monthly | this&last |
+-------+---------+-----------+
| 3 | 600 | 600 |
| 4 | 600 | 1200 |
| 5 | 900 | 1500 |
| 6 | 600 | 1500 |
| 7 | 1200 | 1800 |
| 8 | 250 | 1450 |
+-------+---------+-----------+
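A Python sketch contrasting recomputation with inversion, assuming consecutive months without NULLs (so RANGE 1 PRECEDING covers the same rows as a ROWS frame of width 2); `sliding_sum_recompute` and `sliding_sum_inverted` are illustrative names, not MySQL internals.

```python
def sliding_sum_recompute(values, width):
    """Sum over ROWS BETWEEN (width - 1) PRECEDING AND CURRENT ROW, recomputing
    every frame from scratch: O(n * width)."""
    return [sum(values[max(0, i - width + 1):i + 1]) for i in range(len(values))]


def sliding_sum_inverted(values, width):
    """Same frame, but keep the previous SUM and adjust it: add the row entering
    the frame and subtract (invert) the row that left it: O(n)."""
    result, total = [], 0
    for i, value in enumerate(values):
        total += value
        if i >= width:
            total -= values[i - width]  # the row that is now `width` rows back
        result.append(total)
    return result


monthly = [600, 600, 900, 600, 1200, 250]  # months 3..8 from the table
print(sliding_sum_inverted(monthly, 2))   # [600, 1200, 1500, 1500, 1800, 1450]
```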
Floating-point aggregates, standard deviation and variance aggregates
mysql> show variables like '%high%';
+------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| windowing_use_high_precision | ON |
+------------------------------+-------+
SET windowing_use_high_precision = OFF;
For variance, turning it off enables an incremental algorithm whose results may
differ in the last few significant digits (usually insignificant).
For floats, this can matter when summing very large and very small numbers:
mysql> select 1.7976931348623157E+307 + 1 - 1.7976931348623157E+307;
+-------------------------------------------------------+
| 1.7976931348623157E+307 + 1 - 1.7976931348623157E+307 |
+-------------------------------------------------------+
| 0 |
+-------------------------------------------------------+
Conditional inversion
When a + b - a + c ≠ b + c is possible (floating point):
● windowing_use_high_precision = ON (default): recompute; slow, but
guaranteed same results as grouped aggregates
● OFF: invert; linear performance
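The failure of a + b - a + c = b + c can be reproduced with Python floats (IEEE 754 doubles), using the value from the query above; `inverted_vs_recomputed` is an illustrative name. When the frame moves from {a, b} to {b, c}, inversion starts from the old sum, in which b was absorbed by the huge a.

```python
def inverted_vs_recomputed(a, b, c):
    """Frame moves from {a, b} to {b, c}.
    Inversion: start from the old sum, subtract a, add c.
    Recomputation: sum the new frame from scratch."""
    old_sum = a + b
    inverted = old_sum - a + c
    recomputed = b + c
    return inverted, recomputed


print(inverted_vs_recomputed(1.7976931348623157e307, 1.0, 1.0))  # (1.0, 2.0)
print(inverted_vs_recomputed(600, 600, 900))                      # (1500, 1500)
```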
Performance hints
● Use the same named window for several window functions whenever possible:
this eliminates windowing steps.
Possibility: have the optimizer analyze and collapse windows where semantics
allow it. We might add this capability to the optimizer if there is large demand.
● Streaming window functions are faster than those that need buffering
● MAX/MIN do not support inversion, so they can be slow with large
frames unless the expression is a prefix of the ORDER BY expressions
● JSON_OBJECTAGG is not invertible, so it can be slow with large
frames
Q&A
Thank you for using MySQL!
Blog: http://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Dublin 4x3-final-slideshare

  • 1.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. |Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | Window functions in MySQL 8 Dag H. Wanvik Senior database engineer MySQL optimizer team Sep. 2017
  • 2.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 3.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Program Agenda • Introduction: what & why • What's supported • Ranking and analytical wfs • Implementation & performance 1 2 3 44
  • 4.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | PART I Gentle intro in which we meet the SUM aggregate used as a window function and get introduced to window partitions and window frames
  • 5.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Why window functions? ● Part of SQL standard since 2003, with later additions ● Frequently requested feature(s) for data analysis ● Improves readability and often performance ● Most vendors support it, but to varying degrees (YMMV)
  • 6.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Why window functions? SELECT name o_name, department_id, salary AS o_salary, (SELECT SUM(salary) AS sum FROM employee WHERE salary <= o_salary AND NOT (salary = o_salary AND o_name > name)) AS sum FROM employee ORDER BY sum, name; SELECT name, department_id, salary, SUM(salary) OVER w AS sum FROM employee WINDOW w AS (ORDER BY salary, name ROWS UNBOUNDED PRECEDING) ORDER BY sum, name; +----------+---------------+--------+--------+ | name | department_id | salary | sum | +----------+---------------+--------+--------+ | Dag | 10 | NULL | NULL | | Frederik | 10 | 60000 | 60000 | | Jon | 10 | 60000 | 120000 | | Lena | 20 | 65000 | 185000 | | Paula | 20 | 65000 | 250000 | | Michael | 10 | 70000 | 320000 | | William | 30 | 70000 | 390000 | | Nils | NULL | 75000 | 465000 | | Nils | 20 | 80000 | 545000 | | Erik | 10 | 100000 | 645000 | | Rose | 30 | 300000 | 945000 | +----------+---------------+--------+--------+ ● Readability ● Performance my laptop: 50,000 rows: 16m vs 0.14s ● Or use self join, but tricky
  • 7.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | What's a SQL window function? Short answer: a function that gets its arguments from a set of rows; a window defined by a partition and a frame. OK, but ● what is partitioned data? ● what is a frame? ● what does a window function look like? Hint: OVER keyword
  • 8.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Running ex: employees +----------+---------------+--------+ | name | department_id | salary | +----------+---------------+--------+ | Nils | NULL | 75000 | | Dag | 10 | NULL | | Erik | 10 | 100000 | | Frederik | 10 | 60000 | | Jon | 10 | 60000 | | Michael | 10 | 70000 | | Lena | 20 | 65000 | | Nils | 20 | 80000 | | Paula | 20 | 65000 | | Rose | 30 | 300000 | | William | 30 | 70000 | +----------+---------------+--------+ SELECT name, department_id, salary FROM employee ORDER BY department_id, name;
  • 9.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Salaries per dept +----------+---------------+--------+ | name | department_id | salary | +----------+---------------+--------+ | Nils | NULL | 75000 | | Dag | 10 | NULL | | Erik | 10 | 100000 | | Frederik | 10 | 60000 | | Jon | 10 | 60000 | | Michael | 10 | 70000 | | Lena | 20 | 65000 | | Nils | 20 | 80000 | | Paula | 20 | 65000 | | Rose | 30 | 300000 | | William | 30 | 70000 | +----------+---------------+--------+ +---------------+-------------+ | department_id | SUM(salary) | +---------------+-------------+ | NULL | 75000 | | 10 | 290000 | | 20 | 210000 | | 30 | 370000 | +---------------+-------------+ SELECT department_id, SUM(salary) FROM employee GROUP BY department_id ORDER BY department_id; Query: find sums of salaries per department
  • 10.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Grouping loss SELECT department_id, SUM(salary) FROM employee GROUP BY department_id; Identity of names and salaries lost. Lena Nils Paula ∑
  • 11.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Windowing +----------+---------------+--------+ | name | department_id | salary | +----------+---------------+--------+ | Nils | NULL | 75000 | | Dag | 10 | NULL | | Erik | 10 | 100000 | | Frederik | 10 | 60000 | | Jon | 10 | 60000 | | Michael | 10 | 70000 | | Lena | 20 | 65000 | | Nils | 20 | 80000 | | Paula | 20 | 65000 | | Rose | 30 | 300000 | | William | 30 | 70000 | +----------+---------------+--------+ SELECT name, department_id, salary FROM employee ORDER BY department_id, name;
  • 12.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Windowing, rows kept +----------+---------------+--------+--------+ | name | department_id | salary | sum | +----------+---------------+--------+--------+ | Nils | NULL | 75000 | 945000 | | Dag | 10 | NULL | 945000 | | Erik | 10 | 100000 | 945000 | | Frederik | 10 | 60000 | 945000 | | Jon | 10 | 60000 | 945000 | | Michael | 10 | 70000 | 945000 | | Lena | 20 | 65000 | 945000 | | Nils | 20 | 80000 | 945000 | | Paula | 20 | 65000 | 945000 | | Rose | 30 | 300000 | 945000 | | William | 30 | 70000 | 945000 | +----------+---------------+--------+--------+ SELECT name, department_id, salary, SUM(salary) OVER () sum FROM employee ORDER BY department_id, name;
  • 13.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Windowing, «grouped» +----------+---------------+--------+--------+ | name | department_id | salary | sum | +----------+---------------+--------+--------+ | Nils | NULL | 75000 | 75000 | | Dag | 10 | NULL | 290000 | | Erik | 10 | 100000 | 290000 | | Frederik | 10 | 60000 | 290000 | | Jon | 10 | 60000 | 290000 | | Michael | 10 | 70000 | 290000 | | Lena | 20 | 65000 | 210000 | | Nils | 20 | 80000 | 210000 | | Paula | 20 | 65000 | 210000 | | Rose | 30 | 300000 | 370000 | | William | 30 | 70000 | 370000 | +----------+---------------+--------+--------+ SELECT name, department_id, salary, SUM(salary) OVER (PARTITION BY department_id) sum FROM employee ORDER BY department_id, name;
  • 14.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | Partition == frame +----------+---------------+--------+--------+ | name | department_id | salary | sum | +----------+---------------+--------+--------+ | Nils | NULL | 75000 | 75000 | | Dag | 10 | NULL | 290000 | | Erik | 10 | 100000 | 290000 | | Frederik | 10 | 60000 | 290000 | | Jon | 10 | 60000 | 290000 | | Michael | 10 | 70000 | 290000 | | Lena | 20 | 65000 | 210000 | | Nils | 20 | 80000 | 210000 | | Paula | 20 | 65000 | 210000 | | Rose | 30 | 300000 | 370000 | | William | 30 | 70000 | 370000 | +----------+---------------+--------+--------+ SELECT name, department_id, salary, SUM(salary) OVER (PARTITION BY department_id) sum FROM employee ORDER BY department_id, name; All salaries in partition added: the window frame is the entire partition
  • 15.
    Copyright © 2017,Oracle and/or its affiliates. All rights reserved. | SELECT name, salary, department_id, SUM(salary) OVER (PARTITION BY department_id) sum FROM employee ORDER BY department_id; ∑ ∑ ∑Identity of department names and salaries kept: no rows are lost => A window function is similar to a scalar function: adds a result column => BUT: can read data from other rows than its own: within its WINDOW partition or frame Lena Nils Paula Windowing, rows kept
Default partition
SELECT name, department_id, salary,
SUM(salary) OVER () sum
FROM employee
ORDER BY department_id, name;
No partition specified: the window frame is the entire result set
+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
| Erik     |            10 | 100000 | 945000 |
| Frederik |            10 |  60000 | 945000 |
| Jon      |            10 |  60000 | 945000 |
| Michael  |            10 |  70000 | 945000 |
| Lena     |            20 |  65000 | 945000 |
| Nils     |            20 |  80000 | 945000 |
| Paula    |            20 |  65000 | 945000 |
| Rose     |            30 | 300000 | 945000 |
| William  |            30 |  70000 | 945000 |
+----------+---------------+--------+--------+
Cumulative sum
SELECT name, department_id, salary,
SUM(salary) OVER (ORDER BY department_id, name
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
No partition specified: the window frame grows with each row
+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 |  75000 |
| Dag      |            10 |   NULL |  75000 |
| Erik     |            10 | 100000 | 175000 |
| Frederik |            10 |  60000 | 235000 |
| Jon      |            10 |  60000 | 295000 |
| Michael  |            10 |  70000 | 365000 |
| Lena     |            20 |  65000 | 430000 |
| Nils     |            20 |  80000 | 510000 |
| Paula    |            20 |  65000 | 575000 |
| Rose     |            30 | 300000 | 875000 |
| William  |            30 |  70000 | 945000 |
+----------+---------------+--------+--------+
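The growing ROWS frame can be imitated in plain Python, which makes the frame explicit for each row. This is a sketch of the semantics only, not of how MySQL evaluates the query. EMPLOYEES copies the slide data, None stands for NULL, and the helper names (sql_sum, nulls_first, cumulative_sum) are made up for illustration.

```python
# Employee rows from the slides: (name, department_id, salary); None stands for SQL NULL.
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_sum(values):
    """SUM as in SQL: NULLs are skipped; with no non-NULL value left the result is NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def nulls_first(value):
    """Sort key for ascending ORDER BY with NULL before every other value, as in MySQL."""
    return (value is not None, value if value is not None else 0)


def cumulative_sum(rows):
    """SUM(salary) OVER (ORDER BY department_id, name
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    ordered = sorted(rows, key=lambda r: (nulls_first(r[1]), r[0]))
    result = []
    for i, (name, dept, salary) in enumerate(ordered):
        frame = ordered[: i + 1]  # from the first row up to and including the current row
        result.append((name, dept, salary, sql_sum(r[2] for r in frame)))
    return result


for row in cumulative_sum(EMPLOYEES):
    print(row)
```

Dag's NULL salary is skipped, so his running sum equals Nils's.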
Cumulative sum, partitioned
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
Partition specified: the window frame grows within each partition
New partition: the sum is reset (Dag's frame holds only a NULL salary, so his sum is NULL)
+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 |  75000 |
| Dag      |            10 |   NULL |   NULL |
| Erik     |            10 | 100000 | 100000 |
| Frederik |            10 |  60000 | 160000 |
| Jon      |            10 |  60000 | 220000 |
| Michael  |            10 |  70000 | 290000 |
| Lena     |            20 |  65000 |  65000 |
| Nils     |            20 |  80000 | 145000 |
| Paula    |            20 |  65000 | 210000 |
| Rose     |            30 | 300000 | 300000 |
| William  |            30 |  70000 | 370000 |
+----------+---------------+--------+--------+
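A similar Python sketch handles the partitioned version. The helper names are hypothetical, the slide data is copied into EMPLOYEES, and None stands for NULL. The frame restarts at every partition boundary. A frame that holds only NULLs yields NULL.

```python
from itertools import groupby

# Employee rows from the slides: (name, department_id, salary); None stands for SQL NULL.
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_sum(values):
    """SUM as in SQL: NULLs are skipped; with no non-NULL value left the result is NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def nulls_first(value):
    """Sort key for ascending ORDER BY with NULL before every other value, as in MySQL."""
    return (value is not None, value if value is not None else 0)


def partitioned_cumulative_sum(rows):
    """SUM(salary) OVER (PARTITION BY department_id ORDER BY name
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    # Sorting on the partition key first lets groupby() see each partition as one run.
    ordered = sorted(rows, key=lambda r: (nulls_first(r[1]), r[0]))
    result = []
    for dept, partition in groupby(ordered, key=lambda r: r[1]):
        part = list(partition)
        for i, (name, _, salary) in enumerate(part):
            # The frame starts over at the first row of every partition.
            result.append((name, dept, salary, sql_sum(r[2] for r in part[: i + 1])))
    return result


for row in partitioned_cumulative_sum(EMPLOYEES):
    print(row)
```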
Parts of window function
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
● SUM(salary) OVER (...): a function call followed by the OVER keyword signals a window function
Parts of window function: PARTITION BY department_id
An optional partition specification: PARTITION BY <expression> {, <expression>}*
● A partition divides the result set into disjoint sets
● A window function does not see rows in partitions other than that of the current row for which it is being evaluated
Parts of window function: ORDER BY name
An optional ordering specification: ORDER BY <expression> {, <expression>}*
● Orders the rows within the partition
● Not the same as the final ORDER BY of the query: it makes no guarantee about the order of the query's output rows
ORDER BY: growing frame
● Some window functions need row ordering to be useful, e.g. RANK
● Peers: rows with the same value for the ORDER BY expression(s)
SELECT name, department_id, salary,
SUM(salary) OVER (ORDER BY salary DESC) AS sum
FROM employee;
One might expect a running sum that grows one row at a time. That is what an explicit ROWS UNBOUNDED PRECEDING frame gives, and the order among tied rows is not deterministic:
+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Rose     |            30 | 300000 | 300000 |
| Erik     |            10 | 100000 | 400000 |
| Nils     |            20 |  80000 | 480000 |
| Nils     |          NULL |  75000 | 555000 |
| Michael  |            10 |  70000 | 625000 |
| William  |            30 |  70000 | 695000 |
| Lena     |            20 |  65000 | 760000 |
| Paula    |            20 |  65000 | 825000 |
| Frederik |            10 |  60000 | 885000 |
| Jon      |            10 |  60000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
+----------+---------------+--------+--------+
ORDER BY: growing frame, PEERS
SELECT name, department_id, salary,
SUM(salary) OVER (ORDER BY salary DESC) AS sum
FROM employee;
+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Rose     |            30 | 300000 | 300000 |
| Erik     |            10 | 100000 | 400000 |
| Nils     |            20 |  80000 | 480000 |
| Nils     |          NULL |  75000 | 555000 |
| Michael  |            10 |  70000 | 695000 |
| William  |            30 |  70000 | 695000 |
| Lena     |            20 |  65000 | 825000 |
| Paula    |            20 |  65000 | 825000 |
| Frederik |            10 |  60000 | 945000 |
| Jon      |            10 |  60000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
+----------+---------------+--------+--------+
What happened here? Answer: rows that are peers with respect to salary (Michael and William, Lena and Paula, Frederik and Jon) enter the frame together, so each pair gets the same sum.
This is an example of an implicit RANGE frame: with ORDER BY and no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
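A small Python sketch makes the difference between the implicit RANGE frame and a physical ROWS frame concrete. It assumes MySQL's NULL ordering (NULLs last for DESC). Python's stable sort stands in for one arbitrary but fixed order among tied rows. The helper names are hypothetical.

```python
# Employee rows from the slides: (name, department_id, salary); None stands for SQL NULL.
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_sum(values):
    """SUM as in SQL: NULLs are skipped; with no non-NULL value left the result is NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def salary_desc(row):
    """Sort key for ORDER BY salary DESC: highest first, NULL last (MySQL's order for DESC)."""
    salary = row[2]
    return (salary is None, -salary if salary is not None else 0)


def running_sums(rows):
    """Names in window order, plus SUM(salary) under a ROWS and under the default RANGE frame."""
    ordered = sorted(rows, key=salary_desc)  # stable sort: tied rows keep their input order
    rows_sums, range_sums = [], []
    for i, (_, _, salary) in enumerate(ordered):
        # ROWS UNBOUNDED PRECEDING: the frame ends exactly at the current row.
        rows_sums.append(sql_sum(r[2] for r in ordered[: i + 1]))
        # RANGE UNBOUNDED PRECEDING (the default): the frame also takes in all peers.
        last = i
        while last + 1 < len(ordered) and ordered[last + 1][2] == salary:
            last += 1
        range_sums.append(sql_sum(r[2] for r in ordered[: last + 1]))
    return [r[0] for r in ordered], rows_sums, range_sums


names, rows_sums, range_sums = running_sums(EMPLOYEES)
for name, by_rows, by_range in zip(names, rows_sums, range_sums):
    print(f"{name:10} ROWS={by_rows:>7} RANGE={by_range:>7}")
```

The two columns differ only on the first row of each pair of peers.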
Parts of window function: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
An optional frame specification
● A subset of the rows within a partition
● Its extent can depend on the current row
● Default frame: the entire partition if there is no ORDER BY; with ORDER BY, from the start of the partition up to the current row and its peers
● Not all window functions honor the frame
Frame anatomy
Frame bounds, relative to the current row within its partition:
UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, m FOLLOWING, UNBOUNDED FOLLOWING
n, m: row counts for ROWS; numeric or temporal values for RANGE
Examples:
● ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
● RANGE CURRENT ROW
● ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
● ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
● RANGE INTERVAL 6 DAY PRECEDING
Frame anatomy: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
The frame runs from the first row of the partition (UNBOUNDED PRECEDING) up to and including the current row.
Frame anatomy: RANGE CURRENT ROW
The frame is the current row and its peers.
Frame anatomy: ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
The frame is the current row and up to three rows after it.
Frame anatomy: ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
The frame is the current row plus up to two rows before it and up to two rows after it (fewer at the edges of the partition).
ROWS vs. RANGE
● Frame boundaries are either physical (ROWS) or logical (RANGE)
● ROWS: a bound N counts N rows. Peers get no special treatment.
● RANGE with a value bound (N PRECEDING, M FOLLOWING) requires ORDER BY on a single numeric or temporal expression
● RANGE: for an ascending ORDER BY expression, the frame holds the rows whose value lies at most N below (PRECEDING) and at most M above (FOLLOWING) the value of the current row. Peers are always included in the frame.
Ex: ORDER BY date RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW specifies all rows within the last week (the current day and the six days before it).
● Default (std): OVER (ORDER BY n) == OVER (ORDER BY n RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
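A Python sketch can show how the frame of one row might be located under ROWS and under RANGE semantics. The function frame_indices is hypothetical. It assumes one partition whose ORDER BY values are sorted ascending. None stands for UNBOUNDED and an offset of 0 for CURRENT ROW. For RANGE it accepts numbers, or date/timedelta values standing in for temporal INTERVAL bounds.

```python
from datetime import date, timedelta


def frame_indices(keys, i, mode, preceding, following):
    """Indices of the rows in the frame of row i.

    keys: the ORDER BY values of one partition, sorted ascending.
    preceding/following: the bound's offset, or None for UNBOUNDED;
    an offset of 0 corresponds to CURRENT ROW.
    """
    n = len(keys)
    if mode == "ROWS":
        # Physical bounds: count rows and ignore the values (and hence peers).
        lo = 0 if preceding is None else max(0, i - preceding)
        hi = n - 1 if following is None else min(n - 1, i + following)
        return list(range(lo, hi + 1))
    if mode == "RANGE":
        # Logical bounds: compare values, so peers of the current row are always included.
        low = None if preceding is None else keys[i] - preceding
        high = None if following is None else keys[i] + following
        return [j for j, k in enumerate(keys)
                if (low is None or k >= low) and (high is None or k <= high)]
    raise ValueError(f"unknown frame mode: {mode}")


values = [1, 2, 2, 3, 5]
print(frame_indices(values, 1, "ROWS", 1, 1))   # ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
print(frame_indices(values, 1, "RANGE", 1, 1))  # RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING

# RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW, current row = 2017-09-10
days = [date(2017, 9, 1), date(2017, 9, 5), date(2017, 9, 8), date(2017, 9, 10)]
print(frame_indices(days, 3, "RANGE", timedelta(days=6), timedelta(0)))
```

With the current row at index 1, the ROWS frame covers three rows. The RANGE frame covers every row whose value is between 1 and 3, which includes the peer at index 2.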
Determinacy
SELECT name, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id
ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;
In general, window queries are not deterministic unless the window orders on enough expressions to identify each row uniquely.
Minimum guarantee of the SQL standard: within one query, several windows with equivalent non-deterministic orderings order the rows the same way.
A final ORDER BY is still required if the output must be ordered: the window ordering gives no guarantee.
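To see why ties make a ROWS frame non-deterministic, here is a small Python sketch using department 20 from the slides, where Lena and Paula earn the same salary. The helper names are hypothetical. The sketch enumerates every row order that ORDER BY salary DESC is allowed to produce. It then compares those orders with the single order left once name is added as a tie-breaker.

```python
from itertools import permutations

# Department 20 from the slides: (name, salary). Lena and Paula are peers on salary.
DEPT_20 = [("Nils", 80000), ("Lena", 65000), ("Paula", 65000)]


def rows_running_sum(rows):
    """SUM(salary) with ROWS UNBOUNDED PRECEDING, for rows in the given order."""
    total, result = 0, {}
    for name, salary in rows:
        total += salary
        result[name] = total
    return result


def valid_orders(rows, key):
    """Every row order consistent with sorting by key (tied rows may come in any order)."""
    return [list(p) for p in permutations(rows)
            if all(key(p[i]) <= key(p[i + 1]) for i in range(len(p) - 1))]


# ORDER BY salary DESC only: two valid orders, so Lena and Paula can swap sums.
by_salary = [rows_running_sum(o) for o in valid_orders(DEPT_20, lambda r: -r[1])]
# ORDER BY salary DESC, name: the order is unique, so the result is deterministic.
by_salary_name = [rows_running_sum(o) for o in valid_orders(DEPT_20, lambda r: (-r[1], r[0]))]

print(by_salary)
print(by_salary_name)
```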
Example: salary analysis
Question: find the employees with the largest difference between their wage and that of the department average
SELECT name, department_id, salary,
AVG(salary) OVER (PARTITION BY department_id) AS avg,
salary - AVG(salary) OVER (PARTITION BY department_id) AS diff
FROM employee
ORDER BY diff DESC;
+----------+---------------+--------+-------------+--------------+
| name     | department_id | salary | avg         | diff         |
+----------+---------------+--------+-------------+--------------+
| Rose     |            30 | 300000 | 185000.0000 |  115000.0000 |
| Erik     |            10 | 100000 |  72500.0000 |   27500.0000 |
| Nils     |            20 |  80000 |  70000.0000 |   10000.0000 |
| Nils     |          NULL |  75000 |  75000.0000 |       0.0000 |
| Michael  |            10 |  70000 |  72500.0000 |   -2500.0000 |
| Lena     |            20 |  65000 |  70000.0000 |   -5000.0000 |
| Paula    |            20 |  65000 |  70000.0000 |   -5000.0000 |
| Frederik |            10 |  60000 |  72500.0000 |  -12500.0000 |
| Jon      |            10 |  60000 |  72500.0000 |  -12500.0000 |
| William  |            30 |  70000 | 185000.0000 | -115000.0000 |
| Dag      |            10 |   NULL |  72500.0000 |         NULL |
+----------+---------------+--------+-------------+--------------+
Example: salary analysis
● Here: two distinct windows, one per OVER clause, even though their specifications are identical
● A query can use any number of windows
● Logically, the windows are evaluated in multiple phases
Example: named window
Question: find the employees with the largest difference between their wage and that of the department average
SELECT name, department_id, salary,
AVG(salary) OVER w AS avg,
salary - AVG(salary) OVER w AS diff
FROM employee
WINDOW w AS (PARTITION BY department_id)
ORDER BY diff DESC;
● WINDOW w AS (...) defines the named window w; each OVER w is a reference to it
● Multiple window functions can share one window
● They are evaluated in the same phase (efficiency)
● Better readability
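The salary analysis can be reproduced in Python as a check on the numbers. The sketch computes one average per partition and shares it with every row of that partition, which is what a single named window w provides. The names are hypothetical and None stands for NULL. The final sort imitates MySQL, which places NULLs last in a descending ORDER BY.

```python
from collections import defaultdict

# Employee rows from the slides: (name, department_id, salary); None stands for SQL NULL.
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def sql_avg(values):
    """AVG as in SQL: NULLs are ignored; with no non-NULL value the result is NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None


def salary_vs_department_avg(rows):
    """AVG(salary) OVER w and salary - AVG(salary) OVER w, w AS (PARTITION BY department_id)."""
    partitions = defaultdict(list)
    for _, dept, salary in rows:
        partitions[dept].append(salary)  # a NULL department_id forms its own partition
    averages = {dept: sql_avg(salaries) for dept, salaries in partitions.items()}
    result = []
    for name, dept, salary in rows:
        avg = averages[dept]  # one value per partition, shared by all its rows
        diff = None if salary is None or avg is None else salary - avg
        result.append((name, dept, salary, avg, diff))
    # ORDER BY diff DESC, with NULL sorted last
    result.sort(key=lambda r: (r[4] is None, -r[4] if r[4] is not None else 0))
    return result


for row in salary_vs_department_avg(EMPLOYEES):
    print(row)
```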
Ex: moving AVG, smoothing
CREATE TABLE sales(id INT AUTO_INCREMENT PRIMARY KEY, date DATE, sale INT);
...;
SELECT * FROM sales;
+----+------------+------+
| id | date       | sale |
+----+------------+------+
|  1 | 2017-03-01 |  200 |
|  2 | 2017-04-01 |  300 |
|  3 | 2017-05-01 |  400 |
|  4 | 2017-06-01 |  200 |
|  5 | 2017-07-01 |  600 |
|  6 | 2017-08-01 |  100 |
|  7 | 2017-03-01 |  400 |
|  8 | 2017-04-01 |  300 |
|  9 | 2017-05-01 |  500 |
| 10 | 2017-06-01 |  400 |
| 11 | 2017-07-01 |  600 |
| 12 | 2017-08-01 |  150 |
+----+------------+------+
Ex: moving AVG, smoothing
● Sum up sales per month
SELECT MONTH(date), SUM(sale)
FROM sales
GROUP BY MONTH(date);
+-------------+-----------+
| MONTH(date) | SUM(sale) |
+-------------+-----------+
|           3 |       600 |
|           4 |       600 |
|           5 |       900 |
|           6 |       600 |
|           7 |      1200 |
|           8 |       250 |
+-------------+-----------+
Ex: moving AVG, smoothing
● Moving AVG over 3 months: a moving frame that covers the previous, the current and the next month (only two months at the edges, months 3 and 8)
SELECT MONTH(date), SUM(sale),
AVG(SUM(sale)) OVER w AS sliding_avg
FROM sales
GROUP BY MONTH(date)
WINDOW w AS (ORDER BY MONTH(date)
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING);
+-------------+-----------+-------------+
| MONTH(date) | SUM(sale) | sliding_avg |
+-------------+-----------+-------------+
|           3 |       600 |    600.0000 |
|           4 |       600 |    700.0000 |
|           5 |       900 |    700.0000 |
|           6 |       600 |    900.0000 |
|           7 |      1200 |    683.3333 |
|           8 |       250 |    725.0000 |
+-------------+-----------+-------------+
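A Python sketch of the two steps, grouping first and windowing second, with the slide's sales rows copied into SALES (ids omitted). The function names are hypothetical. The frame is computed on month values, which is what a RANGE frame over MONTH(date) does.

```python
from collections import defaultdict
from datetime import date

# Sales rows from the slides: (date, sale); the id column is left out.
SALES = [
    (date(2017, 3, 1), 200), (date(2017, 4, 1), 300), (date(2017, 5, 1), 400),
    (date(2017, 6, 1), 200), (date(2017, 7, 1), 600), (date(2017, 8, 1), 100),
    (date(2017, 3, 1), 400), (date(2017, 4, 1), 300), (date(2017, 5, 1), 500),
    (date(2017, 6, 1), 400), (date(2017, 7, 1), 600), (date(2017, 8, 1), 150),
]


def monthly_sums(sales):
    """SELECT MONTH(date), SUM(sale) ... GROUP BY MONTH(date)."""
    sums = defaultdict(int)
    for day, sale in sales:
        sums[day.month] += sale
    return dict(sorted(sums.items()))


def sliding_avg(monthly, preceding=1, following=1):
    """AVG(SUM(sale)) OVER (ORDER BY MONTH(date) RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING)."""
    result = {}
    for month in monthly:
        # RANGE frame: every group whose month lies within [month - 1, month + 1]
        frame = [s for m, s in monthly.items() if month - preceding <= m <= month + following]
        result[month] = sum(frame) / len(frame)
    return result


sums = monthly_sums(SALES)
print(sums)
print(sliding_avg(sums))
```

Windowing runs on the grouped result, so the frame moves over six monthly groups rather than over the twelve sales rows.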
Windowing in an SQL query
Logical evaluation order: JOIN -> GROUP BY, HAVING -> WINDOW 1 ... WINDOW n -> ORDER BY / DISTINCT / LIMIT
● Window functions see the query result set after grouping and HAVING: filtering on window function results requires a subquery
● The order of the window phases is not semantically significant
● A window function cannot use the result of another window function in the same query (use a subquery instead)
● In practice, the order does matter. The optimizer is allowed to
- reorder the window phases to minimize the sorting required
- merge window phases if they are equivalent
Program Agenda
• Introduction: what & why
• What's supported
• Ranking and analytical wfs
• Implementation & performance
PART II
in which we learn which window functions are supported by MySQL
MySQL 8.0
● Most aggregate functions in MySQL can be used as window functions:
COUNT, SUM, AVG, MAX, MIN, STDDEV_POP (& synonyms), STDDEV_SAMP, VAR_POP (& synonym), VAR_SAMP
Limitation: no DISTINCT in aggregates yet
● All SQL standard specialized window functions:
ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE
● Next phase (probably post-GA), more aggregates:
BIT_OR, BIT_XOR, BIT_AND, JSON_ARRAYAGG, JSON_OBJECTAGG [ GROUP_CONCAT ]
Std compliance: extensions
Target SQL standard semantics, but
● Expressions (not only columns) are allowed in PARTITION BY
Benefit: more flexible
● A missing ORDER BY is tolerated even where it makes the window useless (all rows are peers), except for RANGE <value>, which requires a single ORDER BY expression
● A frame clause is tolerated even for window functions that operate on the entire partition (the standard is stricter)
Benefit: many window functions can use the same window definition
Std compliance: restrictions
Target SQL standard semantics, but
● Valued frame bounds must be static in the query
● No GROUPS in frame clause (Feature T620)
● No EXCLUDE in frame clause
● No DISTINCT in aggregates with windowing
● IGNORE NULLS not supported
● FROM LAST not supported (NTH_VALUE)
● No nested window functions (Feature T619)
● No row pattern recognition in window clause (Feature R020)
● Operator subqueries with windowing only if materializable (WL#10431)
Program Agenda
• Introduction: what & why
• What's supported
• Ranking and analytical wfs
• Implementation & performance
PART III
in which we learn about the specialized window functions
Ranking: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE
Analytical: LEAD, LAG, NTH_VALUE, FIRST_VALUE, LAST_VALUE
NTH_VALUE, FIRST_VALUE and LAST_VALUE use frames; the others work on the entire partition.
ROW_NUMBER
● Assigns ascending numbers (1, 2, 3, ...) to the rows of each partition, following the window ordering
Example: number the employees within each department according to their salary
SELECT name, department_id AS dept, salary,
ROW_NUMBER() OVER w AS `row#`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC, name ASC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+
| name     | dept | salary | row# |
+----------+------+--------+------+
| Nils     | NULL |  75000 |    1 |
| Erik     |   10 | 100000 |    1 |
| Michael  |   10 |  70000 |    2 |
| Frederik |   10 |  60000 |    3 |
| Jon      |   10 |  60000 |    4 |
| Dag      |   10 |   NULL |    5 |
| Nils     |   20 |  80000 |    1 |
| Lena     |   20 |  65000 |    2 |
| Paula    |   20 |  65000 |    3 |
| Rose     |   30 | 300000 |    1 |
| William  |   30 |  70000 |    2 |
+----------+------+--------+------+
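A Python sketch of ROW_NUMBER over the slide data, with hypothetical helper names and None standing for NULL. Partitions are emitted NULL first to match the slide's final ORDER BY department_id. Within a partition, salaries sort descending with NULL last, and names break ties.

```python
from itertools import groupby

# Employee rows from the slides: (name, department_id, salary); None stands for SQL NULL.
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]


def window_order(row):
    """PARTITION BY department_id (NULL first), ORDER BY salary DESC (NULL last), name ASC."""
    name, dept, salary = row
    return (dept is not None, dept if dept is not None else 0,
            salary is None, -salary if salary is not None else 0, name)


def row_numbers(rows):
    """ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC, name ASC)."""
    ordered = sorted(rows, key=window_order)
    result = []
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        # Numbering restarts at 1 in every partition; tied rows still get distinct numbers.
        for number, (name, dept, salary) in enumerate(partition, start=1):
            result.append((name, dept, salary, number))
    return result


for row in row_numbers(EMPLOYEES):
    print(row)
```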
RANK
● Rows that are peers with respect to the window ordering get the same rank
Example: rank employees within each department according to their salary
SELECT name, department_id AS dept, salary, .. ,
RANK() OVER w AS `rank`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+
| name     | dept | salary | row# | rank |
+----------+------+--------+------+------+
| Nils     | NULL |  75000 |    1 |    1 |
| Erik     |   10 | 100000 |    1 |    1 |
| Michael  |   10 |  70000 |    2 |    2 |
| Frederik |   10 |  60000 |    3 |    3 |
| Jon      |   10 |  60000 |    4 |    3 |
| Dag      |   10 |   NULL |    5 |    5 |
| Nils     |   20 |  80000 |    1 |    1 |
| Lena     |   20 |  65000 |    2 |    2 |
| Paula    |   20 |  65000 |    3 |    2 |
| Rose     |   30 | 300000 |    1 |    1 |
| William  |   30 |  70000 |    2 |    2 |
+----------+------+--------+------+------+
Frederik and Jon are peers with respect to the ordering: both get rank 3, and rank 4 is skipped.
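A minimal Python sketch of RANK over one partition. It assumes the partition's ORDER BY values are already in window order, with None standing for Dag's NULL salary sorted last. The function name rank is hypothetical, and the sketch illustrates the semantics rather than MySQL's code.

```python
def rank(keys):
    """RANK() for one partition; keys are its ORDER BY values, already in window order.

    Peers (equal keys) share the rank of the first peer; the next rank skips ahead.
    """
    ranks = []
    for i, key in enumerate(keys):
        if i > 0 and key == keys[i - 1]:
            ranks.append(ranks[-1])  # peer of the previous row: same rank
        else:
            ranks.append(i + 1)      # 1 + number of rows before this one
    return ranks


# Department 10, ORDER BY salary DESC (Erik, Michael, Frederik, Jon, Dag)
print(rank([100000, 70000, 60000, 60000, None]))  # [1, 2, 3, 3, 5]
```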
DENSE_RANK
● Rows that are peers with respect to the window ordering get the same rank, and no ranks are skipped
Example: rank employees within each department according to their salary
SELECT name, department_id AS dept, salary, .. ,
DENSE_RANK() OVER w AS dense
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+
| name     | dept | salary | row# | rank | dense |
+----------+------+--------+------+------+-------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |
| William  |   30 |  70000 |    2 |    2 |     2 |
+----------+------+--------+------+------+-------+
Frederik and Jon are peers with respect to the ordering and share rank 3, but rank 4 is not skipped: Dag gets 4.
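The same kind of Python sketch for DENSE_RANK, under the same assumptions: one partition, values already in window order, None for NULL, and a hypothetical function name. The only change from RANK is that a new value advances the rank by one instead of jumping to the row position.

```python
def dense_rank(keys):
    """DENSE_RANK() for one partition; keys are its ORDER BY values, already in window order."""
    ranks = []
    for i, key in enumerate(keys):
        if i == 0:
            ranks.append(1)
        elif key == keys[i - 1]:
            ranks.append(ranks[-1])      # peer of the previous row: same rank
        else:
            ranks.append(ranks[-1] + 1)  # next distinct value: next rank, no gap
    return ranks


# Department 10, ORDER BY salary DESC (Erik, Michael, Frederik, Jon, Dag)
print(dense_rank([100000, 70000, 60000, 60000, None]))  # [1, 2, 3, 3, 4]
```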
PERCENT_RANK
● Relative rank: (rank - 1) / (rows in partition - 1), or 0 if the partition has only one row
Example: relative rank of employees within each department according to their salary
SELECT name, department_id AS dept, salary, .. ,
PERCENT_RANK() OVER w AS `%rank`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+-------+
| name     | dept | salary | row# | rank | dense | %rank |
+----------+------+--------+------+------+-------+-------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |     0 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |     0 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |  0.25 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |   0.5 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |   0.5 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |     1 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |     0 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |   0.5 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |   0.5 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |     0 |
| William  |   30 |  70000 |    2 |    2 |     2 |     1 |
+----------+------+--------+------+------+-------+-------+
Frederik and Jon in department 10: (3-1)/(5-1) = 0.5
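A Python sketch of PERCENT_RANK built directly on the formula above. It works on one partition whose ORDER BY values are already in window order, with None for NULL. Both function names are hypothetical. The RANK helper is repeated so the block runs on its own.

```python
def rank(keys):
    """RANK() for one partition whose ORDER BY values are already in window order."""
    ranks = []
    for i, key in enumerate(keys):
        ranks.append(ranks[-1] if i > 0 and key == keys[i - 1] else i + 1)
    return ranks


def percent_rank(keys):
    """PERCENT_RANK() = (rank - 1) / (rows in partition - 1), or 0 for a one-row partition."""
    n = len(keys)
    if n == 1:
        return [0.0]
    return [(r - 1) / (n - 1) for r in rank(keys)]


# Department 10, ORDER BY salary DESC (Erik, Michael, Frederik, Jon, Dag)
print(percent_rank([100000, 70000, 60000, 60000, None]))  # [0.0, 0.25, 0.5, 0.5, 1.0]
```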
CUME_DIST
● Cumulative distribution: the number of rows that precede the current row or are its peers (the current row included), divided by the number of rows in the partition
● Example: cumulative distribution of employees within each department according to their salary
SELECT name, department_id AS dept, salary, .. ,
CUME_DIST() OVER w AS cume
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+-------+---------+
| name     | dept | salary | row# | rank | dense | %rank | cume    |
+----------+------+--------+------+------+-------+-------+---------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |     0 |       1 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |     0 |     0.2 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |  0.25 |     0.4 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |   0.5 |     0.8 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |   0.5 |     0.8 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |     1 |       1 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |     0 | 0.33333 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |   0.5 |       1 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |   0.5 |       1 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |     0 |     0.5 |
| William  |   30 |  70000 |    2 |    2 |     2 |     1 |       1 |
+----------+------+--------+------+------+-------+-------+---------+
Frederik and Jon: 4/5 = 0.8
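The same kind of sketch works for CUME_DIST. It again assumes one partition with its keys already in window order. Every row in a peer group gets the count up to the group's last row. `cume_dist` is a hypothetical helper.

```python
def cume_dist(sorted_keys):
    """CUME_DIST for each row of one partition whose keys are in window order."""
    n = len(sorted_keys)
    result = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and sorted_keys[j + 1] == sorted_keys[i]:
            j += 1                      # j = last peer of row i
        for k in range(i, j + 1):
            result[k] = (j + 1) / n     # rows up to and including the last peer
        i = j + 1
    return result

print(cume_dist([100000, 70000, 60000, 60000, None]))   # [0.2, 0.4, 0.8, 0.8, 1.0]
```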
NTILE
● Divides an ordered partition into a specified number of groups (buckets) as evenly as possible and assigns a bucket number to each row in the partition. Despite the name, this is not the same as a percentile!
SELECT name, department_id AS dept, salary, .. ,
NTILE(3) OVER w AS `3-tile`
FROM employee
WINDOW w AS (PARTITION BY department_id
ORDER BY salary DESC)
ORDER BY department_id, `row#`;
+----------+------+--------+------+------+-------+-------+---------+--------+
| name     | dept | salary | row# | rank | dense | %rank | cume    | 3-tile |
+----------+------+--------+------+------+-------+-------+---------+--------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |     0 |       1 |      1 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |     0 |     0.2 |      1 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |  0.25 |     0.4 |      1 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |   0.5 |     0.8 |      2 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |   0.5 |     0.8 |      2 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |     1 |       1 |      3 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |     0 | 0.33333 |      1 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |   0.5 |       1 |      2 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |   0.5 |       1 |      3 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |     0 |     0.5 |      1 |
| William  |   30 |  70000 |    2 |    2 |     2 |     1 |       1 |      2 |
+----------+------+--------+------+------+-------+-------+---------+--------+
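How even is "as evenly as possible"? This sketch assumes that NTILE gives bucket sizes differing by at most one, with the larger buckets first. That assumption reproduces the 3-tile column above. NTILE only looks at row positions, so the peers Lena and Paula land in different buckets. `ntile` is a hypothetical helper.

```python
def ntile(num_buckets, n_rows):
    """Bucket number (1-based) for each row of one ordered partition of n_rows rows."""
    size, extra = divmod(n_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        # Assumption: the first `extra` buckets each get one additional row.
        count = size + 1 if bucket <= extra else size
        buckets.extend([bucket] * count)
    return buckets

print(ntile(3, 5))   # [1, 1, 2, 2, 3]  department 10
print(ntile(3, 3))   # [1, 2, 3]        department 20
print(ntile(3, 2))   # [1, 2]           department 30
```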
LEAD, LAG
● Returns the value of <expr> evaluated at the row that is <offset> rows after (LEAD) or before (LAG) the current row within the partition. If there is no such row, returns the optional default expression instead (which must be of the same type as <expr>).
● Both offset and default expr are evaluated with respect to the current row. If omitted, offset defaults to 1 and default expr to NULL.
Syntax (LAG is analogous):
LEAD(<expr> [, <offset> [, <default expr>]]) [RESPECT NULLS]
Example: LEAD(date) OVER (..)
Note: IGNORE NULLS is not supported; RESPECT NULLS is the default but can be specified.
● Any window frame is ignored
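Below is a Python sketch of LEAD and LAG over one partition already in window order, with the offset and default behaving as described. These are hypothetical helpers, and no frame is involved. The salaries are those of the LEAD examples below, ordered by salary DESC with NULL last.

```python
def lead(values, offset=1, default=None):
    """LEAD over one partition in window order; rows past the end get `default`."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]

def lag(values, offset=1, default=None):
    """LAG: the same idea, looking `offset` rows back."""
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]

salaries = [300000, 100000, 80000, 75000, 70000, 70000,
            65000, 65000, 60000, 60000, None]
print(lead(salaries)[-3:])             # [60000, None, None]
print(lead(salaries, 1, 77000)[-2:])   # [None, 77000]
```

The default only replaces a missing row. It does not replace a NULL value: Jon's lead stays NULL because the next row (Dag) exists but has a NULL salary.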
LEAD
SELECT name, department_id AS dept, salary,
LEAD(salary, 1) OVER w AS `lead`
FROM employee
WINDOW w AS (ORDER BY salary DESC);
+----------+------+--------+--------+
| name     | dept | salary | lead   |
+----------+------+--------+--------+
| Rose     |   30 | 300000 | 100000 |
| Erik     |   10 | 100000 |  80000 |
| Nils     |   20 |  80000 |  75000 |
| Nils     | NULL |  75000 |  70000 |
| Michael  |   10 |  70000 |  70000 |
| William  |   30 |  70000 |  65000 |
| Lena     |   20 |  65000 |  65000 |
| Paula    |   20 |  65000 |  60000 |
| Frederik |   10 |  60000 |  60000 |
| Jon      |   10 |  60000 |   NULL |
| Dag      |   10 |   NULL |   NULL |
+----------+------+--------+--------+
LEAD with default
SELECT name, department_id AS dept, salary,
LEAD(salary, 1, 77000) OVER w AS `lead`
FROM employee
WINDOW w AS (ORDER BY salary DESC);
+----------+------+--------+--------+
| name     | dept | salary | lead   |
+----------+------+--------+--------+
| Rose     |   30 | 300000 | 100000 |
| Erik     |   10 | 100000 |  80000 |
| Nils     |   20 |  80000 |  75000 |
| Nils     | NULL |  75000 |  70000 |
| Michael  |   10 |  70000 |  70000 |
| William  |   30 |  70000 |  65000 |
| Lena     |   20 |  65000 |  65000 |
| Paula    |   20 |  65000 |  60000 |
| Frederik |   10 |  60000 |  60000 |
| Jon      |   10 |  60000 |   NULL |
| Dag      |   10 |   NULL |  77000 |
+----------+------+--------+--------+
Dag, the last row, has no following row, so he gets the default 77000.
LEAD - gap detection
● Classic example: find the gaps in a sequence of integers
CREATE TABLE t(i INT);
INSERT INTO t VALUES (1), (2), (4), (5), (6), (8), (9), (10);
SELECT i, l
FROM (SELECT i, LEAD(i) OVER (ORDER BY i) AS l FROM t) d
WHERE i + 1 <> l;
+------+------+
| i    | l    |
+------+------+
|    2 |    4 |
|    6 |    8 |
+------+------+
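The same query logic can be written in Python as a sketch: pair each value with its LEAD and keep the pairs that jump. It assumes `t` contains no NULLs. The comment explains why the last row, whose LEAD is NULL, drops out. `find_gaps` is a hypothetical helper.

```python
def find_gaps(values):
    """Pairs (i, LEAD(i)) where the next value is not i + 1; values must not contain None."""
    ordered = sorted(values)
    pairs = zip(ordered, ordered[1:] + [None])     # last row: LEAD(i) is NULL
    # In SQL, i + 1 <> NULL is NULL (not true), so the WHERE clause drops that row.
    return [(i, l) for i, l in pairs if l is not None and i + 1 != l]

print(find_gaps([1, 2, 4, 5, 6, 8, 9, 10]))   # [(2, 4), (6, 8)]
```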
FIRST_VALUE, LAST_VALUE
Returns the value evaluated at the first (FIRST_VALUE) or last (LAST_VALUE) row in the frame of the current row within the partition.
Note: IGNORE NULLS is not supported; RESPECT NULLS is the default but can be specified.
FIRST_VALUE
● Difference between each employee's salary and that of the best-paid employee in the same department
SELECT name, department_id AS dept, salary,
FIRST_VALUE(salary) OVER w - salary AS diff
FROM employee
WINDOW w AS (PARTITION BY department_id ORDER BY salary DESC);
+----------+------+--------+--------+
| name     | dept | salary | diff   |
+----------+------+--------+--------+
| Nils     | NULL |  75000 |      0 |
| Erik     |   10 | 100000 |      0 |
| Michael  |   10 |  70000 |  30000 |
| Frederik |   10 |  60000 |  40000 |
| Jon      |   10 |  60000 |  40000 |
| Dag      |   10 |   NULL |   NULL |
| Nils     |   20 |  80000 |      0 |
| Lena     |   20 |  65000 |  15000 |
| Paula    |   20 |  65000 |  15000 |
| Rose     |   30 | 300000 |      0 |
| William  |   30 |  70000 | 230000 |
+----------+------+--------+--------+
NTH_VALUE
Returns the value evaluated at the nth row in the frame of the current row within the partition. If there is no nth row (the frame is too small), NTH_VALUE returns NULL.
Note: IGNORE NULLS is not supported; RESPECT NULLS is the default but can be specified.
Note: FROM LAST is not supported for NTH_VALUE; FROM FIRST is the default but can be specified.
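This sketch applies FIRST_VALUE, LAST_VALUE and NTH_VALUE to explicit frames. It assumes the default frame for a window that has ORDER BY but no frame clause: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes the current row's peers. With that frame, NTH_VALUE(salary, 2) is NULL for the first row of department 10 because its frame holds only one row. All function names are hypothetical.

```python
def nth_value(frame, n):
    """NTH_VALUE(expr, n) FROM FIRST; None (NULL) if the frame has fewer than n rows."""
    return frame[n - 1] if len(frame) >= n else None

def first_value(frame):
    return nth_value(frame, 1)

def last_value(frame):
    return frame[-1] if frame else None

def default_frames(sorted_keys):
    """Frame per row for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (peers included)."""
    frames = []
    for i in range(len(sorted_keys)):
        end = i
        while end + 1 < len(sorted_keys) and sorted_keys[end + 1] == sorted_keys[i]:
            end += 1
        frames.append(sorted_keys[: end + 1])
    return frames

frames = default_frames([100000, 70000, 60000, 60000, None])   # department 10
print([nth_value(f, 2) for f in frames])   # [None, 70000, 70000, 70000, 70000]
print([last_value(f) for f in frames])     # [100000, 70000, 60000, 60000, None]
```

Under this default frame, LAST_VALUE returns the current row's last peer rather than the partition's last row. To get the partition's last row, widen the frame to UNBOUNDED FOLLOWING.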
Program Agenda
• Introduction: what & why
• What's supported
• Ranking and analytical wfs
• Implementation & performance
PART IV
In which we look at implementation and performance aspects
Windowing in an SQL query
JOIN → GROUP BY, HAVING → WINDOW 1 → ... → WINDOW n → ORDER BY / DISTINCT / LIMIT
● Window functions see the query result set after GROUP BY and HAVING; filtering on window function results requires a subquery
● The order of the window steps is not semantically significant
● A window function cannot use another window function in the same query block (use a subquery instead)
● In practice, the order matters for performance. The optimizer is allowed to:
- reorder windows to minimize the sorting required
- merge window steps if they are equivalent
Processing window functions
JOIN → GROUP BY → WINDOW 1 → ... → WINDOW n → ORDER BY / DISTINCT / LIMIT
● Before each window step: sort by the concatenation of the PARTITION BY and ORDER BY expressions
● Temporary table between each windowing step
● Optimization: reorder windows to eliminate sorting steps when their PARTITION BY/ORDER BY expressions are equal
SELECT name, SUM(salary) OVER () FROM employee LIMIT 3;
+----------+---------------------+
| name     | SUM(salary) OVER () |
+----------+---------------------+
| Dag      |              945000 |
| Erik     |              945000 |
| Frederik |              945000 |
+----------+---------------------+
We need to read all rows to get the SUM before we can output row 1, so buffering is needed to merge each original row with the result of the window function.
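One windowing step can be sketched as "sort, then scan". The Python below computes ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) that way. It only illustrates the concatenated sort key. The employee subset and the helper name are made up for the example, and NULLs are left out because Python cannot order `None` against numbers.

```python
def row_number_by_sorting(rows, partition_key, order_key):
    """ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) via one sort and one scan."""
    # Sort by the concatenated key: PARTITION BY part first, then ORDER BY part.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    result = []
    prev_partition = object()          # sentinel that never equals a real key
    row_number = 0
    for row in ordered:
        if partition_key(row) != prev_partition:
            prev_partition = partition_key(row)
            row_number = 0             # partition boundary: restart numbering
        row_number += 1
        result.append((row[0], row_number))
    return result

employees = [("Lena", 20, 65000), ("Erik", 10, 100000),
             ("Nils", 20, 80000), ("Michael", 10, 70000)]
print(row_number_by_sorting(employees,
                            partition_key=lambda r: r[1],
                            order_key=lambda r: -r[2]))   # salary DESC
# [('Erik', 1), ('Michael', 2), ('Nils', 1), ('Lena', 2)]
```

Two windows with the same concatenated key can share this one sort, which is what makes reordering windows pay off.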
Processing window functions
JOIN → GROUP BY → WINDOW 1 → ... → WINDOW n → ORDER BY / DISTINCT / LIMIT
● Row-addressable buffer, aka frame buffer: an in-memory temporary table that overflows automatically to disk if needed
● Permits re-reading rows when the frame moves
Streaming window functions
A row-addressable buffer (frame buffer) is not needed for streaming window functions:
● ROW_NUMBER, RANK, DENSE_RANK
● Aggregates with a ROWS frame whose upper bound grows with the current row, e.g.
SELECT SUM(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM employee;
● Non-streaming (needs a buffer): e.g. CUME_DIST, SUM() OVER ()
=> Whether a buffer is needed depends on the window function and on the frame specification. If in doubt, check with EXPLAIN FORMAT=JSON.
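Python generators make the difference between streaming and buffered evaluation easy to observe. In this sketch (hypothetical helpers, not MySQL code), `CountingReader` records how many input rows had been read when the first result came out. The running sum answers after one row, while SUM() OVER () has to read everything first. The input is the first four employees by name.

```python
class CountingReader:
    """Wraps the input rows and counts how many have been read so far."""
    def __init__(self, rows):
        self.rows = rows
        self.read = 0

    def __iter__(self):
        for row in self.rows:
            self.read += 1
            yield row

def running_sum(rows):
    """Streaming: SUM(x) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)."""
    total = None                      # SQL SUM stays NULL until a non-NULL value is seen
    for x in rows:
        if x is not None:
            total = x if total is None else total + x
        yield total                   # emitted as soon as this row has been read

def sum_over_all(rows):
    """Buffered: SUM(x) OVER () needs all rows before it can emit the first result."""
    buffer = list(rows)               # stands in for the frame buffer
    non_null = [x for x in buffer if x is not None]
    total = sum(non_null) if non_null else None
    for _ in buffer:
        yield total

salaries = [None, 100000, 60000, 60000]   # Dag, Erik, Frederik, Jon

reader = CountingReader(salaries)
print(next(running_sum(reader)), reader.read)    # None 1

reader = CountingReader(salaries)
print(next(sum_over_all(reader)), reader.read)   # 220000 4
```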
EXPLAIN: streaming
EXPLAIN FORMAT=JSON SELECT SUM(salary) OVER (ROWS UNBOUNDED PRECEDING) FROM employee;
{
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "1.35" },
    "windowing": {
      "windows": [
        {
          "name": "<unnamed window>",
          "functions": [ "sum" ]
        }
      ],
      "table": {
        "table_name": "employee",
        "access_type": "ALL",
        "rows_examined_per_scan": 11,
        "rows_produced_per_join": 11,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "0.25",
          "eval_cost": "1.10",
          "prefix_cost": "1.35",
          "data_read_per_join": "1K"
        },
        "used_columns": [ "salary" ]
      }
    }
  }
}
EXPLAIN: frame buffer
EXPLAIN FORMAT=JSON SELECT COUNT(salary) OVER () FROM employee;
{
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "1.35" },
    "windowing": {
      "windows": [
        {
          "name": "<unnamed window>",
          "frame_buffer": {
            "using_temporary_table": true,
            "optimized_frame_evaluation": true
          },
          "functions": [ "count" ]
        }
      ],
      "table": {
        "table_name": "employee",
        "access_type": "ALL",
        "rows_examined_per_scan": 11,
        "rows_produced_per_join": 11,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "0.25",
          "eval_cost": "1.10",
          "prefix_cost": "1.35",
          "data_read_per_join": "1K"
        },
        "used_columns": [ "salary" ]
      }
    }
  }
}
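The frame-buffer check can be scripted. This sketch assumes only the JSON shape shown in the two plans above: a `windowing` object with a `windows` list whose entries may contain `frame_buffer`. It searches the plan recursively in case `windowing` is nested deeper in other queries. Fetching the EXPLAIN output is left out. The plans in the example are cut-down copies of the ones above, and the helper names are hypothetical.

```python
import json

def find_windows(node):
    """Collect every entry of every "windowing" -> "windows" list in an EXPLAIN plan."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "windowing" and isinstance(value, dict):
                found.extend(value.get("windows", []))
            found.extend(find_windows(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_windows(item))
    return found

def uses_frame_buffer(explain_json):
    """True if any window in the plan reports a "frame_buffer", i.e. is not streaming."""
    return any("frame_buffer" in w for w in find_windows(json.loads(explain_json)))

streaming_plan = '''{"query_block": {"select_id": 1, "windowing": {
    "windows": [{"name": "<unnamed window>", "functions": ["sum"]}],
    "table": {"table_name": "employee"}}}}'''
buffered_plan = '''{"query_block": {"select_id": 1, "windowing": {
    "windows": [{"name": "<unnamed window>",
                 "frame_buffer": {"using_temporary_table": true,
                                  "optimized_frame_evaluation": true},
                 "functions": ["count"]}],
    "table": {"table_name": "employee"}}}}'''

print(uses_frame_buffer(streaming_plan))   # False
print(uses_frame_buffer(buffered_plan))    # True
```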
Frame buffer processing
JOIN → GROUP BY → WINDOW 1 → ... → WINDOW n → ORDER BY / DISTINCT / LIMIT
● Row-addressable buffer, aka frame buffer: in-memory; overflows automatically to disk if needed
● Permits re-reading rows when the frame moves
SELECT name, SUM(salary) OVER () FROM employee LIMIT 3;
+----------+---------------------+
| name     | SUM(salary) OVER () |
+----------+---------------------+
| Dag      |              945000 |
| Erik     |              945000 |
| Frederik |              945000 |
+----------+---------------------+
Optimization 1: compute the SUM only once (static frame)
But what if the frame changes?
Frame buffer processing
● Case: expanding frame
SELECT name, salary, SUM(salary) OVER (ORDER BY name) AS `sum` FROM employee LIMIT 3;
+----------+--------+--------+
| name     | salary | sum    |
+----------+--------+--------+
| Dag      |   NULL |   NULL |
| Erik     | 100000 | 100000 |
| Frederik |  60000 | 160000 |
+----------+--------+--------+
Optimization 2: remember the SUM and adjust it; here, add the next row's contribution: 100k + 60k = 160k (SUM ignores Dag's NULL salary)
But what if we have a moving frame?
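Here is a sketch of optimization 2 in Python for the default RANGE frame of `OVER (ORDER BY name)`. The total is carried forward, and each new peer group is added once instead of re-summing the whole frame. Peers matter here because the employee table has two rows named Nils. `expanding_sum` is a hypothetical helper.

```python
def expanding_sum(rows):
    """SUM(v) OVER (ORDER BY k) for (k, v) rows already sorted by k.

    Default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: peers (equal k)
    share one result, and only the new peer group's contribution is added each time.
    """
    result, total, i = [], None, 0
    while i < len(rows):
        j = i
        while j + 1 < len(rows) and rows[j + 1][0] == rows[i][0]:
            j += 1                                   # extend over the peers of row i
        for _, v in rows[i:j + 1]:
            if v is not None:                        # SUM ignores NULLs
                total = v if total is None else total + v
        result.extend([total] * (j - i + 1))
        i = j + 1
    return result

print(expanding_sum([("Dag", None), ("Erik", 100000), ("Frederik", 60000)]))
# [None, 100000, 160000]
```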
Inversion
● Case: moving frame: sales over this month and last
SELECT MONTH(date) AS month, SUM(sale) AS monthly,
SUM(SUM(sale)) OVER (ORDER BY MONTH(date)
RANGE 1 PRECEDING) AS `this&last`
FROM sales
GROUP BY MONTH(date);
+-------+---------+-----------+
| month | monthly | this&last |
+-------+---------+-----------+
|     3 |     600 |       600 |
|     4 |     600 |      1200 |
|     5 |     900 |      1500 |
|     6 |     600 |      1500 |
|     7 |    1200 |      1800 |
|     8 |     250 |      1450 |
+-------+---------+-----------+
Optimization 3: remember the SUM and adjust it. Here, remove the contribution of the row that left the frame (now 2 PRECEDING), then add the current row's contribution. This is called inversion.
NOTE: This is OK only if a + b - a + c = b + c, which is not always the case for aggregates.
Penalty for large window frames if we can't do this: O(#partition size * #frame size) instead of O(#partition size)
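Below, inversion and recomputation are sketched in Python on the monthly totals above. Months 3 to 8 are consecutive, so here RANGE 1 PRECEDING covers the same rows as a two-row frame; the sketch relies on that assumption. Both helper names are hypothetical.

```python
def moving_sum_recompute(values, width):
    """Re-sum every frame from scratch: O(n * width)."""
    return [sum(values[max(0, i - width + 1): i + 1]) for i in range(len(values))]

def moving_sum_inversion(values, width):
    """Keep one running total: O(n)."""
    out, total = [], 0
    for i, v in enumerate(values):
        if i >= width:
            total -= values[i - width]   # remove the row that just left the frame
        total += v                       # add the current row
        out.append(total)
    return out

monthly = [600, 600, 900, 600, 1200, 250]   # months 3..8
print(moving_sum_inversion(monthly, 2))     # [600, 1200, 1500, 1500, 1800, 1450]
```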
Floating aggregates, standard deviation and variance aggregates
mysql> show variables like '%high%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| windowing_use_high_precision | ON    |
+------------------------------+-------+
mysql> SET windowing_use_high_precision = OFF;
Conditional inversion, relevant when a + b - a + c ≠ b + c:
● ON: recomputes; slower, but guaranteed to give the same results as grouped aggregates
● OFF: uses inversion; linear performance
For variance, the differences are only in the last few significant digits, because the incremental algorithm yields slightly different results (usually insignificant).
For floats, this can matter when summing very large and very small numbers:
mysql> select 1.7976931348623157E+307 + 1 - 1.7976931348623157E+307;
+-------------------------------------------------------+
| 1.7976931348623157E+307 + 1 - 1.7976931348623157E+307 |
+-------------------------------------------------------+
|                                                     0 |
+-------------------------------------------------------+
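Python shows the same floating-point effect, since it also uses IEEE doubles. A moving two-row sum over [BIG, 1.0, 1.0] gives the exact answer when each frame is recomputed. With inversion, the 1.0 has already been absorbed into BIG by the time BIG leaves the frame. This sketches the two strategies; it is not MySQL's code.

```python
BIG = 1.7976931348623157e307   # same magnitude as the SQL example above

def moving_sum_recompute(values, width):
    """Re-add every frame from scratch."""
    out = []
    for i in range(len(values)):
        total = 0.0
        for v in values[max(0, i - width + 1): i + 1]:
            total += v
        out.append(total)
    return out

def moving_sum_inversion(values, width):
    """Subtract the row leaving the frame, then add the current row."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        if i >= width:
            total -= values[i - width]
        total += v
        out.append(total)
    return out

print(BIG + 1 - BIG)                                   # 0.0
print(moving_sum_recompute([BIG, 1.0, 1.0], 2)[-1])    # 2.0
print(moving_sum_inversion([BIG, 1.0, 1.0], 2)[-1])    # 1.0
```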
Performance hints
● Use named windows to eliminate several windowing steps whenever possible
Possibility: analyze and collapse windows where the semantics allow it. We might add this capability to the optimizer if there is large demand.
● Streaming window functions are faster than those that need buffering
● MAX/MIN do not support inversion, so they can be slow with large frames unless the expression is a prefix of the ORDER BY expressions
● JSON_OBJECTAGG is not invertible, so it can be slow with large frames
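This Python sketch shows why MIN and MAX resist inversion. When the largest value leaves a frame, the new maximum can only be found by rescanning the rows that remain. The second helper covers the special case from the hint. It assumes a frame that ends at the current row and rows sorted ascending by the aggregated expression, so both bounds can be read directly. Helper names and data are hypothetical, and NULLs are left out.

```python
def moving_min_max_rescan(values, width):
    """(MIN, MAX) over ROWS BETWEEN width-1 PRECEDING AND CURRENT ROW by rescanning
    each frame: there is no inverse for MAX, so the cost is O(n * width)."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - width + 1): i + 1]
        out.append((min(frame), max(frame)))
    return out

def moving_min_max_presorted(values, width):
    """Same result in O(1) per row, valid only if rows are sorted ascending by the
    aggregated expression: MIN is at the frame's first row, MAX at its last row."""
    out = []
    for i in range(len(values)):
        start = max(0, i - width + 1)
        out.append((values[start], values[i]))
    return out

salaries_asc = [60000, 60000, 65000, 65000, 70000, 70000, 75000]
print(moving_min_max_presorted(salaries_asc, 3)[-1])   # (70000, 75000)
```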
Q&A
Thank you for using MySQL!
Blog: http://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.