Window functions

In 2017 Oracle introduced window functions in MySQL 8. A window function performs a calculation across a set of table rows that are related to the current row.

Published in: Data & Analytics

  1. Meetup PUG, 30 April 2019 #AperiTech
  2. $ docker pull mysql
     $ docker start some-mysql
     $ docker run -it --link some-mysql:mysql --rm mysql sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
     mysql> CREATE DATABASE wf;
     mysql> USE wf;
     mysql> CREATE TABLE gain (hotel VARCHAR(10), date DATE, sale INT);
  3. mysql> INSERT INTO gain VALUES
         ('Cavalieri', '2019-03-01', 269),
         ('J.K.Place', '2019-02-01', 450),
         ('Manfredi', '2019-01-01', 400),
         ('Cavalieri', '2019-02-01', 262),
         ('Cavalieri', '2019-01-01', 250),
         ('Manfredi', '2019-02-01', 319),
         ('J.K.Place', '2019-03-01', 475),
         ('J.K.Place', '2019-01-01', 460),
         ('Manfredi', '2019-03-01', 341);
  4. $mysqli = new mysqli('127.0.0.1', 'your_user', 'your_pass', 'wf');
     $sql = "SELECT * FROM gain ORDER BY hotel, date";
     $result = $mysqli->query($sql);
     $total = 0;          // grand total of all sales
     $partial = [];       // total accumulated per hotel
     $delta = [];         // change versus the previous row of the same hotel
     $gain = [];          // all fetched rows
     $currentHotel = '';
     $previousSale = 0;
     while ($hotel = $result->fetch_assoc()) {
         $name = $hotel['hotel'];   // the column is named `hotel`, not `name`
         $total += $hotel['sale'];
         // Parentheses are required: `+` binds tighter than `??`.
         $partial[$name] = $hotel['sale'] + ($partial[$name] ?? 0);
         $delta[$name][$hotel['date']] = ($currentHotel != $name) ? 0 : $hotel['sale'] - $previousSale;
         $previousSale = $hotel['sale'];
         $currentHotel = $name;
         $gain[] = $hotel;
     }
  6. Window Function: one of the biggest new features of MySQL 8
  7. Index
     ● Window function history
     ● What it is
     ● Types of window function
     ● Logical flow
     ● Optimization
  8. Who am I?
     I’m Davide Dell’Erba, Full Stack Web Developer
     @delda80
     github.com/delda
     info@davidedellerba.it
  9. Window function history
     Windows and window functions were first introduced as an amendment to SQL:1999. They were incorporated into the SQL:2003 version of standard SQL, updated in SQL:2008, and extended again in SQL:2016.
  10. Window function history
      Over the years, almost all major database systems introduced this feature.
      [Timeline: 2008 to 2017]
  11. What is a “window”?
      It is a set of rows defined by the OVER() clause.
      [Diagram: rows labelled Set one, Set two and Set three under ORDER BY, and again under PARTITION BY]
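  12. What is a “function”?
      Function    Example
      Ranking     ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
      Aggregate   MIN(), MAX(), AVG(), SUM(), COUNT(), STDEV(), STDEVP(), VAR(), VARP(), CHECKSUM_AGG(), COUNT_BIG()
      Analytic    CUME_DIST(), LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), PERCENT_RANK()
      Note: STDEV() through COUNT_BIG() are SQL Server names. MySQL's counterparts for the statistical ones are STDDEV_SAMP(), STDDEV_POP(), VAR_SAMP() and VAR_POP().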
  13. So… What is a “window function”?
      ● It is a function that computes a value for a row using a window.
      ● It operates on a window, which is a group of related rows.
      ● It returns a value for each row of the table.
      ● The return value is calculated using data from the rows of the window.
      ● This is a new concept: you can reach outside the current row.
  14. GROUP BY vs OVER()
      SELECT `key`, SUM(value) FROM `table` GROUP BY `key`;
      SELECT `key`, SUM(value) OVER (PARTITION BY `key` ORDER BY date) FROM `table`;
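The contrast between GROUP BY and OVER() can be tried out with a small script. This is only a sketch: it assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) as a stand-in for MySQL 8, uses TEXT/INTEGER columns in place of VARCHAR/DATE/INT, and reuses the `gain` rows from the INSERT slide.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: an in-memory SQLite database stands in for the MySQL `wf` schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)

# GROUP BY collapses each hotel into a single output row.
grouped = conn.execute(
    "SELECT hotel, SUM(sale) FROM gain GROUP BY hotel ORDER BY hotel"
).fetchall()

# OVER keeps every input row and attaches the hotel total to each of them.
windowed = conn.execute(
    "SELECT hotel, date, sale, SUM(sale) OVER (PARTITION BY hotel) AS total "
    "FROM gain ORDER BY hotel, date"
).fetchall()

print(len(grouped), "grouped rows,", len(windowed), "windowed rows")
for row in windowed:
    print(row)
```

The grouped query returns one row per hotel, while the windowed query returns all nine rows with the same per-hotel sum repeated on each.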
  15. Getting OVER()
      mysql> SELECT hotel, date, sale,
                    SUM(sale) OVER() AS total
             FROM gain;
      +-----------+------------+------+-------+
      | hotel     | date       | sale | total |
      +-----------+------------+------+-------+
      | Cavalieri | 2019-01-01 |  250 |  3226 |
      | Cavalieri | 2019-02-01 |  262 |  3226 |
      | Cavalieri | 2019-03-01 |  269 |  3226 |
      | J.K.Place | 2019-01-01 |  460 |  3226 |
      | J.K.Place | 2019-02-01 |  450 |  3226 |
      | J.K.Place | 2019-03-01 |  475 |  3226 |
      | Manfredi  | 2019-01-01 |  400 |  3226 |
      | Manfredi  | 2019-02-01 |  319 |  3226 |
      | Manfredi  | 2019-03-01 |  341 |  3226 |
      +-----------+------------+------+-------+
      mysql> SELECT SUM(sale) AS total FROM gain;
      +-------+
      | total |
      +-------+
      |  3226 |
      +-------+
  16. Partition clause
      mysql> SELECT hotel, date, sale,
                    SUM(sale) OVER (PARTITION BY hotel) AS total
             FROM gain
             ORDER BY hotel, date;
      +-----------+------------+------+-------+
      | hotel     | date       | sale | total |
      +-----------+------------+------+-------+
      | Cavalieri | 2019-01-01 |  250 |   781 |
      | Cavalieri | 2019-02-01 |  262 |   781 |
      | Cavalieri | 2019-03-01 |  269 |   781 |
      | J.K.Place | 2019-01-01 |  460 |  1385 |
      | J.K.Place | 2019-02-01 |  450 |  1385 |
      | J.K.Place | 2019-03-01 |  475 |  1385 |
      | Manfredi  | 2019-01-01 |  400 |  1060 |
      | Manfredi  | 2019-02-01 |  319 |  1060 |
      | Manfredi  | 2019-03-01 |  341 |  1060 |
      +-----------+------------+------+-------+
      mysql> SELECT hotel, SUM(sale) FROM gain GROUP BY hotel;
      +-----------+-----------+
      | hotel     | SUM(sale) |
      +-----------+-----------+
      | Cavalieri |       781 |
      | J.K.Place |      1385 |
      | Manfredi  |      1060 |
      +-----------+-----------+
  17. Types of window function: partition clause
      ● is specified by the PARTITION BY clause
      ● windows are separated at partition boundaries
      ● is supported by all window functions
      mysql> SELECT hotel, date, sale,
                    SUM(sale) OVER (PARTITION BY hotel) AS total
             FROM gain
             ORDER BY hotel, date;
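Because a partition aggregate sits next to the row's own value, the two can be combined in one expression. A minimal sketch, again assuming Python's sqlite3 (SQLite 3.25+) in place of MySQL 8; the `gap_to_best` column is a name invented for this illustration.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)

# MAX(sale) is evaluated once per hotel partition; subtracting the row's own
# sale gives the distance from that hotel's best month.
rows = conn.execute(
    "SELECT hotel, date, sale, "
    "       MAX(sale) OVER (PARTITION BY hotel) - sale AS gap_to_best "
    "FROM gain ORDER BY hotel, date"
).fetchall()

for hotel, date, sale, gap in rows:
    print(f"{hotel:<10} {date} {sale:>4} {gap:>4}")
```

A gap of 0 marks each hotel's best month; no self-join or subquery is needed to get there.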
  18. Order by clause
      mysql> SELECT hotel, date, sale,
                    SUM(sale) OVER (PARTITION BY hotel ORDER BY date) AS partial,
                    SUM(sale) OVER (PARTITION BY hotel) AS total
             FROM gain;
      +-----------+------------+------+---------+-------+
      | hotel     | date       | sale | partial | total |
      +-----------+------------+------+---------+-------+
      | Cavalieri | 2019-01-01 |  250 |     250 |   781 |
      | Cavalieri | 2019-02-01 |  262 |     512 |   781 |
      | Cavalieri | 2019-03-01 |  269 |     781 |   781 |
      | J.K.Place | 2019-01-01 |  460 |     460 |  1385 |
      | J.K.Place | 2019-02-01 |  450 |     910 |  1385 |
      | J.K.Place | 2019-03-01 |  475 |    1385 |  1385 |
      | Manfredi  | 2019-01-01 |  400 |     400 |  1060 |
      | Manfredi  | 2019-02-01 |  319 |     719 |  1060 |
      | Manfredi  | 2019-03-01 |  341 |    1060 |  1060 |
      +-----------+------------+------+---------+-------+
  19. Types of window function: order by clause
      ● is specified by the ORDER BY clause
      ● defines the ordering of the rows within the set
      ● is supported by all window functions
      mysql> SELECT hotel, date, sale,
                    SUM(sale) OVER (PARTITION BY hotel ORDER BY date) AS partial,
                    SUM(sale) OVER (PARTITION BY hotel) AS total
             FROM gain;
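Adding ORDER BY inside OVER() turns the per-hotel sum into a running total. The sketch below collects that running total per hotel; it assumes Python's sqlite3 (SQLite 3.25+) as a substitute for MySQL 8, and the `running` dictionary is just a convenience for this example.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)

# With ORDER BY in the window, SUM() only sees rows up to the current date.
rows = conn.execute(
    "SELECT hotel, date, "
    "       SUM(sale) OVER (PARTITION BY hotel ORDER BY date) AS partial "
    "FROM gain ORDER BY hotel, date"
).fetchall()

# Group the running totals by hotel for easier reading.
running = {}
for hotel, _date, partial in rows:
    running.setdefault(hotel, []).append(partial)

for hotel, values in running.items():
    print(hotel, values)
```

The last value of each list equals the hotel total from the partition-only query, which is why `partial` and `total` coincide on the final month of every hotel.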
  20. Queries...
      mysql> SELECT hotel, date, sale,
                    ROUND(AVG(sale) OVER (PARTITION BY hotel ORDER BY date), 2) AS average,
                    SUM(sale) OVER (PARTITION BY hotel ORDER BY date) AS partial,
                    SUM(sale) OVER (PARTITION BY hotel) AS total
             FROM gain;
      +-----------+------------+------+---------+---------+-------+
      | hotel     | date       | sale | average | partial | total |
      +-----------+------------+------+---------+---------+-------+
      | Cavalieri | 2019-01-01 |  250 |  250.00 |     250 |   781 |
      | Cavalieri | 2019-02-01 |  262 |  256.00 |     512 |   781 |
      | Cavalieri | 2019-03-01 |  269 |  260.33 |     781 |   781 |
      | J.K.Place | 2019-01-01 |  460 |  460.00 |     460 |  1385 |
      | J.K.Place | 2019-02-01 |  450 |  455.00 |     910 |  1385 |
      | J.K.Place | 2019-03-01 |  475 |  461.67 |    1385 |  1385 |
      | Manfredi  | 2019-01-01 |  400 |  400.00 |     400 |  1060 |
      | Manfredi  | 2019-02-01 |  319 |  359.50 |     719 |  1060 |
      | Manfredi  | 2019-03-01 |  341 |  353.33 |    1060 |  1060 |
      +-----------+------------+------+---------+---------+-------+
  21. Queries...
      mysql> SELECT hotel, date, sale,
                    ROUND(AVG(sale) OVER (PARTITION BY hotel ORDER BY date), 2) AS average,
                    ROUND(sale - AVG(sale) OVER (PARTITION BY hotel ORDER BY date), 2) AS delta,
                    SUM(sale) OVER (PARTITION BY hotel ORDER BY date) AS partial,
                    SUM(sale) OVER (PARTITION BY hotel) AS total
             FROM gain;
      +-----------+------------+------+---------+--------+---------+-------+
      | hotel     | date       | sale | average | delta  | partial | total |
      +-----------+------------+------+---------+--------+---------+-------+
      | Cavalieri | 2019-01-01 |  250 |  250.00 |   0.00 |     250 |   781 |
      | Cavalieri | 2019-02-01 |  262 |  256.00 |   6.00 |     512 |   781 |
      | Cavalieri | 2019-03-01 |  269 |  260.33 |   8.67 |     781 |   781 |
      | J.K.Place | 2019-01-01 |  460 |  460.00 |   0.00 |     460 |  1385 |
      | J.K.Place | 2019-02-01 |  450 |  455.00 |  -5.00 |     910 |  1385 |
      | J.K.Place | 2019-03-01 |  475 |  461.67 |  13.33 |    1385 |  1385 |
      | Manfredi  | 2019-01-01 |  400 |  400.00 |   0.00 |     400 |  1060 |
      | Manfredi  | 2019-02-01 |  319 |  359.50 | -40.50 |     719 |  1060 |
      | Manfredi  | 2019-03-01 |  341 |  353.33 | -12.33 |    1060 |  1060 |
      +-----------+------------+------+---------+--------+---------+-------+
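The figures that the PHP loop at the start of the deck accumulates by hand (grand total, per-hotel running total, change versus the previous month) can be sketched as a single query. This is an illustration rather than the speaker's own solution: it assumes Python's sqlite3 (SQLite 3.25+) instead of MySQL 8, and it uses LAG(), listed among the analytic functions, which the deck's queries do not show.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT hotel, date, sale,
       SUM(sale) OVER () AS total,
       SUM(sale) OVER (PARTITION BY hotel ORDER BY date) AS partial,
       -- LAG() reads the previous row of the same hotel; it is NULL on the
       -- first month, so COALESCE turns that delta into 0 like the PHP loop.
       COALESCE(sale - LAG(sale) OVER (PARTITION BY hotel ORDER BY date), 0) AS delta
FROM gain
ORDER BY hotel, date
"""
report = conn.execute(QUERY).fetchall()

for row in report:
    print(row)
```

Note that `delta` here is the change from the previous month, as in the PHP loop, not the difference from the running average shown on the slide.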
  22. Update table
      mysql> UPDATE gain SET sale = 400 WHERE hotel = 'Manfredi';
      mysql> UPDATE gain SET sale = 450 WHERE date = '2019-02-01';
      mysql> SELECT * FROM gain;
      +-----------+------------+------+
      | hotel     | date       | sale |
      +-----------+------------+------+
      | Cavalieri | 2019-03-01 |  269 |
      | J.K.Place | 2019-02-01 |  450 |
      | Manfredi  | 2019-01-01 |  400 |
      | Cavalieri | 2019-02-01 |  450 |
      | Cavalieri | 2019-01-01 |  250 |
      | Manfredi  | 2019-02-01 |  450 |
      | J.K.Place | 2019-03-01 |  475 |
      | J.K.Place | 2019-01-01 |  460 |
      | Manfredi  | 2019-03-01 |  400 |
      +-----------+------------+------+
  23. Frame clause introduction
      mysql> SELECT hotel, date, sale,
                    COUNT(sale) OVER (ORDER BY sale DESC
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'order',
                    RANK() OVER (ORDER BY sale DESC) AS ranking,
                    DENSE_RANK() OVER (ORDER BY sale DESC) AS 'dense rank'
             FROM gain;
      +-----------+------------+------+-------+---------+------------+
      | hotel     | date       | sale | order | ranking | dense rank |
      +-----------+------------+------+-------+---------+------------+
      | J.K.Place | 2019-03-01 |  475 |     1 |       1 |          1 |
      | J.K.Place | 2019-01-01 |  460 |     2 |       2 |          2 |
      | J.K.Place | 2019-02-01 |  450 |     3 |       3 |          3 |
      | Cavalieri | 2019-02-01 |  450 |     4 |       3 |          3 |
      | Manfredi  | 2019-02-01 |  450 |     5 |       3 |          3 |
      | Manfredi  | 2019-01-01 |  400 |     6 |       6 |          4 |
      | Manfredi  | 2019-03-01 |  400 |     7 |       6 |          4 |
      | Cavalieri | 2019-03-01 |  269 |     8 |       8 |          5 |
      | Cavalieri | 2019-01-01 |  250 |     9 |       9 |          6 |
      +-----------+------------+------+-------+---------+------------+
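The difference between the three numbering schemes only shows up on ties. A small sketch, assuming Python's sqlite3 (SQLite 3.25+) as a stand-in for MySQL 8 and using ROW_NUMBER() where the slide uses a COUNT() with a ROWS frame; the two UPDATE statements from the previous slide are applied first to create the ties.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)
# Same updates as on the "Update table" slide, to introduce tied sales.
conn.execute("UPDATE gain SET sale = 400 WHERE hotel = 'Manfredi'")
conn.execute("UPDATE gain SET sale = 450 WHERE date = '2019-02-01'")

rows = conn.execute(
    "SELECT sale, "
    "       ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num, "
    "       RANK()       OVER (ORDER BY sale DESC) AS ranking, "
    "       DENSE_RANK() OVER (ORDER BY sale DESC) AS dense "
    "FROM gain ORDER BY row_num"
).fetchall()

sales = [r[0] for r in rows]
row_nums = [r[1] for r in rows]   # always unique, ties broken arbitrarily
ranks = [r[2] for r in rows]      # ties share a rank, then numbers are skipped
dense = [r[3] for r in rows]      # ties share a rank, no numbers are skipped
for r in rows:
    print(r)
```

Which of the tied rows gets which row number is not defined by the query, so only the set of row numbers, not their assignment within a tie, should be relied on.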
  24. Frame clause
      mysql> SELECT hotel, sale,
                    COUNT(sale) OVER (ORDER BY sale
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'rows',
                    COUNT(sale) OVER (ORDER BY sale
                                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'range'
             FROM gain;
      +-----------+------+------+-------+
      | hotel     | sale | rows | range |
      +-----------+------+------+-------+
      | Cavalieri |  250 |    1 |     1 |
      | Cavalieri |  269 |    2 |     2 |
      | Manfredi  |  400 |    3 |     4 |
      | Manfredi  |  400 |    4 |     4 |
      | J.K.Place |  450 |    5 |     7 |
      | Cavalieri |  450 |    6 |     7 |
      | Manfredi  |  450 |    7 |     7 |
      | J.K.Place |  460 |    8 |     8 |
      | J.K.Place |  475 |    9 |     9 |
      +-----------+------+------+-------+
  25. Frame clause
      mysql> SELECT hotel, sale,
                    COUNT(sale) OVER (ORDER BY sale
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'rows',
                    COUNT(sale) OVER (ORDER BY sale
                                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS 'rows2',
                    COUNT(sale) OVER (ORDER BY sale
                                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'range'
             FROM gain;
      +-----------+------+------+-------+-------+
      | hotel     | sale | rows | rows2 | range |
      +-----------+------+------+-------+-------+
      | Cavalieri |  250 |    1 |     2 |     1 |
      | Cavalieri |  269 |    2 |     3 |     2 |
      | Manfredi  |  400 |    3 |     3 |     4 |
      | Manfredi  |  400 |    4 |     3 |     4 |
      | J.K.Place |  450 |    5 |     3 |     7 |
      | Cavalieri |  450 |    6 |     3 |     7 |
      | Manfredi  |  450 |    7 |     3 |     7 |
      | J.K.Place |  460 |    8 |     3 |     8 |
      | J.K.Place |  475 |    9 |     2 |     9 |
      +-----------+------+------+-------+-------+
  26. Frame clause
      ● is specified relative to the current row
      ● lets you say how far the window extends around that row
      ● the frame units that relate rows to the frame are ROWS and RANGE
      mysql> SELECT hotel, sale,
                    COUNT(sale) OVER (ORDER BY sale
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'rows',
                    COUNT(sale) OVER (ORDER BY sale
                                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS 'rows2',
                    COUNT(sale) OVER (ORDER BY sale
                                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'range'
             FROM gain;
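ROWS counts physical rows, while RANGE treats rows with the same ORDER BY value as peers that enter the frame together. The sketch below reproduces the three frames on the updated data; it assumes Python's sqlite3 (SQLite 3.25+) in place of MySQL 8, and the column names `row_cnt`, `neighbours` and `range_cnt` are invented here to avoid quoting reserved words.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)
# Same updates as on the "Update table" slide, to introduce tied sales.
conn.execute("UPDATE gain SET sale = 400 WHERE hotel = 'Manfredi'")
conn.execute("UPDATE gain SET sale = 450 WHERE date = '2019-02-01'")

QUERY = """
SELECT sale,
       COUNT(*) OVER (ORDER BY sale
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS row_cnt,
       COUNT(*) OVER (ORDER BY sale
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbours,
       COUNT(*) OVER (ORDER BY sale
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_cnt
FROM gain
ORDER BY sale, row_cnt
"""
rows = conn.execute(QUERY).fetchall()

row_cnt = [r[1] for r in rows]      # one more row per step
neighbours = [r[2] for r in rows]   # at most previous + current + next
range_cnt = [r[3] for r in rows]    # tied sales are counted together
for r in rows:
    print(r)
```

The RANGE frame with an ORDER BY is also the default when no frame is written, which is why the running totals on the ORDER BY slides needed no explicit frame clause.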
  27. Order please
      Order  Clause                   Function
      1      FROM (including JOINs)   Chooses and joins tables
      2      WHERE                    Filters the base data
      3      GROUP BY                 Aggregates the base data
      4      HAVING                   Filters the aggregated data
      5      WINDOW FUNCTION          Performs calculations on subsets of the data
      6      SELECT                   Returns the final data
      9      ORDER BY                 Sorts the final data
      10     LIMIT + OFFSET           Limits the returned data to a row count
  28. Order please
      mysql> SELECT hotel, date, sale,
                    RANK() OVER (ORDER BY hotel, date) AS 'rank'
             FROM gain;
      +-----------+------------+------+------+
      | hotel     | date       | sale | rank |
      +-----------+------------+------+------+
      | Cavalieri | 2019-01-01 |  250 |    1 |
      | Cavalieri | 2019-02-01 |  450 |    2 |
      | Cavalieri | 2019-03-01 |  269 |    3 |
      | J.K.Place | 2019-01-01 |  460 |    4 |
      | J.K.Place | 2019-02-01 |  450 |    5 |
      | J.K.Place | 2019-03-01 |  475 |    6 |
      | Manfredi  | 2019-01-01 |  400 |    7 |
      | Manfredi  | 2019-02-01 |  450 |    8 |
      | Manfredi  | 2019-03-01 |  400 |    9 |
      +-----------+------------+------+------+
  29. Order please
      mysql> SELECT hotel, date, sale,
                    RANK() OVER (ORDER BY hotel, date) AS 'rank'
             FROM gain
             ORDER BY date;
      +-----------+------------+------+------+
      | hotel     | date       | sale | rank |
      +-----------+------------+------+------+
      | Manfredi  | 2019-01-01 |  400 |    7 |
      | Cavalieri | 2019-01-01 |  250 |    1 |
      | J.K.Place | 2019-01-01 |  460 |    4 |
      | Manfredi  | 2019-02-01 |  450 |    8 |
      | Cavalieri | 2019-02-01 |  450 |    2 |
      | J.K.Place | 2019-02-01 |  450 |    5 |
      | J.K.Place | 2019-03-01 |  475 |    6 |
      | Manfredi  | 2019-03-01 |  400 |    9 |
      | Cavalieri | 2019-03-01 |  269 |    3 |
      +-----------+------------+------+------+
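One practical consequence of this evaluation order is that WHERE runs before the window functions, so a window result cannot be filtered there; it has to be computed in a derived table first. A minimal sketch, assuming Python's sqlite3 (SQLite 3.25+) instead of MySQL 8 and the original, non-updated sample rows; the alias `rk` and the derived-table name `ranked` are invented for this example.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)

# A window function inside WHERE is rejected: WHERE is evaluated earlier.
try:
    conn.execute(
        "SELECT hotel, sale FROM gain "
        "WHERE RANK() OVER (PARTITION BY hotel ORDER BY sale DESC) = 1"
    )
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)

# Computing the rank in a derived table makes it an ordinary column
# that the outer WHERE can filter on.
best = conn.execute(
    "SELECT hotel, date, sale FROM ("
    "    SELECT hotel, date, sale, "
    "           RANK() OVER (PARTITION BY hotel ORDER BY sale DESC) AS rk "
    "    FROM gain"
    ") AS ranked WHERE rk = 1 ORDER BY hotel"
).fetchall()

for row in best:
    print(row)
```

The exact error message differs between database engines, so the sketch only checks that the first statement fails.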
  30. Implicit and explicit window function
      ● A window can be implicit and unnamed:
        SELECT SUM(sale) OVER (PARTITION BY sale) FROM gain;
      ● A window can be named via the WINDOW clause:
        SELECT SUM(sale) OVER (wf) FROM gain WINDOW wf AS (PARTITION BY sale);
  31. Implicit and explicit window function
      ● A window can inherit from another window and add details:
        SELECT hotel, date, sale,
               SUM(sale) OVER (wf2) AS partial,
               SUM(sale) OVER (wf1) AS total
        FROM gain
        WINDOW wf1 AS (PARTITION BY hotel),
               wf2 AS (wf1 ORDER BY sale);
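A named window pays off when several functions share the same definition. The sketch below reuses one window for MIN() and MAX(); it assumes Python's sqlite3 (SQLite 3.25+) as a stand-in for MySQL 8, sticks to a plain named window rather than window inheritance, and the window name `w` is arbitrary.

```python
import sqlite3

# Sample rows copied from the deck's INSERT statement.
ROWS = [
    ("Cavalieri", "2019-03-01", 269), ("J.K.Place", "2019-02-01", 450),
    ("Manfredi", "2019-01-01", 400), ("Cavalieri", "2019-02-01", 262),
    ("Cavalieri", "2019-01-01", 250), ("Manfredi", "2019-02-01", 319),
    ("J.K.Place", "2019-03-01", 475), ("J.K.Place", "2019-01-01", 460),
    ("Manfredi", "2019-03-01", 341),
]

# Assumption: in-memory SQLite stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gain (hotel TEXT, date TEXT, sale INTEGER)")
conn.executemany("INSERT INTO gain VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT hotel, date, sale,
       MIN(sale) OVER w AS lowest,
       MAX(sale) OVER w AS highest
FROM gain
WINDOW w AS (PARTITION BY hotel)   -- defined once, referenced twice
ORDER BY hotel, date
"""
rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

Changing the partitioning then means editing a single WINDOW definition instead of every OVER clause.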
  32. EXPLAIN WITH JSON
      With the plain EXPLAIN command you can’t see how the window function is executed. If you type EXPLAIN FORMAT=JSON instead, the output includes the window details you need to optimize the query.
  33. EXPLAIN
      mysql> EXPLAIN SELECT hotel, date, sale, SUM(sale) OVER() total FROM gain\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: gain
         partitions: NULL
               type: ALL
      possible_keys: NULL
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 9
           filtered: 100.00
              Extra: NULL
  34. EXPLAIN FORMAT=JSON
      mysql> EXPLAIN FORMAT=JSON SELECT hotel, date, sale, SUM(sale) OVER() total FROM gain;
      …
      "windows": [
        {
          "name": "<unnamed window>",
          "frame_buffer": {
            "using_temporary_table": true,
            "optimized_frame_evaluation": true
          },
          "functions": [
            "sum"
          ]
        }
      ],
      …
  35. Q & A
