M|18 User Defined Function

Published in: Data & Analytics

  1. 1. © 2018 MariaDB Foundation 10.3 Window Functions and Custom User-Defined Functions. Vicențiu Ciorbaru, Software Engineer @ MariaDB Foundation; Varun Gupta, Software Engineer @ MariaDB Corporation
  2. 2. © 2018 MariaDB Foundation What can window functions do? Long Table ■ Can access multiple rows from the current row.
  3. 3. © 2018 MariaDB Foundation What can window functions do? Long Table ■ Can access multiple rows from the current row. ■ Eliminate self-joins.
  4. 4. © 2018 MariaDB Foundation What can window functions do? Long Table ■ Can access multiple rows from the current row. ■ Eliminate self-joins. ■ Convert queries that take 10 hours to only take 1 minute.
  5. 5. © 2018 MariaDB Foundation What are window functions? ■ Similar to aggregate functions ○ Computed over a sequence of rows ■ But they provide one result per row ○ Like regular functions! ■ Identified by the OVER clause.
  6. 6. © 2018 MariaDB Foundation What are window functions? SELECT email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------------------------+------------+-----------+--------------+ | email | first_name | last_name | account_type | +------------------------+------------+-----------+--------------+ | admin@boss.org | Admin | Boss | admin | | bob.carlsen@foo.bar | Bob | Carlsen | regular | | eddie.stevens@data.org | Eddie | Stevens | regular | | john.smith@xyz.org | John | Smith | regular | | root@boss.org | Root | Chief | admin | +------------------------+------------+-----------+--------------+
  7. 7. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over () as rnum, email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 1 | admin@boss.org | Admin | Boss | admin | | 2 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 3 | eddie.stevens@data.org | Eddie | Stevens | regular | | 4 | john.smith@xyz.org | John | Smith | regular | | 5 | root@boss.org | Root | Chief | admin | +------+------------------------+------------+-----------+--------------+
  8. 8. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over () as rnum, email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 1 | admin@boss.org | Admin | Boss | admin | | 2 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 3 | eddie.stevens@data.org | Eddie | Stevens | regular | | 4 | john.smith@xyz.org | John | Smith | regular | | 5 | root@boss.org | Root | Chief | admin | +------+------------------------+------------+-----------+--------------+ This order is not deterministic!
  9. 9. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over () as rnum, email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 2 | admin@boss.org | Admin | Boss | admin | | 1 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 3 | eddie.stevens@data.org | Eddie | Stevens | regular | | 5 | john.smith@xyz.org | John | Smith | regular | | 4 | root@boss.org | Root | Chief | admin | +------+------------------------+------------+-----------+--------------+ This is also valid!
  10. 10. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over () as rnum, email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 5 | admin@boss.org | Admin | Boss | admin | | 4 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 3 | eddie.stevens@data.org | Eddie | Stevens | regular | | 2 | john.smith@xyz.org | John | Smith | regular | | 1 | root@boss.org | Root | Chief | admin | +------+------------------------+------------+-----------+--------------+ And this one...
  11. 11. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over (ORDER BY email) as rnum, email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 1 | admin@boss.org | Admin | Boss | admin | | 2 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 3 | eddie.stevens@data.org | Eddie | Stevens | regular | | 4 | john.smith@xyz.org | John | Smith | regular | | 5 | root@boss.org | Root | Chief | admin | +------+------------------------+------------+-----------+--------------+ Now only this one is valid!
  12. 12. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over (ORDER BY email) as rnum, email, first_name, last_name, account_type FROM users ORDER BY email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 1 | admin@boss.org | Admin | Boss | admin | | 2 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 3 | eddie.stevens@data.org | Eddie | Stevens | regular | | 4 | john.smith@xyz.org | John | Smith | regular | | 5 | root@boss.org | Root | Chief | admin | +------+------------------------+------------+-----------+--------------+ How do we “group” by account type?
  13. 13. © 2018 MariaDB Foundation What are window functions? SELECT row_number() over (PARTITION BY account_type ORDER BY email) as rnum, email, first_name, last_name, account_type FROM users ORDER BY account_type, email; Let’s start with a “function like” example +------+------------------------+------------+-----------+--------------+ | rnum | email | first_name | last_name | account_type | +------+------------------------+------------+-----------+--------------+ | 1 | admin@boss.org | Admin | Boss | admin | | 2 | root@boss.org | Root | Chief | admin | | 1 | bob.carlsen@foo.bar | Bob | Carlsen | regular | | 2 | eddie.stevens@data.org | Eddie | Stevens | regular | | 3 | john.smith@xyz.org | John | Smith | regular | +------+------------------------+------------+-----------+--------------+ row_number() resets for every partition
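The partitioned row_number() query above can be tried without a MariaDB server. The following is a minimal sketch, assuming Python's built-in sqlite3 module (backed by SQLite 3.25 or newer, which added window functions) as a stand-in for MariaDB. The in-memory database and the `numbered` variable are illustrative and not part of the slides; the rows are inserted in shuffled order on purpose.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MariaDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (email TEXT, first_name TEXT, last_name TEXT, account_type TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("root@boss.org", "Root", "Chief", "admin"),
        ("john.smith@xyz.org", "John", "Smith", "regular"),
        ("admin@boss.org", "Admin", "Boss", "admin"),
        ("eddie.stevens@data.org", "Eddie", "Stevens", "regular"),
        ("bob.carlsen@foo.bar", "Bob", "Carlsen", "regular"),
    ],
)

# ORDER BY inside OVER makes the numbering deterministic;
# PARTITION BY restarts it for every account type.
numbered = conn.execute(
    """
    SELECT row_number() OVER (PARTITION BY account_type ORDER BY email) AS rnum,
           email, account_type
    FROM users
    ORDER BY account_type, email
    """
).fetchall()

for row in numbered:
    print(row)
```

Because the ordering lives in the OVER clause, the insertion order of the rows does not affect the numbering.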
  14. 14. © 2018 MariaDB Foundation What are window functions? SELECT time, value FROM data_points ORDER BY time; How about that aggregate similarity?
  15. 15. © 2018 MariaDB Foundation What are window functions? SELECT time, value FROM data_points ORDER BY time; How about that aggregate similarity? SELECT time, value, avg(value) OVER (ORDER BY time ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) FROM data_points ORDER BY time;
  17. 17. © 2018 MariaDB Foundation What are window functions? SELECT time, value FROM data_points ORDER BY time; How about that aggregate similarity? SELECT time, value, avg(value) OVER (ORDER BY time ROWS BETWEEN 6 PRECEDING AND 6 FOLLOWING) FROM data_points ORDER BY time;
  18. 18. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | | | 11:00:00 | 5 | | | 12:00:00 | 4 | | | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | | | 11:00:00 | 5 | | | 12:00:00 | 4 | | | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work?
  19. 19. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | | | 12:00:00 | 4 | | | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | | | 12:00:00 | 4 | | | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work?
  20. 20. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | 11 | (2 + 5 + 4) | 12:00:00 | 4 | | | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | 15 | (2 + 5 + 4 + 4) | 12:00:00 | 4 | | | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work?
  21. 21. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | 11 | (2 + 5 + 4) | 12:00:00 | 4 | 13 | (5 + 4 + 4) | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | 15 | (2 + 5 + 4 + 4) | 12:00:00 | 4 | 16 | (2 + 5 + 4 + 4 + 1) | 13:00:00 | 4 | | | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work?
  22. 22. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | 11 | (2 + 5 + 4) | 12:00:00 | 4 | 13 | (5 + 4 + 4) | 13:00:00 | 4 | 9 | (4 + 4 + 1) | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | 15 | (2 + 5 + 4 + 4) | 12:00:00 | 4 | 16 | (2 + 5 + 4 + 4 + 1) | 13:00:00 | 4 | 19 | (5 + 4 + 4 + 1 + 5) | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work?
  23. 23. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | 11 | (2 + 5 + 4) | 12:00:00 | 4 | 13 | (5 + 4 + 4) | 13:00:00 | 4 | 9 | (4 + 4 + 1) | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | 15 | (2 + 5 + 4 + 4) | 12:00:00 | 4 | 16 | (2 + 5 + 4 + 4 + 1) | 13:00:00 | 4 | 19 | (5 + 4 + 4 + 1 + 5) | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work? Every new row adds a value and removes a value!
  24. 24. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | 11 | (2 + 5 + 4) | 12:00:00 | 4 | 13 | (5 + 4 + 4) | 13:00:00 | 4 | 9 | (4 + 4 + 1) | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | 15 | (2 + 5 + 4 + 4) | 12:00:00 | 4 | 16 | (2 + 5 + 4 + 4 + 1) | 13:00:00 | 4 | 19 | (5 + 4 + 4 + 1 + 5) | 14:00:00 | 1 | | | 15:00:00 | 5 | | | 15:00:00 | 2 | | | 15:00:00 | 2 | | +----------+-------+------+ So how do frames work? We can do “on-line” computation!
  25. 25. © 2018 MariaDB Foundation What are window functions? SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 7 | (2 + 5) | 11:00:00 | 5 | 11 | (2 + 5 + 4) | 12:00:00 | 4 | 13 | (5 + 4 + 4) | 13:00:00 | 4 | 9 | (4 + 4 + 1) | 14:00:00 | 1 | 10 | (4 + 1 + 5) | 15:00:00 | 5 | 8 | (1 + 5 + 2) | 15:00:00 | 2 | 9 | (5 + 2 + 2) | 15:00:00 | 2 | 4 | (2 + 2) +----------+-------+------+ SELECT time, value, sum(value) OVER ( ORDER BY time ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM data_points ORDER BY time; +----------+-------+------+ | time | value | sum | +----------+-------+------+ | 10:00:00 | 2 | 11 | (2 + 5 + 4) | 11:00:00 | 5 | 15 | (2 + 5 + 4 + 4) | 12:00:00 | 4 | 16 | (2 + 5 + 4 + 4 + 1) | 13:00:00 | 4 | 19 | (5 + 4 + 4 + 1 + 5) | 14:00:00 | 1 | 16 | (4 + 4 + 1 + 5 + 2) | 15:00:00 | 5 | 14 | (4 + 1 + 5 + 2 + 2) | 15:00:00 | 2 | 10 | (1 + 5 + 2 + 2) | 15:00:00 | 2 | 9 | (5 + 2 + 2) +----------+-------+------+ So how do frames work?
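The frame walkthrough above, including the observation that every new row adds one value and removes one, can be sketched in a few lines. This is an illustration only, assuming Python's sqlite3 module (SQLite 3.25+) in place of MariaDB. The `id` column is an added tie-breaker for the three rows sharing 15:00:00, and `frame_sums` and `online_sums` are hypothetical helper names, not anything from the slides or from MariaDB's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is an assumption: it fixes the order of the rows that share a time.
conn.execute("CREATE TABLE data_points (id INTEGER PRIMARY KEY, time TEXT, value INTEGER)")
rows = [("10:00:00", 2), ("11:00:00", 5), ("12:00:00", 4), ("13:00:00", 4),
        ("14:00:00", 1), ("15:00:00", 5), ("15:00:00", 2), ("15:00:00", 2)]
conn.executemany("INSERT INTO data_points (time, value) VALUES (?, ?)", rows)


def frame_sums(n):
    """Sum over ROWS BETWEEN n PRECEDING AND n FOLLOWING, computed by SQL."""
    query = f"""
        SELECT sum(value) OVER (ORDER BY time, id
                                ROWS BETWEEN {int(n)} PRECEDING AND {int(n)} FOLLOWING)
        FROM data_points
        ORDER BY time, id
    """
    return [r[0] for r in conn.execute(query)]


def online_sums(values, n):
    """Same sums computed incrementally: add the entering row, drop the leaving row."""
    sums = []
    running = sum(values[: n + 1])  # frame of the first row
    for i in range(len(values)):
        sums.append(running)
        entering = i + 1 + n  # row that enters the frame when moving to row i + 1
        leaving = i - n       # row that leaves the frame when moving to row i + 1
        if entering < len(values):
            running += values[entering]
        if leaving >= 0:
            running -= values[leaving]
    return sums


values = [value for _, value in rows]
print(frame_sums(1), online_sums(values, 1))
print(frame_sums(2), online_sums(values, 2))
```

The incremental version does a constant amount of work per row regardless of frame width, which is the point of the "on-line" remark.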
  26. 26. © 2018 MariaDB Foundation Scenario 1 - Regular SQL SELECT timestamp, transaction_id, customer_id, amount FROM transactions ORDER BY customer_id, timestamp; Given a set of bank transactions, compute the account balance after each transaction. +---------------------+----------------+-------------+--------+ | timestamp | transaction_id | customer_id | amount | +---------------------+----------------+-------------+--------+ | 2016-09-01 10:00:00 | 1 | 1 | 1000 | | 2016-09-01 11:00:00 | 2 | 1 | -200 | | 2016-09-01 12:00:00 | 3 | 1 | -600 | | 2016-09-01 13:00:00 | 5 | 1 | 400 | | 2016-09-01 12:10:00 | 4 | 2 | 300 | | 2016-09-01 14:00:00 | 6 | 2 | 500 | | 2016-09-01 15:00:00 | 7 | 2 | 400 | +---------------------+----------------+-------------+--------+
  27. 27. © 2018 MariaDB Foundation Scenario 1 - Regular SQL SELECT timestamp, transaction_id, customer_id, amount, (SELECT sum(amount) FROM transactions AS t2 WHERE t2.customer_id = t1.customer_id AND t2.timestamp <= t1.timestamp) AS balance FROM transactions AS t1 ORDER BY customer_id, timestamp; Given a set of bank transactions, compute the account balance after each transaction. +---------------------+----------------+-------------+--------+---------+ | timestamp | transaction_id | customer_id | amount | balance | +---------------------+----------------+-------------+--------+---------+ | 2016-09-01 10:00:00 | 1 | 1 | 1000 | 1000 | | 2016-09-01 11:00:00 | 2 | 1 | -200 | 800 | | 2016-09-01 12:00:00 | 3 | 1 | -600 | 200 | | 2016-09-01 13:00:00 | 5 | 1 | 400 | 600 | | 2016-09-01 12:10:00 | 4 | 2 | 300 | 300 | | 2016-09-01 14:00:00 | 6 | 2 | 500 | 800 | | 2016-09-01 15:00:00 | 7 | 2 | 400 | 1200 | +---------------------+----------------+-------------+--------+---------+
  28. 28. © 2018 MariaDB Foundation Scenario 1 - Window Functions SELECT timestamp, transaction_id, customer_id, amount, sum(amount) OVER (PARTITION BY customer_id ORDER BY timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance FROM transactions AS t1 ORDER BY customer_id, timestamp; Given a set of bank transactions, compute the account balance after each transaction. +---------------------+----------------+-------------+--------+---------+ | timestamp | transaction_id | customer_id | amount | balance | +---------------------+----------------+-------------+--------+---------+ | 2016-09-01 10:00:00 | 1 | 1 | 1000 | 1000 | | 2016-09-01 11:00:00 | 2 | 1 | -200 | 800 | | 2016-09-01 12:00:00 | 3 | 1 | -600 | 200 | | 2016-09-01 13:00:00 | 5 | 1 | 400 | 600 | | 2016-09-01 12:10:00 | 4 | 2 | 300 | 300 | | 2016-09-01 14:00:00 | 6 | 2 | 500 | 800 | | 2016-09-01 15:00:00 | 7 | 2 | 400 | 1200 | +---------------------+----------------+-------------+--------+---------+
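To check that the correlated subquery and the window function really produce the same balances, both queries from Scenario 1 can be run side by side. The sketch below assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for MariaDB. The variable names are illustrative, and it says nothing about performance, which depends on the server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (timestamp TEXT, transaction_id INTEGER, "
    "customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("2016-09-01 10:00:00", 1, 1, 1000),
        ("2016-09-01 11:00:00", 2, 1, -200),
        ("2016-09-01 12:00:00", 3, 1, -600),
        ("2016-09-01 12:10:00", 4, 2, 300),
        ("2016-09-01 13:00:00", 5, 1, 400),
        ("2016-09-01 14:00:00", 6, 2, 500),
        ("2016-09-01 15:00:00", 7, 2, 400),
    ],
)

# Regular SQL: one correlated subquery per output row.
subquery_balances = conn.execute(
    """
    SELECT transaction_id,
           (SELECT sum(amount) FROM transactions AS t2
            WHERE t2.customer_id = t1.customer_id
              AND t2.timestamp <= t1.timestamp) AS balance
    FROM transactions AS t1
    ORDER BY customer_id, timestamp
    """
).fetchall()

# Window function: a running sum per customer.
window_balances = conn.execute(
    """
    SELECT transaction_id,
           sum(amount) OVER (PARTITION BY customer_id ORDER BY timestamp
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance
    FROM transactions
    ORDER BY customer_id, timestamp
    """
).fetchall()

print(window_balances)
```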
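  29. 29. © 2018 MariaDB Foundation Scenario 1 - Performance Given a set of bank transactions, compute the account balance after each transaction. +-------------+-----------------------+-------------------------------+----------------------------+ | #Rows | Regular SQL (seconds) | Regular SQL + Index (seconds) | Window Functions (seconds) | +-------------+-----------------------+-------------------------------+----------------------------+ | 10 000 | 0.29 | 0.01 | 0.02 | | 100 000 | 2.91 | 0.09 | 0.16 | | 1 000 000 | 29.1 | 2.86 | 3.04 | | 10 000 000 | 346.3 | 90.97 | 43.17 | | 100 000 000 | 4357.2 | 813.2 | 514.24 | +-------------+-----------------------+-------------------------------+----------------------------+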
  30. 30. © 2018 MariaDB Foundation Practical Use Cases - Scenario 2 ■ “Top-N” queries ■ Retrieve the top 5 earners by department.
  31. 31. © 2018 MariaDB Foundation Scenario 2 - Regular SQL SELECT dept, name, salary FROM employee_salaries ORDER BY dept; +-------+----------+--------+ | dept | name | salary | +-------+----------+--------+ | Sales | John | 200 | | Sales | Tom | 300 | | Sales | Bill | 150 | | Sales | Jill | 400 | | Sales | Bob | 500 | | Sales | Axel | 250 | | Sales | Lucy | 300 | | Eng | Tim | 1000 | | Eng | Michael | 2000 | | Eng | Andrew | 1500 | | Eng | Scarlett | 2200 | | Eng | Sergei | 3000 | | Eng | Kristian | 3500 | | Eng | Arnold | 2500 | | Eng | Sami | 2800 | +-------+----------+--------+ Retrieve the top 5 earners by department.
  32. 32. © 2018 MariaDB Foundation Scenario 2 - Regular SQL SELECT dept, name, salary FROM employee_salaries AS t1 WHERE (SELECT count(*) FROM employee_salaries AS t2 WHERE t1.name != t2.name AND t1.dept = t2.dept AND t2.salary > t1.salary) < 5 ORDER BY dept, salary DESC; +-------+----------+--------+ | dept | name | salary | +-------+----------+--------+ | Eng | Kristian | 3500 | | Eng | Sergei | 3000 | | Eng | Sami | 2800 | | Eng | Arnold | 2500 | | Eng | Scarlett | 2200 | | Sales | Bob | 500 | | Sales | Jill | 400 | | Sales | Lucy | 300 | | Sales | Tom | 300 | | Sales | Axel | 250 | +-------+----------+--------+ Retrieve the top 5 earners by department.
  33. 33. © 2018 MariaDB Foundation Scenario 2 - Regular SQL SELECT dept, name, salary FROM employee_salaries AS t1 WHERE (SELECT count(*) FROM employee_salaries AS t2 WHERE t1.name != t2.name AND t1.dept = t2.dept AND t2.salary > t1.salary) < 5 ORDER BY dept, salary DESC; +-------+----------+--------+ | dept | name | salary | +-------+----------+--------+ | Eng | Kristian | 3500 | | Eng | Sergei | 3000 | | Eng | Sami | 2800 | | Eng | Arnold | 2500 | | Eng | Scarlett | 2200 | | Sales | Bob | 500 | | Sales | Jill | 400 | | Sales | Lucy | 300 | | Sales | Tom | 300 | | Sales | Axel | 250 | +-------+----------+--------+ Retrieve the top 5 earners by department. What if I want a “rank” column?
  34. 34. © 2018 MariaDB Foundation Scenario 2 - Regular SQL SELECT (SELECT count(*) + 1 FROM employee_salaries as t2 WHERE t1.name != t2.name and t1.dept = t2.dept and t2.salary > t1.salary) AS ranking, dept, name, salary FROM employee_salaries AS t1 WHERE (SELECT count(*) FROM employee_salaries AS t2 WHERE t1.name != t2.name AND t1.dept = t2.dept AND t2.salary > t1.salary) < 5 ORDER BY dept, salary DESC; +---------+-------+----------+--------+ | ranking | dept | name | salary | +---------+-------+----------+--------+ | 1 | Eng | Kristian | 3500 | | 2 | Eng | Sergei | 3000 | | 3 | Eng | Sami | 2800 | | 4 | Eng | Arnold | 2500 | | 5 | Eng | Scarlett | 2200 | | 1 | Sales | Bob | 500 | | 2 | Sales | Jill | 400 | | 3 | Sales | Lucy | 300 | | 3 | Sales | Tom | 300 | | 5 | Sales | Axel | 250 | +---------+-------+----------+--------+ Retrieve the top 5 earners by department. What if I want a “rank” column?
  35. 35. © 2018 MariaDB Foundation Scenario 2 - Window Functions WITH salary_ranks AS ( SELECT rank() OVER ( PARTITION BY dept ORDER BY salary DESC) AS ranking, dept, name, salary FROM employee_salaries ) SELECT * FROM salary_ranks WHERE ranking <= 5 ORDER BY dept, ranking; +---------+-------+----------+--------+ | ranking | dept | name | salary | +---------+-------+----------+--------+ | 1 | Eng | Kristian | 3500 | | 2 | Eng | Sergei | 3000 | | 3 | Eng | Sami | 2800 | | 4 | Eng | Arnold | 2500 | | 5 | Eng | Scarlett | 2200 | | 6 | Eng | Michael | 2000 | | 7 | Eng | Andrew | 1500 | | 8 | Eng | Tim | 1000 | | 1 | Sales | Bob | 500 | | 2 | Sales | Jill | 400 | | 3 | Sales | Tom | 300 | | 3 | Sales | Lucy | 300 | | 5 | Sales | Axel | 250 | | 6 | Sales | John | 200 | | 7 | Sales | Bill | 150 | +---------+-------+----------+--------+ Retrieve the top 5 earners by department.
  36. 36. © 2018 MariaDB Foundation Scenario 2 - Window Functions WITH salary_ranks AS ( SELECT rank() OVER ( PARTITION BY dept ORDER BY salary DESC) AS ranking, dept, name, salary FROM employee_salaries WHERE ranking <= 5 ) SELECT * FROM salary_ranks WHERE ranking <= 5 ORDER BY dept, ranking; +---------+-------+----------+--------+ | ranking | dept | name | salary | +---------+-------+----------+--------+ | 1 | Eng | Kristian | 3500 | | 2 | Eng | Sergei | 3000 | | 3 | Eng | Sami | 2800 | | 4 | Eng | Arnold | 2500 | | 5 | Eng | Scarlett | 2200 | | 6 | Eng | Michael | 2000 | | 7 | Eng | Andrew | 1500 | | 8 | Eng | Tim | 1000 | | 1 | Sales | Bob | 500 | | 2 | Sales | Jill | 400 | | 3 | Sales | Tom | 300 | | 3 | Sales | Lucy | 300 | | 5 | Sales | Axel | 250 | | 6 | Sales | John | 200 | | 7 | Sales | Bill | 150 | +---------+-------+----------+--------+ Retrieve the top 5 earners by department.
  37. 37. © 2018 MariaDB Foundation Scenario 2 - Window Functions WITH salary_ranks AS ( SELECT rank() OVER ( PARTITION BY dept ORDER BY salary DESC) AS ranking, dept, name, salary FROM employee_salaries WHERE ranking <= 5 ) SELECT * FROM salary_ranks WHERE ranking <= 5 ORDER BY dept, ranking; +---------+-------+----------+--------+ | ranking | dept | name | salary | +---------+-------+----------+--------+ | 1 | Eng | Kristian | 3500 | | 2 | Eng | Sergei | 3000 | | 3 | Eng | Sami | 2800 | | 4 | Eng | Arnold | 2500 | | 5 | Eng | Scarlett | 2200 | | 6 | Eng | Michael | 2000 | | 7 | Eng | Andrew | 1500 | | 8 | Eng | Tim | 1000 | | 1 | Sales | Bob | 500 | | 2 | Sales | Jill | 400 | | 3 | Sales | Tom | 300 | | 3 | Sales | Lucy | 300 | | 5 | Sales | Axel | 250 | | 6 | Sales | John | 200 | | 7 | Sales | Bill | 150 | +---------+-------+----------+--------+ Retrieve the top 5 earners by department. No Window Functions in the WHERE clause :(
  38. 38. © 2018 MariaDB Foundation Scenario 2 - Window Functions WITH salary_ranks AS ( SELECT rank() OVER ( PARTITION BY dept ORDER BY salary DESC) AS ranking, dept, name, salary FROM employee_salaries ) SELECT * FROM salary_ranks WHERE ranking <= 5 ORDER BY dept, ranking; +---------+-------+----------+--------+ | ranking | dept | name | salary | +---------+-------+----------+--------+ | 1 | Eng | Kristian | 3500 | | 2 | Eng | Sergei | 3000 | | 3 | Eng | Sami | 2800 | | 4 | Eng | Arnold | 2500 | | 5 | Eng | Scarlett | 2200 | | 1 | Sales | Bob | 500 | | 2 | Sales | Jill | 400 | | 3 | Sales | Lucy | 300 | | 3 | Sales | Tom | 300 | | 5 | Sales | Axel | 250 | +---------+-------+----------+--------+ Retrieve the top 5 earners by department.
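The final Top-N query can be reproduced as follows. This is a sketch under the assumption that Python's sqlite3 module (SQLite 3.25+) behaves like MariaDB for this query. The extra `name` key in the outer ORDER BY is an addition that keeps the two tied Sales rows in a fixed order, and `top_earners` is an illustrative variable name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_salaries (dept TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee_salaries VALUES (?, ?, ?)",
    [
        ("Sales", "John", 200), ("Sales", "Tom", 300), ("Sales", "Bill", 150),
        ("Sales", "Jill", 400), ("Sales", "Bob", 500), ("Sales", "Axel", 250),
        ("Sales", "Lucy", 300),
        ("Eng", "Tim", 1000), ("Eng", "Michael", 2000), ("Eng", "Andrew", 1500),
        ("Eng", "Scarlett", 2200), ("Eng", "Sergei", 3000), ("Eng", "Kristian", 3500),
        ("Eng", "Arnold", 2500), ("Eng", "Sami", 2800),
    ],
)

# The rank is computed inside the CTE and filtered in the outer query,
# because a window function cannot be filtered in the WHERE clause of its own SELECT.
top_earners = conn.execute(
    """
    WITH salary_ranks AS (
        SELECT rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS ranking,
               dept, name, salary
        FROM employee_salaries
    )
    SELECT * FROM salary_ranks
    WHERE ranking <= 5
    ORDER BY dept, ranking, name
    """
).fetchall()

for row in top_earners:
    print(row)
```

Lucy and Tom share rank 3, so rank 4 is skipped for Sales, matching the table on the slide.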
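  39. 39. © 2018 MariaDB Foundation Scenario 2 - Performance Retrieve the top 5 earners by department. +------------+-----------------------+-------------------------------+----------------------------+ | #Rows | Regular SQL (seconds) | Regular SQL + Index (seconds) | Window Functions (seconds) | +------------+-----------------------+-------------------------------+----------------------------+ | 2 000 | 1.31 | 0.14 | 0.00 | | 20 000 | 123.6 | 12.6 | 0.02 | | 200 000 | 10000+ | 1539.79 | 0.21 | | 2 000 000 | ... | ... | 5.61 | | 20 000 000 | ... | ... | 76.04 | +------------+-----------------------+-------------------------------+----------------------------+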
  40. 40. © 2018 MariaDB Foundation Practical Use Cases - Scenario 3 ■ We have a number of machines that need servicing. ■ Servicing times are logged. ■ What is the average time between services, for each machine?
  41. 41. © 2018 MariaDB Foundation Scenario 3 - Regular SQL SELECT time, machine_id FROM maintenance_activity; +---------------------+------------+ | time | machine_id | +---------------------+------------+ | 2017-01-04 11:02:31 | 5879 | | 2016-10-31 21:30:19 | 8580 | | 2017-01-16 11:33:58 | 7489 | | 2016-11-01 17:09:07 | 9590 | | 2016-10-03 23:33:21 | 6913 | | 2016-11-02 11:02:08 | 6892 | | 2017-01-07 15:43:52 | 4190 | .... .... .... .... | 2016-12-18 03:27:40 | 8578 | | 2016-12-06 21:57:11 | 3563 | | 2017-01-20 21:16:18 | 4434 | +---------------------+------------+ Compute average time between machine services.
  42. 42. © 2018 MariaDB Foundation Scenario 3 - Regular SQL SELECT time, machine_id FROM maintenance_activity; +---------------------+------------+ | time | machine_id | +---------------------+------------+ | 2017-01-04 11:02:31 | 5879 | | 2016-10-31 21:30:19 | 8580 | | 2017-01-16 11:33:58 | 7489 | | 2016-11-01 17:09:07 | 9590 | | 2016-10-03 23:33:21 | 6913 | | 2016-11-02 11:02:08 | 6892 | | 2017-01-07 15:43:52 | 4190 | .... .... .... .... | 2016-12-18 03:27:40 | 8578 | | 2016-12-06 21:57:11 | 3563 | | 2017-01-20 21:16:18 | 4434 | +---------------------+------------+ Compute average time between machine services. We want the difference between two consecutive entries. (For the same machine)
  43. 43. © 2018 MariaDB Foundation Scenario 3 - Regular SQL WITH time_diffs AS ( SELECT t1.machine_id, TIMEDIFF(t1.time, max(t2.time)) AS diff FROM maintenance_activity AS t1, maintenance_activity AS t2 WHERE t1.machine_id = t2.machine_id and t2.time < t1.time GROUP BY t1.machine_id, t1.time ) SELECT machine_id, AVG(diff) AS avg_diff FROM time_diffs GROUP BY machine_id ORDER BY machine_id; +------------+------------------------+ | machine_id | avg_diff | +------------+------------------------+ | 0 | 25:38:15.0505 | | 1 | 26:42:27.6969 | | 2 | 24:43:18.4646 | | 3 | 23:57:55.9797 | | 4 | 26:30:11.6565 | | 5 | 25:38:12.7070 | | 6 | 27:58:24.9494 | | 7 | 20:47:57.7272 | | 8 | 28:16:02.0303 | | 9 | 25:38:48.0505 | | 10 | 28:34:57.8686 | | 11 | 27:05:40.4040 | | 12 | 24:09:13.5050 | | 13 | 22:04:08.8383 | | 14 | 26:42:41.8888 | | 15 | 19:24:46.8888 | .... Compute average time between machine services.
  44. 44. © 2018 MariaDB Foundation Scenario 3 - Window Functions WITH time_diffs AS ( SELECT machine_id, time, lag(time) OVER ( PARTITION BY machine_id ORDER BY time) AS prev_time FROM maintenance_activity ) SELECT machine_id, AVG(TIMEDIFF(time, prev_time)) AS avg_diff FROM time_diffs GROUP BY machine_id ORDER BY machine_id; +------------+------------------------+ | machine_id | avg_diff | +------------+------------------------+ | 0 | 25:38:15.0505 | | 1 | 26:42:27.6969 | | 2 | 24:43:18.4646 | | 3 | 23:57:55.9797 | | 4 | 26:30:11.6565 | | 5 | 25:38:12.7070 | | 6 | 27:58:24.9494 | | 7 | 20:47:57.7272 | | 8 | 28:16:02.0303 | | 9 | 25:38:48.0505 | | 10 | 28:34:57.8686 | | 11 | 27:05:40.4040 | | 12 | 24:09:13.5050 | | 13 | 22:04:08.8383 | | 14 | 26:42:41.8888 | | 15 | 19:24:46.8888 | .... Compute average time between machine services.
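The lag() approach to Scenario 3 can be tried on a tiny data set. The sketch below assumes Python's sqlite3 module (SQLite 3.25+) instead of MariaDB, so TIMEDIFF is replaced by a difference of Unix timestamps obtained with SQLite's strftime('%s', ...), and the result is in seconds. The five maintenance rows and the `avg_gaps` name are made up for illustration and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maintenance_activity (time TEXT, machine_id INTEGER)")
# Hypothetical sample data: machine 1 has gaps of 1 and 2 days, machine 2 one gap of 12 hours.
conn.executemany(
    "INSERT INTO maintenance_activity VALUES (?, ?)",
    [
        ("2017-01-04 00:00:00", 1),
        ("2017-01-01 00:00:00", 2),
        ("2017-01-01 00:00:00", 1),
        ("2017-01-01 12:00:00", 2),
        ("2017-01-02 00:00:00", 1),
    ],
)

avg_gaps = conn.execute(
    """
    WITH time_diffs AS (
        SELECT machine_id, time,
               lag(time) OVER (PARTITION BY machine_id ORDER BY time) AS prev_time
        FROM maintenance_activity
    )
    SELECT machine_id,
           avg(CAST(strftime('%s', time) AS INTEGER)
               - CAST(strftime('%s', prev_time) AS INTEGER)) AS avg_seconds
    FROM time_diffs
    WHERE prev_time IS NOT NULL   -- the first service of each machine has no predecessor
    GROUP BY machine_id
    ORDER BY machine_id
    """
).fetchall()

print(avg_gaps)
```

Each row is paired with its predecessor in a single pass over the partition, so no self-join is needed.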
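  45. 45. © 2018 MariaDB Foundation Scenario 3 - Performance Compute average time between machine services. +-------------+-----------------------+-------------------------------+----------------------------+ | #Rows | Regular SQL (seconds) | Regular SQL + Index (seconds) | Window Functions (seconds) | +-------------+-----------------------+-------------------------------+----------------------------+ | 10 000 | 5.19 | 0.51 | 0.02 | | 100 000 | 526.48 | 4.67 | 0.17 | | 1 000 000 | 50000+ | 47.13 | 2.92 | | 10 000 000 | ... | 505.10 | 43.62 | | 100 000 000 | ... | 5353.15 | 512.15 | +-------------+-----------------------+-------------------------------+----------------------------+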
  46. 46. © 2018 MariaDB Foundation Window functions summary ■ Can help eliminate expensive subqueries. ■ Can help eliminate self-joins. ■ Make queries more readable. ■ Make (some) queries faster.
  47. 47. © 2018 MariaDB Foundation Window Functions in MariaDB ■ We support: ○ ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE ○ FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD, LAG ○ All regular aggregate functions except GROUP_CONCAT
  48. 48. © 2018 MariaDB Foundation Window Functions in MariaDB ■ 10.3 Added support for: ○ Advanced window functions such as: PERCENTILE_CONT, PERCENTILE_DISC ■ We do not (yet) support: ○ Time interval range-type frames ○ DISTINCT clause in window functions ○ GROUP_CONCAT function
  49. 49. User Defined Functions
  50. 50. User Defined Functions • Was introduced in MariaDB 10.1 • Functions are written in C or C++, and to make use of them, the operating system must support dynamic loading
  51. 51. What are user defined functions? A UDF named x needs the following functions to be defined: • x(): Required for all UDFs; this is where the result is calculated • x_init(): Initialization function for x() • x_deinit(): De-initialization function for x(). Used to deallocate memory that was allocated in x_init()
  52. 52. What are user defined functions? In addition to the functions mentioned earlier, an aggregate UDF needs these extra functions: • x_clear(): Resets the current aggregate, without inserting the argument as the initial aggregate value for the new group. • x_add(): Adds the argument to the current aggregate
  53. 53. An example: Average cost struct avgcost_data { ulonglong count; longlong totalquantity; double totalprice; }; void avgcost_clear(UDF_INIT* initid, char* is_null) { struct avgcost_data* data = (struct avgcost_data*)initid->ptr; data->totalprice= 0.0; data->totalquantity=0; data->count=0; } my_bool avgcost_init( UDF_INIT* initid, UDF_ARGS* args, char* message ) { struct avgcost_data* data; initid->maybe_null= 0; initid->decimals = 4; initid->max_length= 20; if (!(data = (struct avgcost_data*) malloc(sizeof(struct avgcost_data)))) { strmov(message,"Couldn't allocate memory"); return 1; } data->totalquantity= 0; data->totalprice= 0.0; initid->ptr = (char*)data; return 0; }
  54. 54. An example: Average cost void avgcost_add(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused))) { if (args->args[0] && args->args[1]){ struct avgcost_data* data= (struct avgcost_data*)initid->ptr; longlong quantity = *((longlong*)args->args[0]); longlong newquantity= data->totalquantity + quantity; double price= *((double*)args->args[1]); data->count++; if ( ((data->totalquantity >= 0) && (quantity < 0)) || ((data->totalquantity < 0) && (quantity > 0)) ){ if ( ((quantity < 0) && (newquantity < 0)) || ((quantity > 0) && (newquantity > 0)) ) { data->totalprice= price *(double)newquantity; } else{ price= data->totalprice/ (double)data->totalquantity; data->totalprice = price * (double)newquantity; } data->totalquantity = newquantity; } else{ data->totalquantity += quantity; data->totalprice += price * (double)quantity; } if (data->totalquantity == 0) data->totalprice = 0.0; } }
  55. 55. An example: Average cost double avgcost( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char* error) { struct avgcost_data* data = (struct avgcost_data*)initid->ptr; if (!data->count || !data->totalquantity) { *is_null = 1; return 0.0; } *is_null = 0; return data->totalprice/(double)data->totalquantity; } void avgcost_deinit( UDF_INIT* initid ) { free(initid->ptr); }
  56. 56. CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "$UDF_EXAMPLE_SO"; create table t1(sum int, price float(24)); insert into t1 values(100, 50.00), (100, 100.00); select avgcost(sum, price) from t1;
  57. 57. Custom Aggregate Functions
  58. 58. Custom Aggregate Functions ● Developed as part of Google Summer of Code 2016 ● Available in 10.3 ● Functions are written in SQL
  59. 59. What are custom aggregate functions? ● Aggregate functions ○ Computed over a sequence of rows ○ Return one result for the sequence of rows ● Identified by: ○ AGGREGATE keyword ○ FETCH GROUP NEXT ROW instruction
  60. 60. Syntax for custom aggregate functions CREATE AGGREGATE FUNCTION function_name (parameters) RETURNS return_type BEGIN All types of declarations DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN return_val; LOOP FETCH GROUP NEXT ROW; /* fetches the next row of the group */ other instructions END LOOP; END|
  61. 61. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END|
  63. 63. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=0 x=1
  65. 65. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=1 x=1
  66. 66. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=1 x=2
  67. 67. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=2 x=2
  68. 68. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=2 x=3
  69. 69. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=3 x=3
  71. 71. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=3 x=4
  72. 72. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| count_students=4 x=4
  73. 73. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| A signal stating NO MORE ROWS is sent count_students=4 x=4
  74. 74. Let’s start with an example SELECT aggregate_count(Stud_Id) from MARKS; +----------+---------+ |Stud_Id | GradeCnt| +----------+---------+ | 1 | 6 | | 2 | 4 | | 3 | 7 | | 4 | 5 | +----------+---------+ CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT BEGIN DECLARE count_students INT DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students; LOOP FETCH GROUP NEXT ROW; IF x THEN SET count_students = count_students+1; END IF; END LOOP; END| Handler handles the signal and executes the return statement Return Value 4
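The trace above can be mirrored in C to make the control flow explicit. This is only an analogy: `group_cursor` and `fetch_group_next_row` are hypothetical helpers that play the role of the FETCH GROUP NEXT ROW instruction, and the early `return` plays the role of the NOT FOUND handler. C ints cannot be NULL, so the sketch only covers the zero/non-zero part of `IF x THEN`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over the argument values of one group. */
typedef struct { const int *values; size_t n; size_t pos; } group_cursor;

/*
 * Plays the role of FETCH GROUP NEXT ROW: loads the next value into *x.
 * Returns 0 when no rows are left (the NOT FOUND condition), 1 otherwise.
 */
int fetch_group_next_row(group_cursor *c, int *x) {
    if (c->pos >= c->n) return 0;
    *x = c->values[c->pos++];
    return 1;
}

/* C rendering of aggregate_count from the slides. */
int aggregate_count(const int *values, size_t n) {
    group_cursor c = { values, n, 0 };
    int count_students = 0;            /* DECLARE count_students INT DEFAULT 0 */
    int x = 0;                         /* the function parameter x */
    assert(values != NULL || n == 0);
    for (;;) {                         /* LOOP */
        if (!fetch_group_next_row(&c, &x))
            return count_students;     /* handler: RETURN count_students */
        if (x)                         /* IF x THEN */
            count_students = count_students + 1;
    }
}
```

With the Stud_Id values 1, 2, 3, 4 the function returns 4, matching the trace. Note that the loop has no exit condition of its own: the only way out is the handler, which is why the handler declaration is mandatory in the SQL version.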
  75. 75. Example: Median CREATE AGGREGATE FUNCTION median(x INT) RETURNS DOUBLE BEGIN DECLARE CONTINUE HANDLER FOR NOT FOUND BEGIN DECLARE res DOUBLE; DECLARE cnt INT DEFAULT (select count(*) from tt); DECLARE lim INT DEFAULT (cnt-1) DIV 2; IF cnt % 2 = 0 THEN SET res= (select AVG(a) FROM (select a FROM tt ORDER BY a LIMIT lim,2) ttt); ELSE SET res= (select a FROM tt ORDER BY a LIMIT lim,1); END IF; DROP TEMPORARY TABLE tt; RETURN res; END; CREATE TEMPORARY TABLE tt (a INT); LOOP FETCH GROUP NEXT ROW; INSERT INTO tt VALUES(x); END LOOP; END|
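The index arithmetic in the handler is easy to check in C. The sketch below is an analogy under stated assumptions: a sorted heap array stands in for the temporary table tt, and `median_of` and `cmp_int` are hypothetical names. The offset `lim = (cnt-1) DIV 2` picks the middle element for an odd count and the lower of the two middle elements for an even count.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison for qsort; avoids overflow from subtracting ints. */
int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 values, following the steps of the SQL handler. */
double median_of(const int *values, size_t n) {
    assert(values != NULL && n > 0);
    int *tt = malloc(n * sizeof *tt);          /* stands in for table tt */
    assert(tt != NULL);
    for (size_t i = 0; i < n; i++)
        tt[i] = values[i];                     /* INSERT INTO tt VALUES(x) */
    qsort(tt, n, sizeof *tt, cmp_int);         /* ORDER BY a */

    size_t lim = (n - 1) / 2;                  /* (cnt-1) DIV 2 */
    double res;
    if (n % 2 == 0)                            /* AVG over LIMIT lim,2 */
        res = ((double)tt[lim] + (double)tt[lim + 1]) / 2.0;
    else                                       /* LIMIT lim,1 */
        res = (double)tt[lim];

    free(tt);                                  /* DROP TEMPORARY TABLE tt */
    return res;
}
```

For the values 4, 1, 3, 2 the count is even, lim is 1, and the result is the average of the sorted elements at offsets 1 and 2, which is 2.5.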
  76. 76. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| mean=1 x= 5
  78. 78. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| mean=5 x= 5
  79. 79. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| mean=5 x= 4
  80. 80. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| mean=20 x= 4
  81. 81. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| mean=20 x= 5
  82. 82. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| mean=100 x= 5
  83. 83. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| A signal stating NO MORE ROWS is sent mean=100 x= 5
  84. 84. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | 4.641588833 | | 2 | | | 3 | | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| Handler handles the signal and executes the return statement mean=100 x= 5
  85. 85. Example: Custom Aggregate with GROUP BY clause SELECT Stud_Id, geometric_mean(Grade) from GRADES GROUP BY Stud_Id; GRADES: +----------+---------+ | Stud_Id | Grade | +----------+---------+ | 1 | 5 | | 1 | 4 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 3 | | 3 | 6 | +----------+---------+ Result: +---------+----------------+ | Stud_Id | geometric_mean | +---------+----------------+ | 1 | 4.641588833 | | 2 | 2.884499140 | | 3 | 6 | +---------+----------------+ CREATE AGGREGATE FUNCTION geometric_mean(x DOUBLE) RETURNS DOUBLE BEGIN DECLARE mean DOUBLE DEFAULT 1; DECLARE num_rows DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN Power(mean, 1/num_rows); LOOP FETCH GROUP NEXT ROW; IF x is NOT NULL THEN SET mean = mean * x; SET num_rows = num_rows+1; END IF; END LOOP; END| After repeating the same for the other groups, the final result is as shown in the Result table.
  86. 86. Summary ● Custom aggregate functions let us write aggregate functions in SQL, whereas UDFs have to be written in C or C++. ● The FETCH GROUP NEXT ROW instruction fetches the next row of the group and assigns its values to the parameters of the function. ● DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN tells the function what to return once the computation is done (no more rows). ● The GROUP BY, ORDER BY and HAVING clauses work with custom aggregate functions. ● In the future we would like custom aggregate functions to work as window functions as well.
  87. 87. Thank you! Any questions?
