Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Percona live amsterdam 2015 optimizing mysql stored routines

606 views

Published on

My 2015 presentation on optimizing stored routines, updated and extended from the original 2010 version for Percona Live Amsterdam 2015 https://www.percona.com/live/europe-amsterdam-2015/sessions/optimizing-mysql-stored-routines:

"
MySQL Stored routines were introduced in MySQL 5.0 - a decade ago. Although not widely adopted, there are certainly cases where they are appropriate and useful. So, they really should be part of every MySQL DBA or back-end developer's repetoire. When writing MySQL stored routines, one should be particularly mindful about performance - even more so than when writing plain SQL queries. MySQL's stored routine engine is much less sophisticated than the SQL query planner and hasn't evolved much since it's inception, and sometimes, a seemingly small difference in how you write your code can really make a huge difference in run-time performance.

During this presentation, I will share a number of my findings and show you benchmarks to help you understand opportunities to optimize your stored routines.
"

Published in: Software
  • Be the first to comment

  • Be the first to like this

Percona live amsterdam 2015 optimizing mysql stored routines

  1. 1. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 1 Percona Live Amsterdam 2015 Optimizing Stored Routines
  2. 2. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 2 Welcome, thanks for attending! ● Roland Bouman (Netherlands) ● Ex MySQL AB, Sun Microsystems ● Software Engineer for Pentaho ● Blog: http://rpbouman.blogspot.com/ ● Twitter: @rolandbouman
  3. 3. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 3 Agenda ● Stored routine issues ● Variables and assignments ● Flow of control ● Cursor handling ● Summary
  4. 4. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 4 Agenda ● Stored routine issues? ● Variables and assignments ● Flow of control ● Cursor handling ● Summary
  5. 5. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 5 Stored Routines ● Stored functions (SQL functions) ● Stored procedures ● Triggers ● Events
  6. 6. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 6 Benmarking method ● BENCHMARK(1000000, expression) – Appropriate for computation speed – Some “large enough” number, like 1 or 10 million times – Repeat 6 times – Remove extremes – Average the duration – Correction for an appropriate baseline ● mysql-5.7.8-rc-linux-glibc2.5-x86_64 – Ubuntu 14
  7. 7. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 7 Invocation overhead ● Plain expression (10 mln) ● Equivalent function (10 mln) mysql> SELECT BENCHMARK(10000000, 1); +------------------------+ | benchmark(10000000, 1) | +------------------------+ | 0 | +------------------------+ 1 row in set (0.04 sec) mysql> CREATE FUNCTION f_one() -> RETURNS INT RETURN 1; mysql> SELECT BENCHMARK(10000000, f_one()); +----------------------------+ | benchmark(10000000, f_one) | +----------------------------+ | 0 | +----------------------------+ 1 row in set (11.57 sec) ● SQL expression is 292x faster
  8. 8. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 8 Invocation overhead 1 f_one() 0 2 4 6 8 10 12 14 SQL constant expression vs Stored Function (N = 10 mln) (Pure SQL is just less than 300 times as fast) Duration(seconds)
  9. 9. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 9 Computation inefficiency ● Plain addition (10 mln) ● Equivalent function (10 mln) mysql> SELECT BENCHMARK(10000000, 1+1); +--------------------------+ | benchmark(10000000, 1+1) | +--------------------------+ | 0 | +--------------------------+ 1 row in set (0.13 sec) mysql> CREATE FUNCTION f_one_plus_one() -> RETURNS INT RETURN 1+1; mysql> SELECT BENCHMARK(10000000, f_one_plus_one()); +---------------------------------------+ | benchmark(10000000, f_one_plus_one()) | +---------------------------------------+ | 0 | +---------------------------------------+ 1 row in set (13.83 sec) ● Q: How much slower is addition in a function?
  10. 10. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 10 Applying Correction SQL constant SQL addition SQL addition (Corrected) Function constant Function addition Function addition (Corrected) 0 2000000 4000000 6000000 8000000 10000000 12000000 14000000 16000000 Assignment methods from a SELECT query (N = 10 k) No substantial difference Duration(seconds)
  11. 11. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 11 Correcting for overhead 1 + 1 f_one_plus_one() 0 0.5 1 1.5 2 2.5 SQL addition vs Stored Function (N = 10 mln) (After correction for invocation overhead, pure SQL is just over 25 times as fast) Duration(seconds)
  12. 12. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 12 Agenda ● Stored routine issues ● Variables and assignments ● Flow of control ● Cursor handling ● Summary
  13. 13. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 13 Variable Assignment ● Variable type – Local – User-defined (“session variables”, @) ● Assignment method – DEFAULT clause of DECLARE – statement – SET – statement – SELECT...INTO
  14. 14. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 14 Variable type and Assignment DECLARE local DEFAULT value SET local = value SET @user_defined = value SELECT value INTO local SELECT value INTO @user_defined 0 5 10 15 20 25 30 35 Variable types and Assignment methods (N = 10 mln) (Corrected for baseline) Duration(seconds)
  15. 15. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 15 Variable Assignment ● DEFAULT is about 1.5x as fast as SET ● SET is about as fast for local vs user-defined ● SELECT...INTO – About 2x slower than SET for local variable – About 2.3x slower than SET for user variable
  16. 16. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 16 More about SELECT INTO ● Ok to assign values from a real query ● Avoid when assigning literals SELECT val INTO var SET var = (subquery) RETURN (subquery) 0 2 4 6 8 10 12 14 16 18 Assignment methods from a SELECT query (N = 10 k) No substantial difference Duration(seconds)
  17. 17. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 17 An actual example ● Calculating easter CREATE FUNCTION f_easter_floor( p_year INT ) RETURNS DATE BEGIN DECLARE a SMALLINT DEFAULT p_year % 19; DECLARE b SMALLINT DEFAULT FLOOR(p_year / 100); DECLARE c SMALLINT DEFAULT p_year % 100; DECLARE d SMALLINT DEFAULT FLOOR(b / 4); DECLARE e SMALLINT DEFAULT b % 4; DECLARE f SMALLINT DEFAULT FLOOR((b + 8) / 25); DECLARE g SMALLINT DEFAULT FLOOR((b - f + 1) / 3); DECLARE h SMALLINT DEFAULT (19*a + b - d - g + 15) % 30; DECLARE i SMALLINT DEFAULT FLOOR(c / 4); DECLARE k SMALLINT DEFAULT c % 4; DECLARE L SMALLINT DEFAULT (32 + 2*e + 2*i - h - k) % 7; DECLARE m SMALLINT DEFAULT FLOOR((a + 11*h + 22*L) / 451); DECLARE v100 SMALLINT DEFAULT h + L - 7*m + 114; RETURN STR_TO_DATE( CONCAT(p_year, '-', FLOOR(v100 / 31), '-', (v100 % 31) + 1) , '%Y-%c-%e' ); END;
  18. 18. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 18 Matching expression and variable data types ● Multiple expression of this form: DECLARE b SMALLINT DEFAULT FLOOR(p_year / 100); ● Divide and round to next lowest integer – Alternative: using integer division (DIV) DECLARE b SMALLINT DEFAULT p_year DIV 100; ● 13x performance increase! – ...but: beware for negative values
  19. 19. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 19 Improved easter function: ● 15% faster than using FLOOR() and / CREATE FUNCTION f_easter_div( p_year INT ) RETURNS DATE BEGIN DECLARE a SMALLINT DEFAULT p_year % 19; DECLARE b SMALLINT DEFAULT p_year DIV 100; DECLARE c SMALLINT DEFAULT p_year % 100; DECLARE d SMALLINT DEFAULT b DIV 4; DECLARE e SMALLINT DEFAULT b % 4; DECLARE f SMALLINT DEFAULT (b + 8) DIV 25; DECLARE g SMALLINT DEFAULT (b - f + 1) DIV 3; DECLARE h SMALLINT DEFAULT (19*a + b - d - g + 15) % 30; DECLARE i SMALLINT DEFAULT c DIV 4; DECLARE k SMALLINT DEFAULT c % 4; DECLARE L SMALLINT DEFAULT (32 + 2*e + 2*i - h - k) % 7; DECLARE m SMALLINT DEFAULT (a + 11*h + 22*L) DIV 451; DECLARE v100 SMALLINT DEFAULT h + L - 7*m + 114; RETURN STR_TO_DATE( CONCAT(p_year, '-', v100 DIV 31, '-', (v100 % 31) + 1) , '%Y-%c-%e' ); END;
  20. 20. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 20 Variable and assignment Summary ● Avoid user-defined variables – Use local variables instead ● Don't let DEFAULT go to waste – If you don't, time is wasted ● Beware of SELECT..INTO – Good for assigning values from queries – Use SET instead for any other case ● Match operator and variable data type
  21. 21. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 21 Agenda ● Stored routine Issues? ● Variables and assignments ● Flow of control ● Cursor handling ● Summary
  22. 22. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 22 Flow of Control Part 1: Conditionals ● Decisions, alternate code paths ● Plain SQL operators and functions: – IF(), CASE...END – IFNULL(), NULLIF(), COALESCE() – ELT(), FIELD(), FIND_IN_SET() ● Stored routine statements: – IF...END IF – CASE...END CASE
  23. 23. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 23 Case operator vs Case statement CREATE FUNCTION f_case_operator( p_arg INT ) RETURNS INT BEGIN DECLARE a CHAR(1); SET a = CASE p_arg WHEN 1 THEN 'a' WHEN 2 THEN 'b' WHEN 3 THEN 'c' WHEN 4 THEN 'd' WHEN 5 THEN 'e' WHEN 6 THEN 'f' WHEN 7 THEN 'g' WHEN 8 THEN 'h' WHEN 9 THEN 'i' ELSE NULL END; RETURN NULL; END; CREATE FUNCTION f_case_statement( p_arg INT ) RETURNS INT BEGIN DECLARE a CHAR(1); CASE p_arg WHEN 1 THEN SET a = 'a'; WHEN 2 THEN SET a = 'b'; WHEN 3 THEN SET a = 'c'; WHEN 4 THEN SET a = 'd'; WHEN 5 THEN SET a = 'e'; WHEN 6 THEN SET a = 'f'; WHEN 7 THEN SET a = 'g'; WHEN 8 THEN SET a = 'h'; WHEN 9 THEN SET a = 'i'; ELSE SET a = NULL; END CASE; RETURN NULL; END;
  24. 24. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 24 IF function vs IF statement CREATE FUNCTION f_if_function( p_arg INT ) RETURNS INT BEGIN DECLARE a CHAR(1); IF (p_arg = 1, 'a', IF (p_arg = 2, 'b', ...more branches... ) ); RETURN NULL; END; CREATE FUNCTION f_if_statement( p_arg INT ) RETURNS INT BEGIN DECLARE a CHAR(1); IF p_arg = 1 THEN SET a = 'a'; ELSIF p_arg = 2 THEN SET a = 'b'; ...more branches... ELSE SET a = NULL; END IF; RETURN NULL; END;
  25. 25. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 25 Conditional Flow constructs 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 Conditional Flow Constructs Time (N = 1mln.) (N = 1mln) CASE operator CASE statement IF function IF statement Number of Cases or Branches Time(seconds)
  26. 26. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 26 Conditional Flow Constructs ● Constant performance for Function/Operator ● Linear relationship between number of cases and time for statements ● CASE operator is fastest ● CASE statement vs IF statement – IF slightly faster for few cases/branches – Break even at 3 branches / cases ● IF() function vs IF..THEN statement – Statement slightly faster for few branches – Break even at 4/5 branches ● Consider using CASE instead of IF
  27. 27. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 27 Flow of Control Part 2: Recursion vs Iteration ● Recursion is poorly supported – Not supported in triggers and functions ● Additional configuration – max_sp_recursion_depth (max: 255) – thread_stack ● Recursion slower in general than iteration – Invocation overhead
  28. 28. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 28 Fibonacci CREATE PROCEDURE p_fibonacci_recursive(INOUT p INT) BEGIN DECLARE p1 INT DEFAULT p – 1; DECLARE p2 INT DEFAULT p – 2; IF p > 1 THEN CALL p_fibonacci_recursive(p1); CALL p_fibonacci_recursive(p2); SET p = p1 + p2; END IF; END; CREATE PROCEDURE p_fibonacci_iterative(INOUT p INT) proc: BEGIN DECLARE v_index INT DEFAULT 0; DECLARE p1 INT DEFAULT 0; DECLARE p2 INT DEFAULT 1; DECLARE p3 INT DEFAULT 1; DECLARE m INT DEFAULT p - 1; IF p = 0 THEN LEAVE proc; END IF; WHILE v_index < m DO SET p3 = p1 + p2 , p1 = p2 , p2 = p3 , v_index = v_index + 1 ; END WHILE; SET p = p3; END;
  29. 29. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 29 Recursion vs Iteration 20 21 22 23 24 25 26 27 28 29 30 0 5 10 15 20 25 Fibonacci (N = 1) recursive iterative Number of Cases or Branches Time(seconds)
  30. 30. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 30 Flow of control summary ● Use conditional expressions if possible ● Avoid Recursion
  31. 31. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 31 Agenda ● Stored routine Issues? ● Variables and assignments ● Flow of control ● Cursor handling ● Summary
  32. 32. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 32 Cursor Handling ● Why do you need that cursor anyway? ● Only very few cases justify cursors – Data driven stored procedure calls – Data driven dynamic SQL
  33. 33. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 33 You need a cursor to do what?! CREATE FUNCTION f_film_categories(p_film_id INT) RETURNS VARCHAR(2048) BEGIN DECLARE v_done BOOL DEFAULT FALSE; DECLARE v_category VARCHAR(25); DECLARE v_categories VARCHAR(2048); DECLARE film_categories CURSOR FOR SELECT c.name FROM sakila.film_category fc INNER JOIN sakila.category c ON fc.category_id = c.category_id WHERE fc.film_id = p_film_id; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done := TRUE; OPEN film_categories; categories_loop: LOOP FETCH film_categories INTO v_category; IF v_done THEN CLOSE film_categories; LEAVE categories_loop; END IF; SET v_categories := CONCAT_WS( ',', v_categories, v_category ); END LOOP; RETURN v_categories; END; SELECT fc.film_id , GROUP_CONCAT(c.name) FROM film_category fc LEFT JOIN category c ON fc.category_id = c.category_id GROUP BY fc.film_id
  34. 34. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 34 Cursor vs LEFT Join ● For N = 1mln, Results are: – LEFT JOIN / GROUP_CONCAT: 0.020 sec – Cursor: 66.21 ● That's > 3200 times as slow (not corrected for overhead)
  35. 35. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 35 Cursor Looping REPEAT, WHILE, LOOP ● Loop control ● What's inside the loop? – Treat nested cursor loops as suspicious – Be very weary of SQL statements inside the loop.
  36. 36. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 36 Why to avoid cursor loops with REPEAT ● Always runs at least once – So what if the set is empty? ● Iteration before checking the loop condition – Always requires an additional explicit check inside the loop ● Loop control scattered: – Both in top and bottom of the loop
  37. 37. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 37 Why to avoid cursor loops with REPEAT BEGIN DECLARE v_done BOOL DEFAULT FALSE; DECLARE csr FOR SELECT * FROM tab; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done := TRUE; OPEN csr; REPEAT FETCH csr INTO var1,...,varN; IF NOT v_done THEN -- ... do stuff... END IF; UNTIL v_done END REPEAT; CLOSE csr; END; Loop is entered, without checking if the resultset is empty 1 positve and one negative check to see if he resultset is exhausted;
  38. 38. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 38 Why to avoid cursor loops with WHILE ● Slightly better than REPEAT – Only one check at the top of the loop ● Requires code duplication – One FETCH needed outside the loop ● Loop control still scattered – condition is checked at the top of the loop – FETCH required at the bottom
  39. 39. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 39 Why to avoid cursor loops with WHILE BEGIN DECLARE v_has_rows BOOL DEFAULT TRUE; DECLARE csr FOR SELECT * FROM tab; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_has_rows := FALSE; OPEN csr; FETCH csr INTO var1,...,varN; WHILE v_has_rows DO -- ... do stuff... FETCH csr INTO var1,...,varN; END WHILE; CLOSE csr; END; Fetch required both outside (just once) and inside the loop
  40. 40. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 40 Why to write cursor loops with LOOP ● No double checking (like in REPEAT) ● No code duplication (like in WHILE) ● All loop control code in one place – All at top of loop
  41. 41. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 41 Why you should write cursor loops with LOOP BEGIN DECLARE v_done BOOL DEFAULT FALSE; DECLARE csr FOR SELECT * FROM tab; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done := TRUE; OPEN csr; my_loop: LOOP FETCH csr INTO var1,...,varN; IF v_done THEN CLOSE csr; LEAVE my_loop; END IF; -- ... do stuff... END LOOP; END;
  42. 42. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 42 Cursor summary ● Avoid cursors if you can – Use GROUP_CONCAT for lists – Use joins, not nested cursors – Only for data driven dynamic SQL and stored procedure calls ● Use LOOP instead of REPEAT and WHILE – REPEAT requires double condition checking – WHILE requires code duplication – LOOP keeps all loop control together
  43. 43. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 43 Agenda ● Stored routine Issues? ● Variables and assignments ● Flow of control ● Cursor handling ● Summary
  44. 44. Blog: http://rpbouman.blogspot.com/ twitter: @rolandbouman 44 Summary● Variables – Use local rather than user-defined variables ● Assignments – Use DEFAULT and SET for simple values – Use SELECT INTO for queries ● Flow of Control – Use functions and operators rather than statements ● Cursors – Avoid if possible – Use LOOP, not REPEAT and WHILE

×