MySQL User Conference and Expo 2010
Optimizing Stored Routines
Blog: http://rpbouman.blogspot.com/
twitter: @rolandbouman

Welcome, thanks for attending!
● Roland Bouman; Leiden, Netherlands
● Ex MySQL AB, Sun Microsystems
● Web and BI Developer
● Co-author of “Pentaho Solutions”
● Blog: http://rpbouman.blogspot.com/
● Twitter: @rolandbouman

Program
● Stored routine issues
● Variables and assignments
● Flow of control
● Cursor handling
● Summary

Stored Routines: Definition
● Stored routines:
  – stored functions (SQL functions)
  – stored procedures
  – triggers
  – events

Performance Issues
● SQL inside stored routines is still SQL, ...but...
  – invocation overhead
  – suboptimal computational performance
● Benchmarking method
  – BENCHMARK(1000000, expression)
  – Appropriate for measuring computation speed
  – Evaluates the expression 1 million times
● MySQL 5.1.36, Windows

Invocation overhead
● Plain expression (10 mln)
    mysql> SELECT BENCHMARK(10000000, 1);
    +------------------------+
    | benchmark(10000000, 1) |
    +------------------------+
    |                      0 |
    +------------------------+
    1 row in set (0.19 sec)
● Equivalent function (10 mln)
    mysql> CREATE FUNCTION f_one() RETURNS INT RETURN 1;
    mysql> SELECT BENCHMARK(10000000, f_one());
    +------------------------------+
    | benchmark(10000000, f_one()) |
    +------------------------------+
    |                            0 |
    +------------------------------+
    1 row in set (24.59 sec)
● Slowdown about 130 times (24.59 / 0.19)

Computation inefficiency
● Plain addition
    mysql> SELECT BENCHMARK(10000000, 1+1);
    +--------------------------+
    | benchmark(10000000, 1+1) |
    +--------------------------+
    |                        0 |
    +--------------------------+
    1 row in set (0.30 sec)
● Equivalent function
    mysql> CREATE FUNCTION f_one_plus_one() RETURNS INT RETURN 1+1;
    mysql> SELECT BENCHMARK(10000000, f_one_plus_one());
    +---------------------------------------+
    | benchmark(10000000, f_one_plus_one()) |
    +---------------------------------------+
    |                                     0 |
    +---------------------------------------+
    1 row in set (28.73 sec)

Computation inefficiency
● Raw measurements (seconds for 10 mln evaluations)
    expression   function           expression   function   ratio
    1            f_one()                  0.19      24.59   0.0077
    1+1          f_one_plus_one()         0.29      28.73   0.0101
● Correction for invocation overhead (the 1 / f_one() timings subtracted)
    expression   function           expression   function   ratio
    1            f_one()                  0.00       0.00   n/a
    1+1          f_one_plus_one()         0.10       4.14   0.0242
● Slowdown about 40 times
  – after correction for invocation overhead
  – ratio = expression time / function time

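The correction is plain subtraction. It treats the 1 / f_one() timings as pure invocation overhead. Here is the same calculation as a worked equation, using the figures from the table (seconds for 10 mln evaluations):

```latex
\begin{aligned}
t_{\text{expr}} &= 0.29 - 0.19 = 0.10 \\
t_{\text{func}} &= 28.73 - 24.59 = 4.14 \\
\frac{t_{\text{expr}}}{t_{\text{func}}} &= \frac{0.10}{4.14} \approx 0.0242 \approx \frac{1}{41}
\end{aligned}
```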
Types of Variables
● User-defined variables
  – session scope
  – runtime type
    SET @user_defined_variable := 'some value';
● Local variables
  – block scope
  – declared type
    BEGIN
      DECLARE v_local_variable VARCHAR(50);
      SET v_local_variable := 'some value';
      ...
    END;

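This minimal sketch shows what "session scope" means in practice. A user-defined variable assigned inside a function is still set after the function returns. A local variable disappears with its block. The function name f_scope_demo and the variable @session_var are hypothetical. DELIMITER is a mysql command-line client command that lets the routine body contain semicolons.

```sql
DELIMITER //

-- Hypothetical demo function (not from the slides).
CREATE FUNCTION f_scope_demo() RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE v_local INT DEFAULT 0;     -- block scope, declared type INT
  SET v_local := 1;
  SET @session_var := 'still here';  -- session scope, type decided at runtime
  RETURN v_local;                    -- v_local ceases to exist after this
END //

DELIMITER ;

SELECT f_scope_demo();  -- returns 1
SELECT @session_var;    -- still set: the value outlives the function call
```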
User-defined variable Benchmark
● Baseline
    CREATE FUNCTION f_variable_baseline()
    RETURNS INT
    BEGIN
      DECLARE a INT DEFAULT 1;
      RETURN a;
    END;
● Local variable
    CREATE FUNCTION f_local_variable()
    RETURNS INT
    BEGIN
      DECLARE a INT DEFAULT 1;
      SET a := 1;
      RETURN a;
    END;
● User-defined variable
    CREATE FUNCTION f_user_defined_variable()
    RETURNS INT
    BEGIN
      DECLARE a INT DEFAULT 1;
      SET @a := 1;
      RETURN a;
    END;

User-defined variables
● User-defined variables about 5x slower
    function                  time   minus baseline
    f_variable_baseline       4.6              0.0
    f_local_variable          5.32             0.72
    f_user_defined_variable   7.89             3.29
  – local / user-defined: 0.72 / 3.29 = 0.22

Assignments
● SET statement
    SET v_variable := 'some value';
● SELECT statement
    SELECT 'some value' INTO v_variable;
● DEFAULT clause
    BEGIN
      DECLARE v_local_variable VARCHAR(50) DEFAULT 'some value';
      ...
    END;

Assignment Benchmarks
● SELECT INTO about 2.4x slower than SET
  – SET needs 42.09% of the time of SELECT INTO
● SET about 1.5x slower than DEFAULT
  – DEFAULT needs 68.26% of the time of SET
                      baseline   DEFAULT     SET   SELECT INTO
    measured              8.2      15.06   18.25         32.08
    minus baseline        0         6.86   10.05         23.88

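The slides do not show the functions behind these numbers. This sketch shows how the three assignment styles could be compared. It assumes 1 million iterations and uses hypothetical function names. Each function assigns the same literal to a local variable in a different way.

```sql
DELIMITER //

-- Hypothetical benchmark functions; names are illustrative only.
CREATE FUNCTION f_assign_default() RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE v INT DEFAULT 1;   -- assigned by the DEFAULT clause
  RETURN v;
END //

CREATE FUNCTION f_assign_set() RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE v INT;
  SET v := 1;                -- assigned by a SET statement
  RETURN v;
END //

CREATE FUNCTION f_assign_select_into_literal() RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE v INT;
  SELECT 1 INTO v;           -- assigned by SELECT ... INTO
  RETURN v;
END //

DELIMITER ;

-- Compare the elapsed times reported by the mysql client.
SELECT BENCHMARK(1000000, f_assign_default());
SELECT BENCHMARK(1000000, f_assign_set());
SELECT BENCHMARK(1000000, f_assign_select_into_literal());
```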
More about SELECT INTO
● Assigning from a SELECT ... INTO statement:
  – ok if you're assigning from a real query
  – not so much if you're assigning literals
    -- assigning from a real query
    SELECT COUNT(*), user_id
    INTO   v_count, v_user_id
    FROM   t_users;

    -- assigning literals
    SELECT 1, 'some value'
    INTO   v_number, v_string;

Sample function: Sakila rental count
    CREATE FUNCTION f_assign_select_into(p_customer_id INT) RETURNS INT
    BEGIN
      DECLARE c INT;
      SELECT SQL_NO_CACHE COUNT(*) INTO c
      FROM sakila.rental
      WHERE customer_id = p_customer_id;
      RETURN c;
    END;

    CREATE FUNCTION f_assign_select_set(p_customer_id INT) RETURNS INT
    BEGIN
      DECLARE c INT;
      SET c := (
        SELECT SQL_NO_CACHE COUNT(*)
        FROM sakila.rental
        WHERE customer_id = p_customer_id);
      RETURN c;
    END;

    CREATE FUNCTION f_noassign_select(p_customer_id INT) RETURNS INT
    BEGIN
      RETURN (
        SELECT SQL_NO_CACHE COUNT(*)
        FROM sakila.rental
        WHERE customer_id = p_customer_id);
    END;

Sakila Rental count benchmark
● SELECT INTO about 25% faster than SET (7.00 vs 9.06)
    N        select into   set subquery   return subquery
    100000          7.00           9.06              8.75

More on variables and assignments
● Match expression and variable data types
  – example: calculating Easter
    CREATE FUNCTION f_easter_int_nodiv(
      p_year INT
    ) RETURNS DATE
    BEGIN
      DECLARE a    SMALLINT DEFAULT p_year % 19;
      DECLARE b    SMALLINT DEFAULT FLOOR(p_year / 100);
      DECLARE c    SMALLINT DEFAULT p_year % 100;
      DECLARE d    SMALLINT DEFAULT FLOOR(b / 4);
      DECLARE e    SMALLINT DEFAULT b % 4;
      DECLARE f    SMALLINT DEFAULT FLOOR((b + 8) / 25);
      DECLARE g    SMALLINT DEFAULT FLOOR((b - f + 1) / 3);
      DECLARE h    SMALLINT DEFAULT (19*a + b - d - g + 15) % 30;
      DECLARE i    SMALLINT DEFAULT FLOOR(c / 4);
      DECLARE k    SMALLINT DEFAULT c % 4;
      DECLARE L    SMALLINT DEFAULT (32 + 2*e + 2*i - h - k) % 7;
      DECLARE m    SMALLINT DEFAULT FLOOR((a + 11*h + 22*L) / 451);
      DECLARE v100 SMALLINT DEFAULT h + L - 7*m + 114;
      RETURN STR_TO_DATE(
        CONCAT(p_year, '-', v100 DIV 31, '-', (v100 % 31) + 1)
      , '%Y-%c-%e'
      );
    END;

Matching expression and variable data types
● Multiple expressions of this form:
    DECLARE b SMALLINT DEFAULT FLOOR(p_year / 100);
  – p_year / 100 yields a DECIMAL, which FLOOR() rounds down and which is then converted to SMALLINT
● Divide and round down to the nearest integer
  – Alternative: integer division (DIV), which stays in integer arithmetic
    DECLARE b SMALLINT DEFAULT p_year DIV 100;
● 13x performance increase!
  – ...but: beware of negative values (DIV truncates toward zero, FLOOR rounds down)

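Here is a quick check of the negative-value caveat. It uses literals only, so no tables are needed. DIV discards the fractional part, while FLOOR() always rounds toward negative infinity.

```sql
-- For non-negative operands DIV and FLOOR(... / ...) agree;
-- for negative operands they differ by one.
SELECT  7 DIV 2        AS div_pos,    -- 3
        FLOOR( 7 / 2)  AS floor_pos,  -- 3
       -7 DIV 2        AS div_neg,    -- -3
        FLOOR(-7 / 2)  AS floor_neg;  -- -4
```

For positive years, every operand in the Easter calculation is non-negative. The DIV rewrite therefore gives the same results there.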
Improved Easter function:
    CREATE FUNCTION f_easter_int_nodiv(
      p_year INT
    ) RETURNS DATE
    BEGIN
      DECLARE a    SMALLINT DEFAULT p_year % 19;
      DECLARE b    SMALLINT DEFAULT p_year DIV 100;
      DECLARE c    SMALLINT DEFAULT p_year % 100;
      DECLARE d    SMALLINT DEFAULT b DIV 4;
      DECLARE e    SMALLINT DEFAULT b % 4;
      DECLARE f    SMALLINT DEFAULT (b + 8) DIV 25;
      DECLARE g    SMALLINT DEFAULT (b - f + 1) DIV 3;
      DECLARE h    SMALLINT DEFAULT (19*a + b - d - g + 15) % 30;
      DECLARE i    SMALLINT DEFAULT c DIV 4;
      DECLARE k    SMALLINT DEFAULT c % 4;
      DECLARE L    SMALLINT DEFAULT (32 + 2*e + 2*i - h - k) % 7;
      DECLARE m    SMALLINT DEFAULT (a + 11*h + 22*L) DIV 451;
      DECLARE v100 SMALLINT DEFAULT h + L - 7*m + 114;
      RETURN STR_TO_DATE(
        CONCAT(p_year, '-', v100 DIV 31, '-', (v100 % 31) + 1)
      , '%Y-%c-%e'
      );
    END;
● 30% faster than using FLOOR and /
● Also applicable to regular SQL

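The same rewrite works in ordinary queries outside a stored routine. This small illustration uses an inline derived table of sample years. The alias yr is made up for the example.

```sql
-- Integer division in a plain query; for non-negative years both
-- expressions yield the same century index.
SELECT y.yr,
       y.yr DIV 100       AS century_div,
       FLOOR(y.yr / 100)  AS century_floor
FROM (SELECT 1999 AS yr UNION ALL SELECT 2010) AS y;
```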
Variable and assignment Summary
● Don't use user-defined variables
  – Use local variables instead
● If possible, use DEFAULT
  – A separate SET afterwards wastes time
● Beware of SELECT INTO
  – Only use it for assigning values from queries
  – Use SET instead for assigning literals
● Match expression and variable data types

Flow of Control
● Decisions, alternate code paths
● Plain SQL operators and functions:
  – IF(), CASE...END
  – IFNULL(), NULLIF(), COALESCE()
  – ELT(), FIELD(), FIND_IN_SET()
● Stored routine statements:
  – IF...END IF
  – CASE...END CASE

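For reference, this shows what each of the listed operators and functions returns on literal arguments. They are all expressions, so they can appear directly in a SET, a DEFAULT clause or a RETURN.

```sql
SELECT IF(1 > 0, 'yes', 'no')                     AS if_fn,        -- 'yes'
       CASE 2 WHEN 1 THEN 'a' WHEN 2 THEN 'b' END AS case_op,      -- 'b'
       IFNULL(NULL, 'fallback')                   AS ifnull_fn,    -- 'fallback'
       NULLIF(5, 5)                               AS nullif_fn,    -- NULL
       COALESCE(NULL, NULL, 'third')              AS coalesce_fn,  -- 'third'
       ELT(2, 'a', 'b', 'c')                      AS elt_fn,       -- 'b' (2nd item)
       FIELD('c', 'a', 'b', 'c')                  AS field_fn,     -- 3 (position)
       FIND_IN_SET('b', 'a,b,c')                  AS find_in_set;  -- 2 (position)
```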
Case operator vs Case statement
    CREATE FUNCTION f_case_operator(
      p_arg INT
    ) RETURNS INT
    BEGIN
      DECLARE a CHAR(1);
      SET a := CASE p_arg
                 WHEN 1 THEN 'a'
                 WHEN 2 THEN 'b'
                 WHEN 3 THEN 'c'
                 WHEN 4 THEN 'd'
                 WHEN 5 THEN 'e'
                 WHEN 6 THEN 'f'
                 WHEN 7 THEN 'g'
                 WHEN 8 THEN 'h'
                 WHEN 9 THEN 'i'
                 ELSE NULL
               END;
      RETURN NULL;
    END;

    CREATE FUNCTION f_case_statement(
      p_arg INT
    ) RETURNS INT
    BEGIN
      DECLARE a CHAR(1);
      CASE p_arg
        WHEN 1 THEN SET a := 'a';
        WHEN 2 THEN SET a := 'b';
        WHEN 3 THEN SET a := 'c';
        WHEN 4 THEN SET a := 'd';
        WHEN 5 THEN SET a := 'e';
        WHEN 6 THEN SET a := 'f';
        WHEN 7 THEN SET a := 'g';
        WHEN 8 THEN SET a := 'h';
        WHEN 9 THEN SET a := 'i';
        ELSE SET a := NULL;
      END CASE;
      RETURN NULL;
    END;

Case operator vs Case statement
● Linear slowdown of the CASE statement
    argument          1      2      3      4      5      6      7      8      9     10
    case operator   9.27   9.31   9.33   9.33   9.36   9.38   9.36   9.36   9.36   9.05
    case statement 10.2   11.55  12.83  14.14  15.45  16.75  18.09  19.41  20.75  24.83

Flow of control summary
● Use conditional expressions (operators and functions) rather than IF and CASE statements if possible

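Here is this advice applied to the CASE benchmark. It assumes the argument is a small consecutive integer starting at 1, in which case ELT() can express the same lookup in one call. f_elt_operator is a hypothetical variant that the slides did not measure, so no performance claim is made for it. Unlike the benchmark functions, it returns the looked-up value so the result can be checked.

```sql
DELIMITER //

-- ELT(N, ...) returns the N-th string, or NULL when N is below 1 or above
-- the number of strings, matching the ELSE NULL branch of the CASE versions.
CREATE FUNCTION f_elt_operator(p_arg INT) RETURNS CHAR(1)
DETERMINISTIC
BEGIN
  DECLARE a CHAR(1) DEFAULT ELT(p_arg, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i');
  RETURN a;
END //

DELIMITER ;

SELECT f_elt_operator(3), f_elt_operator(10);  -- 'c', NULL
```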
Cursor Handling
● Why do you need that cursor anyway?
● Only very few cases justify cursors
  – Data-driven stored procedure calls
  – Data-driven dynamic SQL

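This sketch illustrates data-driven dynamic SQL. The cursor supplies table names that are only known at runtime, and PREPARE and EXECUTE turn each one into a statement. The procedure name and the row-count task are hypothetical. PREPARE needs the statement text in a string literal or a user-defined variable, so this is one place where a user-defined variable cannot be avoided. Prepared statements are also only allowed in stored procedures, not in stored functions or triggers.

```sql
DELIMITER //

-- Hypothetical procedure: reports the row count of every base table in
-- one schema. Uses the LOOP-style cursor loop.
CREATE PROCEDURE p_count_rows_per_table(IN p_schema VARCHAR(64))
READS SQL DATA
BEGIN
  DECLARE v_done BOOL DEFAULT FALSE;
  DECLARE v_table VARCHAR(64);
  DECLARE csr CURSOR FOR
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = p_schema
    AND table_type = 'BASE TABLE';
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done := TRUE;

  OPEN csr;
  table_loop: LOOP
    FETCH csr INTO v_table;
    IF v_done THEN
      CLOSE csr;
      LEAVE table_loop;
    END IF;
    -- Build the statement text in a user-defined variable for PREPARE.
    -- Identifiers are wrapped in backticks; names that themselves contain
    -- backticks are not handled in this sketch.
    SET @sql := CONCAT(
      'SELECT ', QUOTE(v_table), ' AS table_name, COUNT(*) AS row_count',
      ' FROM `', p_schema, '`.`', v_table, '`'
    );
    PREPARE stmt FROM @sql;
    EXECUTE stmt;             -- one result set per table
    DEALLOCATE PREPARE stmt;
  END LOOP;
END //

DELIMITER ;

CALL p_count_rows_per_table('sakila');
```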
You need a cursor to do what?!
● Cursor-based function:
    CREATE FUNCTION f_film_categories(p_film_id INT)
    RETURNS VARCHAR(2048)
    BEGIN
      DECLARE v_done BOOL DEFAULT FALSE;
      DECLARE v_category VARCHAR(25);
      DECLARE v_categories VARCHAR(2048);
      DECLARE film_categories CURSOR FOR
        SELECT c.name
        FROM sakila.film_category fc
        INNER JOIN sakila.category c
        ON fc.category_id = c.category_id
        WHERE fc.film_id = p_film_id;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done := TRUE;

      OPEN film_categories;
      categories_loop: LOOP
        FETCH film_categories INTO v_category;
        IF v_done THEN
          CLOSE film_categories;
          LEAVE categories_loop;
        END IF;
        SET v_categories := CONCAT_WS(
          ',', v_categories, v_category
        );
      END LOOP;
      RETURN v_categories;
    END;
● Plain query with GROUP_CONCAT:
    SELECT    fc.film_id
    ,         GROUP_CONCAT(c.name)
    FROM      film_category fc
    LEFT JOIN category c
    ON        fc.category_id = c.category_id
    GROUP BY  fc.film_id
● Benchmark (N=100000)
    group_concat   15.34
    cursor         29.57

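The GROUP_CONCAT approach also fits inside a function when a per-film lookup is needed. This sketch assumes the sakila sample database and uses the hypothetical name f_film_categories_gc. One aggregate query replaces the cursor loop.

```sql
DELIMITER //

-- Hypothetical set-based replacement for f_film_categories.
CREATE FUNCTION f_film_categories_gc(p_film_id INT)
RETURNS VARCHAR(2048)
READS SQL DATA
BEGIN
  RETURN (
    SELECT GROUP_CONCAT(c.name)   -- comma-separated by default
    FROM sakila.film_category fc
    INNER JOIN sakila.category c
    ON fc.category_id = c.category_id
    WHERE fc.film_id = p_film_id
  );
END //

DELIMITER ;

SELECT f_film_categories_gc(1);
```

Like the cursor version, it returns NULL for a film without categories and guarantees no particular order of the names. GROUP_CONCAT truncates its result to group_concat_max_len (1024 by default), which is shorter than the VARCHAR(2048) return type.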
Cursor Looping
● REPEAT, WHILE, LOOP
● Loop control
● What's inside the loop?
  – Treat nested cursor loops as suspicious
  – Be very wary of SQL statements inside the loop

Why avoid cursor loops with REPEAT
● Always runs at least once
  – So what if the set is empty?
● Iteration before checking the loop condition
  – Always requires an additional explicit check inside the loop
● Loop control scattered:
  – At both the top and the bottom of the loop

Why avoid cursor loops with REPEAT
    BEGIN
      DECLARE v_done BOOL DEFAULT FALSE;
      DECLARE csr CURSOR FOR SELECT * FROM tab;
      DECLARE CONTINUE HANDLER FOR NOT FOUND
        SET v_done := TRUE;
      OPEN csr;
      REPEAT
        FETCH csr INTO var1,...,varN;
        IF NOT v_done THEN
          -- ... do stuff...
        END IF;
      UNTIL v_done END REPEAT;
      CLOSE csr;
    END;
● The loop is entered without checking whether the result set is empty
● One negative and one positive check (IF NOT v_done, UNTIL v_done) to see whether the result set is exhausted

Why avoid cursor loops with WHILE
● Slightly better than REPEAT
  – Only one check at the top of the loop
● Requires code duplication
  – One FETCH needed outside the loop
● Loop control still scattered
  – condition is checked at the top of the loop
  – FETCH required at the bottom

Why avoid cursor loops with WHILE
    BEGIN
      DECLARE v_has_rows BOOL DEFAULT TRUE;
      DECLARE csr CURSOR FOR SELECT * FROM tab;
      DECLARE CONTINUE HANDLER FOR NOT FOUND
        SET v_has_rows := FALSE;
      OPEN csr;
      FETCH csr INTO var1,...,varN;
      WHILE v_has_rows DO
        -- ... do stuff...
        FETCH csr INTO var1,...,varN;
      END WHILE;
      CLOSE csr;
    END;
● FETCH is required both outside the loop (just once) and inside the loop

Why write cursor loops with LOOP
● No double checking (unlike REPEAT)
● No code duplication (unlike WHILE)
● All loop control code in one place
  – All at the top of the loop

Why you should write cursor loops with LOOP
    BEGIN
      DECLARE v_done BOOL DEFAULT FALSE;
      DECLARE csr CURSOR FOR SELECT * FROM tab;
      DECLARE CONTINUE HANDLER FOR NOT FOUND
        SET v_done := TRUE;
      OPEN csr;
      my_loop: LOOP
        FETCH csr INTO var1,...,varN;
        IF v_done THEN
          CLOSE csr;
          LEAVE my_loop;
        END IF;
        -- ... do stuff...
      END LOOP;
    END;

Cursor summary
● Avoid cursors if you can
  – Use GROUP_CONCAT for lists
  – Use joins, not nested cursors
  – Only use cursors for data-driven dynamic SQL and stored procedure calls
● Use LOOP instead of REPEAT and WHILE
  – REPEAT requires double condition checking
  – WHILE requires code duplication
  – LOOP allows you to keep all loop control together

Summary
● Variables
  – Use local rather than user-defined variables
● Assignments
  – Use DEFAULT and SET for simple values
  – Use SELECT INTO for queries
● Flow of Control
  – Use functions and operators rather than statements
● Cursors
  – Avoid if possible
  – Use LOOP, not REPEAT and WHILE
