MySQL Stored Procedures: Building High Performance Web Applications


Talk given by Sonali at GNUnify 2010

Published in: Technology

  • Not only is the recursive algorithm less efficient for almost any given input value, it also degrades rapidly as the number of recursions increases (which is, in turn, dependent on which element of the Fibonacci sequence is requested). As well as being inherently a less efficient algorithm, each recursion requires MySQL to create the context for a new stored program (or function) invocation. As a result, recursive algorithms tend to waste memory as well as being slower than their iterative alternatives.
  • MySQL Stored Procedures: Building High Performance Web Applications

    1. 1. MySQL Stored Procedures Building High Performance Web Applications by Sonali Minocha, OSSCube
    2. 2. Who Am I?
    3. 3. Overview
        ● Programs as Database Schema Objects
        ● Executed in-process with the Database
        ● Types of Stored Routines:
            ● Procedures
            ● Functions
            ● Triggers
            ● Events (Temporal triggers; new in MySQL 5.1)
        ● Language:
            ● Subset of Standard SQL:2003 SQL/PSM
            ● Procedural, Block structured
        ● Do not confuse with User Defined Functions (UDF)!
    4. 4. ● Stored Procedures & Functions
            ● Encapsulate tasks or Calculations for reuse
            ● Single point of definition for Business Logic
            ● Source Safely stored and backed up
            ● Added layer of Security
        ● Triggers
            ● Data-Driven
            ● Enforce Data quality through Basic validation
            ● Enforce complex Business Rules
            ● Automatically Update Aggregate tables
        ● Events (MySQL Server 5.1 beta)
            ● Schedule Code Execution in time
            ● Use instead of cron or the Windows Task Scheduler
            ● Automatically Update Aggregate tables
    5. 5. Advantages
        ● Performance
            ● Save network roundtrips, lower latency
        ● Portability and Reuse
            ● Single point of definition
            ● Reusable from many application contexts
        ● Security
            ● DEFINER versus INVOKER
            ● Grant only Execution Privilege
        ● Ease of Maintenance
            ● Code stored in the database
            ● Browse using information_schema database
        ● 'Headless' administrative tasks
            ● No additional runtime environment required
    6. 6. Disadvantages
        ● Performance
            ● Overhead may result in higher latency
            ● Increased usage of database server computing power may negatively affect throughput
    7. 7. How Fast Is the Stored Program Language?
    8. 8. Stored program to find prime numbers
        CREATE PROCEDURE sp_nprimes(p_num INT)
        BEGIN
            DECLARE i INT;
            DECLARE j INT;
            DECLARE nprimes INT;
            DECLARE isprime INT;

            SET i=2;
            SET nprimes=0;

            main_loop:
            WHILE (i<p_num) DO
                SET isprime=1;
                SET j=2;
                divisor_loop:
                WHILE (j<i) DO
                    IF (MOD(i,j)=0) THEN
                        SET isprime=0;
                        LEAVE divisor_loop;
                    END IF;
                    SET j=j+1;
                END WHILE;
                IF (isprime=1) THEN
                    SET nprimes=nprimes+1;
                END IF;
                SET i=i+1;
            END WHILE;
            SELECT CONCAT(nprimes,' prime numbers less than ',p_num);
        END;
    9. 9. Oracle implementation of the prime number procedure
        PROCEDURE N_PRIMES ( p_num NUMBER)
        IS
            i INT;
            j INT;
            nprimes INT;
            isprime INT;
        BEGIN
            i:=2;
            nprimes:=0;
            <<main_loop>>
            WHILE (i<p_num) LOOP
                isprime:=1;
                j:=2;
                <<divisor_loop>>
    10. 10. Oracle implementation of the prime number procedure (cont.)
                WHILE (j<i) LOOP
                    IF (MOD(i,j)=0) THEN
                        isprime:=0;
                        EXIT divisor_loop;
                    END IF;
                    j:=j+1;
                END LOOP;
                IF (isprime=1) THEN
                    nprimes:=nprimes+1;
                END IF;
                i:=i+1;
            END LOOP;
            dbms_output.put_line(nprimes||' prime numbers less than '||p_num);
        END;
    11. 11. Finding prime numbers in various languages
    12. 12. The MySQL stored program language is relatively slow when it comes to performing arithmetic calculations. Avoid using stored programs to perform number crunching.
    13. 13. Feeling less enthusiastic about stored program performance?
    14. 14. Reducing Network Traffic with Stored Programs
    15. 15. Stored program to generate statistics
        CREATE PROCEDURE sales_summary( )
            READS SQL DATA
        BEGIN
            DECLARE SumSales FLOAT DEFAULT 0;
            DECLARE SumSquares FLOAT DEFAULT 0;
            DECLARE NValues INT DEFAULT 0;
            DECLARE SaleValue FLOAT DEFAULT 0;
            DECLARE Mean FLOAT;
            DECLARE StdDev FLOAT;
            DECLARE last_sale INT DEFAULT 0;

            DECLARE sale_csr CURSOR FOR
                SELECT sale_value FROM SALES s
                 WHERE sale_date >date_sub(curdate( ),INTERVAL 6 MONTH);
            DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_sale=1;

            OPEN sale_csr;
            sale_loop: LOOP
                FETCH sale_csr INTO SaleValue;
                IF last_sale=1 THEN
                    LEAVE sale_loop;
                END IF;
    16. 16. Stored program to generate statistics (cont.)
                SET NValues=NValues+1;
                SET SumSales=SumSales+SaleValue;
                SET SumSquares=SumSquares+POWER(SaleValue,2);
            END LOOP sale_loop;
            CLOSE sale_csr;

            SET StdDev = SQRT((SumSquares - (POWER(SumSales,2) / NValues)) / NValues);
            SET Mean = SumSales / NValues;
            SELECT CONCAT('Mean=',Mean,' StdDev=',StdDev);
        END
    17. 17. Java program to generate sales statistics
        import java.sql.*;
        import java.math.*;

        public class SalesSummary {
            public static void main(String[] args)
                    throws ClassNotFoundException, InstantiationException, IllegalAccessException {
                String Username=args[0];
                String Password=args[1];
                String Hostname=args[2];
                String Database=args[3];
                String Port=args[4];
                float SumSales,SumSquares,SaleValue,StdDev,Mean;
                int NValues=0;
                SumSales=SumSquares=0;
                try {
                    Class.forName("com.mysql.jdbc.Driver").newInstance( );
                    String ConnString=
                        "jdbc:mysql://"+Hostname+":"+Port+
                        "/"+Database+"?user="+Username+"&password="+Password;
                    Connection MyConnect = DriverManager.getConnection(ConnString);
    18. 18. Java program to generate sales statistics (cont.)
                    String sql="select sale_value from SALES s" +
                        " where sale_date >date_sub(curdate( ),interval 6 month)";
                    Statement s1=MyConnect.createStatement( );
                    ResultSet rs1=s1.executeQuery(sql);
                    while (rs1.next( )) {
                        SaleValue = rs1.getFloat(1);
                        NValues = NValues + 1;
                        SumSales = SumSales + SaleValue;
                        SumSquares = SumSquares + SaleValue*SaleValue;
                    }
                    rs1.close( );
                    Mean=SumSales/NValues;
                    StdDev = (float) Math.sqrt(((SumSquares -
                        ((SumSales*SumSales) / NValues)) / NValues));
                    System.out.println("Mean="+Mean+" StdDev="+StdDev+" N="+NValues);
                } catch(SQLException Ex) {
                    System.out.println(Ex.getErrorCode()+" "+Ex.getMessage( ));
                    Ex.printStackTrace( );
                }
            }
        }
    19. 19. Java versus stored program performance across the network
    20. 20. Stored programs do not incur the network overhead of languages such as PHP or Java. If network overhead is an issue, then using a stored program can be an effective optimization.
    21. 21. Stored Programs as an Alternative to Expensive SQL
    22. 22. Avoid Self-Joins with Procedural Logic
        Finding the maximum sale for each customer:
        SELECT s.customer_id, s.product_id, s.quantity, s.sale_value
          FROM sales s,
               (SELECT customer_id, max(sale_value) max_sale_value
                  FROM sales
                 GROUP BY customer_id) t
         WHERE t.customer_id=s.customer_id
           AND t.max_sale_value=s.sale_value
           AND s.sale_date>date_sub(curdate( ),interval 6 month);
        To run this query, MySQL first needs to create a temporary table to hold the customer ID and maximum sale value, and then join that back to the sales table to find the full details for each of those rows.
    23. 23. Stored program to return maximum sales for each customer over the last 6 months
        CREATE PROCEDURE sp_max_sale_by_cust( )
            MODIFIES SQL DATA
        BEGIN
            DECLARE last_sale INT DEFAULT 0;
            DECLARE l_last_customer_id INT DEFAULT -1;
            DECLARE l_customer_id INT;
            DECLARE l_product_id INT;
            DECLARE l_quantity INT;
            DECLARE l_sale_value DECIMAL(8,2);
            DECLARE counter INT DEFAULT 0;

            DECLARE sales_csr CURSOR FOR
                SELECT customer_id, product_id, quantity, sale_value
                  FROM sales
                 WHERE sale_date>date_sub(curdate( ),interval 6 month)
                 ORDER BY customer_id, sale_value DESC;
            DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_sale=1;

            OPEN sales_csr;
            sales_loop: LOOP
                FETCH sales_csr INTO l_customer_id,l_product_id,l_quantity,l_sale_value;
                IF (last_sale=1) THEN
                    LEAVE sales_loop;
                END IF;
                IF l_customer_id <> l_last_customer_id THEN
                    /* This is a new customer so first row will be max sale */
                    INSERT INTO max_sales_by_customer
                        (customer_id,product_id,quantity,sale_value)
                    VALUES(l_customer_id,l_product_id,l_quantity,l_sale_value);
                END IF;
                SET l_last_customer_id=l_customer_id;
            END LOOP;
        END
        We can use a stored program to retrieve the data in a single pass through the sales table.
    24. 24. Using a stored program to optimize a complex self-join
    25. 25. Optimize Correlated Updates
        Correlated UPDATE statement:
        UPDATE customers c
           SET sales_rep_id =
               (SELECT manager_id
                  FROM employees
                 WHERE surname = c.contact_surname
                   AND firstname = c.contact_firstname
                   AND date_of_birth = c.date_of_birth)
         WHERE (contact_surname, contact_firstname, date_of_birth) IN
               (SELECT surname, firstname, date_of_birth
                  FROM employees);
        Here the employees table is accessed twice.
    26. 26. Stored program alternative to the correlated update
        CREATE PROCEDURE sp_correlated_update( )
            MODIFIES SQL DATA
        BEGIN
            DECLARE last_customer INT DEFAULT 0;
            DECLARE l_customer_id INT;
            DECLARE l_manager_id INT;

            DECLARE cust_csr CURSOR FOR
                SELECT c.customer_id, e.manager_id
                  FROM customers c, employees e
                 WHERE e.surname=c.contact_surname
                   AND e.firstname=c.contact_firstname
                   AND e.date_of_birth=c.date_of_birth;
            DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_customer=1;
    27. 27. Stored program alternative to the correlated update (cont.)
            OPEN cust_csr;
            cust_loop: LOOP
                FETCH cust_csr INTO l_customer_id,l_manager_id;
                IF (last_customer=1) THEN
                    LEAVE cust_loop;
                END IF;
                UPDATE customers
                   SET sales_rep_id=l_manager_id
                 WHERE customer_id=l_customer_id;
            END LOOP;
        END;
        Here the employees table is accessed only once, and a cursor is used to hold the joined rows.
    28. 28. Performance of a correlated update and stored program alternative
    29. 29. Optimizing Loops: Move Unnecessary Statements Out of a Loop
        A poorly constructed loop:
        WHILE (i<=1000) DO
            SET j=1;
            WHILE (j<=5000) DO
                SET rooti=sqrt(i);
                SET rootj=sqrt(j);
                SET sumroot=sumroot+rooti+rootj;
                SET j=j+1;
            END WHILE;
            SET i=i+1;
        END WHILE;
        There are only 1,000 different values of i, but because sqrt(i) is calculated inside the j loop, it is computed 1000*5000 times, i.e. 5 million times.
    30. 30. Moving unnecessary calculations out of a loop
        WHILE (i<=1000) DO
            SET rooti=sqrt(i);
            SET j=1;
            WHILE (j<=5000) DO
                SET rootj=sqrt(j);
                SET sumroot=sumroot+rootj+rooti;
                SET j=j+1;
            END WHILE;
            SET i=i+1;
        END WHILE;
    31. 31. Performance improvements gained from removing unnecessary calculations within a loop
    32. 32. Ensure that all statements within a loop truly belong within the loop. Move any loop-invariant statements outside of the loop.
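The effect of moving a loop-invariant statement can be made visible by counting calls. The following is a minimal sketch in Java (the client language used earlier in this deck), not a MySQL benchmark; the class `LoopInvariantDemo` and its methods are hypothetical names introduced only for illustration.

```java
public class LoopInvariantDemo {
    // Number of square roots computed since the counter was last reset.
    static long sqrtCalls = 0;

    // Wrapper around Math.sqrt that records each call.
    static double countedSqrt(double x) {
        sqrtCalls++;
        return Math.sqrt(x);
    }

    // Mirrors the poorly constructed loop: sqrt(i) is recomputed for every j.
    static double naive(int outer, int inner) {
        double sum = 0;
        for (int i = 1; i <= outer; i++) {
            for (int j = 1; j <= inner; j++) {
                sum += countedSqrt(i) + countedSqrt(j);
            }
        }
        return sum;
    }

    // Mirrors the improved loop: sqrt(i) is computed once per outer iteration.
    static double hoisted(int outer, int inner) {
        double sum = 0;
        for (int i = 1; i <= outer; i++) {
            double rootI = countedSqrt(i);
            for (int j = 1; j <= inner; j++) {
                sum += rootI + countedSqrt(j);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        sqrtCalls = 0;
        naive(1000, 5000);
        System.out.println("naive sqrt calls:   " + sqrtCalls);

        sqrtCalls = 0;
        hoisted(1000, 5000);
        System.out.println("hoisted sqrt calls: " + sqrtCalls);
    }
}
```

With the slide's bounds (1000 and 5000), the naive version performs 10,000,000 square roots and the hoisted version 5,001,000. The absolute timings in MySQL will differ, but the saved work is the same 4,999,000 redundant evaluations of sqrt(i).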
    33. 33. Use LEAVE or CONTINUE to Avoid Needless Processing
        Loop that iterates unnecessarily:
        divisor_loop:
        WHILE (j<i) DO                  /* Look for a divisor */
            IF (MOD(i,j)=0) THEN
                SET isprime=0;          /* This number is not prime */
            END IF;
            SET j=j+1;
        END WHILE;
    34. 34. Loop with a LEAVE statement to avoid unnecessary iterations
        divisor_loop:
        WHILE (j<i) DO                  /* Look for a divisor */
            IF (MOD(i,j)=0) THEN
                SET isprime=0;          /* This number is not prime */
                LEAVE divisor_loop;     /* No need to keep checking */
            END IF;
            SET j=j+1;
        END WHILE;
    35. 35. Modifying the WHILE condition to avoid unnecessary iterations
        divisor_loop:
        WHILE (j<i AND isprime=1) DO    /* Look for a divisor */
            IF (MOD(i,j)=0) THEN
                SET isprime=0;          /* This number is not prime */
            END IF;
            SET j=j+1;
        END WHILE;
    36. 36. Effect of using LEAVE or modifying WHILE clause to avoid unnecessary iterations
    37. 37. Make sure that your loops terminate when all of the work has been done, either by ensuring that the loop continuation expression is comprehensive or if necessary by using a LEAVE statement to terminate when appropriate.
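To give a feel for how much work an early exit saves, here is a small Java sketch of the same prime-counting logic that counts divisor-loop iterations with and without the early exit. It is an illustration under assumptions, not a measurement of MySQL: `EarlyExitDemo`, `countPrimes`, and the `leaveEarly` flag are hypothetical names, and Java's `break` stands in for `LEAVE divisor_loop`.

```java
public class EarlyExitDemo {
    // Number of divisor-loop iterations since the counter was last reset.
    static long iterations = 0;

    // Counts primes below limit. When leaveEarly is true, the divisor loop
    // stops at the first divisor found, like LEAVE divisor_loop.
    static int countPrimes(int limit, boolean leaveEarly) {
        int nprimes = 0;
        for (int i = 2; i < limit; i++) {
            boolean isPrime = true;
            for (int j = 2; j < i; j++) {
                iterations++;
                if (i % j == 0) {
                    isPrime = false;
                    if (leaveEarly) {
                        break; // no need to keep checking
                    }
                }
            }
            if (isPrime) {
                nprimes++;
            }
        }
        return nprimes;
    }

    public static void main(String[] args) {
        iterations = 0;
        int full = countPrimes(1000, false);
        System.out.println(full + " primes, " + iterations + " iterations without early exit");

        iterations = 0;
        int early = countPrimes(1000, true);
        System.out.println(early + " primes, " + iterations + " iterations with early exit");
    }
}
```

Both variants return the same count; only the number of inner iterations changes. For a limit of 10, for example, the loop body runs 28 times without the early exit and 14 times with it.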
    38. 38. IF and CASE Statements: Test for the Most Likely Conditions First
        Poorly constructed IF statement:
        IF (percentage>95) THEN
            SET Above95=Above95+1;
        ELSEIF (percentage >=90) THEN
            SET Range90to95=Range90to95+1;
        ELSEIF (percentage >=75) THEN
            SET Range75to89=Range75to89+1;
        ELSE
            SET LessThan75=LessThan75+1;
        END IF;
        Optimized IF statement:
        IF (percentage<75) THEN
            SET LessThan75=LessThan75+1;
        ELSEIF (percentage >=75 AND percentage<90) THEN
            SET Range75to89=Range75to89+1;
        ELSEIF (percentage >=90 AND percentage <=95) THEN
            SET Range90to95=Range90to95+1;
        ELSE
            SET Above95=Above95+1;
        END IF;
    39. 39. Effect of optimizing an IF statement by reordering comparisons
    40. 40. If an IF statement is to be executed repeatedly, placing the most commonly satisfied condition earlier in the IF structure may optimize performance.
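The saving from reordering depends entirely on how the data is distributed. The Java sketch below counts how many branch conditions are evaluated by each ordering; it is only an illustration, and the class `IfOrderDemo`, its methods, and the sample percentages (chosen so that most values fall below 75) are assumptions, not data from the talk.

```java
public class IfOrderDemo {
    // Number of branch conditions evaluated since the counter was last reset.
    static long comparisons = 0;

    // Records that one branch condition was evaluated.
    static boolean check(boolean condition) {
        comparisons++;
        return condition;
    }

    // Original ordering: the rarest bucket (above 95) is tested first.
    // Returns 3 = above 95, 2 = 90 to 95, 1 = 75 to 89, 0 = below 75.
    static int originalOrder(int percentage) {
        if (check(percentage > 95)) return 3;
        else if (check(percentage >= 90)) return 2;
        else if (check(percentage >= 75)) return 1;
        else return 0;
    }

    // Reordered: the bucket assumed to be most common (below 75) comes first.
    static int likelyFirst(int percentage) {
        if (check(percentage < 75)) return 0;
        else if (check(percentage < 90)) return 1;
        else if (check(percentage <= 95)) return 2;
        else return 3;
    }

    public static void main(String[] args) {
        int[] sample = {10, 20, 30, 40, 50, 60, 70, 80, 92, 99}; // hypothetical data

        comparisons = 0;
        for (int p : sample) originalOrder(p);
        System.out.println("original order: " + comparisons + " conditions evaluated");

        comparisons = 0;
        for (int p : sample) likelyFirst(p);
        System.out.println("likely first:   " + comparisons + " conditions evaluated");
    }
}
```

For this sample the original ordering evaluates 27 conditions and the reordered one 15. If most values were above 95 instead, the original ordering would be the better choice, which is why the advice is to test the most likely condition first rather than any fixed order.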
    41. 41. Avoid Unnecessary Comparisons
        IF statement with common condition in each expression:
        IF (employee_status='U' AND employee_salary>150000) THEN
            SET categoryA=categoryA+1;
        ELSEIF (employee_status='U' AND employee_salary>100000) THEN
            SET categoryB=categoryB+1;
        ELSEIF (employee_status='U' AND employee_salary<50000) THEN
            SET categoryC=categoryC+1;
        ELSEIF (employee_status='U') THEN
            SET categoryD=categoryD+1;
        END IF;
        Nested IF statement to avoid redundant comparisons:
        IF (employee_status='U') THEN
            IF (employee_salary>150000) THEN
                SET categoryA=categoryA+1;
            ELSEIF (employee_salary>100000) THEN
                SET categoryB=categoryB+1;
            ELSEIF (employee_salary<50000) THEN
                SET categoryC=categoryC+1;
            ELSE
                SET categoryD=categoryD+1;
            END IF;
        END IF;
    42. 42. Effect of nesting an IF statement to eliminate redundant comparisons
    43. 43. If your IF or CASE statement contains compound expressions with redundant comparisons, consider nesting multiple IF or CASE statements to avoid redundant processing.
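The redundant comparisons can be counted directly. In the Java sketch below, every simple comparison goes through a counting helper; `NestedIfDemo` and its method names are hypothetical, and Java's short-circuiting `&&` is an assumption of the sketch that MySQL's evaluation of AND need not share, so the flat version's count here is, if anything, a lower bound.

```java
public class NestedIfDemo {
    // Number of simple comparisons evaluated since the counter was last reset.
    static long comparisons = 0;

    // Records that one simple comparison was evaluated.
    static boolean check(boolean condition) {
        comparisons++;
        return condition;
    }

    // Flat version: the status test is repeated in every branch.
    // Returns 'A'..'D' for the matching category, or '-' when none matches.
    static char flat(char status, int salary) {
        if (check(status == 'U') && check(salary > 150000)) return 'A';
        else if (check(status == 'U') && check(salary > 100000)) return 'B';
        else if (check(status == 'U') && check(salary < 50000)) return 'C';
        else if (check(status == 'U')) return 'D';
        return '-';
    }

    // Nested version: the status test is evaluated once.
    static char nested(char status, int salary) {
        if (check(status == 'U')) {
            if (check(salary > 150000)) return 'A';
            else if (check(salary > 100000)) return 'B';
            else if (check(salary < 50000)) return 'C';
            else return 'D';
        }
        return '-';
    }

    public static void main(String[] args) {
        comparisons = 0;
        flat('U', 60000);
        System.out.println("flat:   " + comparisons + " comparisons");

        comparisons = 0;
        nested('U', 60000);
        System.out.println("nested: " + comparisons + " comparisons");
    }
}
```

For an employee with status 'U' and a salary of 60000 (category D), the flat version evaluates 7 comparisons and the nested one 4. For a status other than 'U', the counts are 4 and 1.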
    44. 44. CASE Versus IF
        CASE customer_code
            WHEN 1 THEN
                SET process_flag=7;
            WHEN 2 THEN
                SET process_flag=9;
            WHEN 3 THEN
                SET process_flag=2;
            ELSE
                SET process_flag=0;
        END CASE;

        IF customer_code=1 THEN
            SET process_flag=7;
        ELSEIF customer_code=2 THEN
            SET process_flag=9;
        ELSEIF customer_code=3 THEN
            SET process_flag=2;
        ELSE
            SET process_flag=0;
        END IF;
        The IF statement is roughly 15% faster than the equivalent CASE statement; presumably this is the result of a more efficient internal algorithm for IF in the MySQL code.
    45. 45. Recursion
        Recursive implementation of the Fibonacci algorithm:
        CREATE PROCEDURE rec_fib(n INT,OUT out_fib INT)
        BEGIN
            DECLARE n_1 INT;
            DECLARE n_2 INT;

            IF (n=0) THEN
                SET out_fib=0;
            ELSEIF (n=1) THEN
                SET out_fib=1;
            ELSE
                CALL rec_fib(n-1,n_1);
                CALL rec_fib(n-2,n_2);
                SET out_fib=(n_1 + n_2);
            END IF;
        END
    46. 46. Nonrecursive implementation of the Fibonacci sequence
        CREATE PROCEDURE nonrec_fib(n INT,OUT out_fib INT)
        BEGIN
            DECLARE m INT DEFAULT 0;
            DECLARE k INT DEFAULT 1;
            DECLARE i INT;
            DECLARE tmp INT;

            SET m=0;
            SET k=1;
            SET i=1;
            WHILE (i<=n) DO
                SET tmp=m+k;
                SET m=k;
                SET k=tmp;
                SET i=i+1;
            END WHILE;
            SET out_fib=m;
        END
    47. 47. max_sp_recursion_depth Performance of recursive and nonrecursive calculations of Fibonacci numbers
    48. 48. Recursive solutions rarely perform as efficiently as their non recursive alternatives.
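The gap between the two Fibonacci procedures comes largely from the number of invocations each needs. The Java sketch below ports both algorithms and counts recursive calls versus loop iterations; it is an illustration only (the class `FibDemo` and its counters are hypothetical), and it does not capture the extra per-call cost MySQL pays to set up each stored program invocation.

```java
public class FibDemo {
    // Number of recursive invocations since the counter was last reset.
    static long calls = 0;
    // Number of loop iterations since the counter was last reset.
    static long iterations = 0;

    // Port of rec_fib: two recursive calls for every n greater than 1.
    static long recFib(int n) {
        calls++;
        if (n == 0) return 0;
        if (n == 1) return 1;
        return recFib(n - 1) + recFib(n - 2);
    }

    // Port of nonrec_fib: a single loop of n iterations.
    static long nonrecFib(int n) {
        long m = 0;
        long k = 1;
        for (int i = 1; i <= n; i++) {
            iterations++;
            long tmp = m + k;
            m = k;
            k = tmp;
        }
        return m;
    }

    public static void main(String[] args) {
        int n = 20;

        calls = 0;
        long r = recFib(n);
        System.out.println("recursive:    fib(" + n + ")=" + r + " using " + calls + " calls");

        iterations = 0;
        long it = nonrecFib(n);
        System.out.println("nonrecursive: fib(" + n + ")=" + it + " using " + iterations + " iterations");
    }
}
```

For n = 20 the recursive version makes 21,891 calls to produce 6765, while the loop runs 20 times. The call count grows roughly as fast as the Fibonacci numbers themselves, which matches the rapid degradation described for the recursive stored procedure.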
    49. 49. Cursors
        Two equivalent stored programs, one using INTO and the other using a cursor:
        CREATE PROCEDURE using_into
            (p_customer_id INT, OUT p_customer_name VARCHAR(30))
            READS SQL DATA
        BEGIN
            SELECT customer_name
              INTO p_customer_name
              FROM customers
             WHERE customer_id=p_customer_id;
        END;

        CREATE PROCEDURE using_cursor
            (p_customer_id INT, OUT p_customer_name VARCHAR(30))
            READS SQL DATA
        BEGIN
            DECLARE cust_csr CURSOR FOR
                SELECT customer_name
                  FROM customers
                 WHERE customer_id=p_customer_id;
            OPEN cust_csr;
            FETCH cust_csr INTO p_customer_name;
            CLOSE cust_csr;
        END;
    50. 50. Relative performance of INTO versus CURSOR fetch: over 11,000 executions, the INTO-based stored program was approximately 15% faster than the stored program that used an explicit cursor.
    51. 51. If you know that a SQL statement will return only one row, then a SELECT ... INTO statement will be slightly faster than declaring, opening, and fetching from a cursor.
    52. 52. Trigger Overhead
        "Trivial" trigger:
        CREATE TRIGGER sales_bi_trg
            BEFORE INSERT ON sales
            FOR EACH ROW
            SET @x=NEW.sale_value;
        When we implemented this trigger, the time taken to insert 100,000 sales rows increased by 45%.
    53. 53. A more complex trigger
        CREATE TRIGGER sales_bi_trg
            BEFORE INSERT ON sales
            FOR EACH ROW
        BEGIN
            DECLARE row_count INTEGER;

            SELECT COUNT(*)
              INTO row_count
              FROM customer_sales_totals
             WHERE customer_id=NEW.customer_id;

            IF row_count > 0 THEN
                UPDATE customer_sales_totals
                   SET sale_value=sale_value+NEW.sale_value
                 WHERE customer_id=NEW.customer_id;
            ELSE
                INSERT INTO customer_sales_totals
                    (customer_id,sale_value)
                VALUES(NEW.customer_id,NEW.sale_value);
            END IF;
        END
    54. 54. This trigger increased the execution time 100-fold.
        Index to support our trigger:
        CREATE UNIQUE INDEX customer_sales_totals_cust_id
            ON customer_sales_totals(customer_id);
    55. 55. Trigger performance variations
    56. 56. The optimization of stored program code follows the same general principles that are true for other languages. In particular:
        ● Optimize loop processing: ensure that no unnecessary statements occur within a loop; exit the loop as soon as you are logically able to do so.
        ● Reduce the number of comparisons by testing for the most likely match first, and nest IF or CASE statements when necessary to eliminate unnecessary comparisons.
        ● Avoid recursive procedures.
        ● Because MySQL triggers execute once for each row affected by a DML statement, the effect of any unoptimized statements in a trigger will be magnified during bulk DML operations. Trigger code needs to be very carefully optimized; expensive SQL statements have no place in triggers.
    57. 57. Q & A
    58. 58. ...and we are hiring ^_^ For more information, please feel free to drop in a line to [email_address] or visit