This presentation contains a wide variety of PL/SQL performance best practices and tuning techniques, but we shouldn’t just pick one of these at random! Instead, our first step should be to identify the most resource-intensive lines of PL/SQL code and start by optimizing that code. To do this, we use the PL/SQL profiler. The profiler is implemented in the package DBMS_PROFILER. When we surround a program call with START_PROFILER and STOP_PROFILER calls, Oracle collates execution statistics on a line-by-line basis.
The profiling data is stored in a collection of tables prefixed with PLSQL_PROFILER. The above is a query that reports the most expensive lines of code (in terms of execution time) in the profiling run.
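A minimal sketch of a profiling run and report of this kind follows. It is not the slide's own script: NIGHTLY_BATCH stands in for whatever routine is being profiled, the run comment is arbitrary, and it assumes the PLSQL_PROFILER tables already exist in the current schema (Oracle supplies the proftab.sql script to create them).

```sql
-- Profile one call to the routine under test (NIGHTLY_BATCH is a placeholder).
BEGIN
  DBMS_PROFILER.START_PROFILER('nightly batch profiling run');
  nightly_batch;
  DBMS_PROFILER.STOP_PROFILER;
END;
/

-- Ten most expensive lines, by total time, from the most recent run.
SELECT *
  FROM (SELECT u.unit_owner,
               u.unit_name,
               d.line#,
               d.total_occur,
               d.total_time
          FROM plsql_profiler_data d
          JOIN plsql_profiler_units u
            ON u.runid = d.runid
           AND u.unit_number = d.unit_number
         WHERE d.runid = (SELECT MAX(runid) FROM plsql_profiler_runs)
         ORDER BY d.total_time DESC)
 WHERE ROWNUM <= 10;
```

Joining LINE# back to USER_SOURCE (or ALL_SOURCE) would show the text of each line alongside its cost.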
Much of the time, identifying the most expensive lines of code is sufficient to discover hot spots and tuning opportunities. But on other occasions you need to identify expensive subroutines, or identify the calling routine to understand the context in which a line of code is being executed. To help with these scenarios, Oracle introduced the hierarchical profiler in Oracle 11g. You access this profiler via the DBMS_HPROF package. The START_PROFILING and STOP_PROFILING procedures commence and terminate the profiling run. The output from the profiling session is written to the external file identified in the START_PROFILING call. If you want to load this file into database tables for analysis, you can do so by using the ANALYZE function. In this example, we profile the NIGHTLY_BATCH procedure to an external file hprof_trace.trc that is created in the HPROF_DIR directory. We then load the trace file into the profiling tables using ANALYZE.
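The steps just described can be sketched as follows. This is an illustration rather than the slide's exact code: it assumes the HPROF_DIR directory object exists and is writable, that the hierarchical profiler tables have been created (dbmshptab.sql in 11g), and the run comment is made up.

```sql
DECLARE
  l_runid NUMBER;
BEGIN
  -- Write raw profiler output to hprof_trace.trc in the HPROF_DIR directory.
  DBMS_HPROF.START_PROFILING(location => 'HPROF_DIR',
                             filename => 'hprof_trace.trc');
  nightly_batch;
  DBMS_HPROF.STOP_PROFILING;

  -- Load the trace file into the DBMSHP_% tables; ANALYZE returns the run id.
  l_runid := DBMS_HPROF.ANALYZE(location    => 'HPROF_DIR',
                                filename    => 'hprof_trace.trc',
                                run_comment => 'nightly batch (illustrative)');
  DBMS_OUTPUT.PUT_LINE('Profiler run id: ' || l_runid);
END;
/
```

The returned run id is what later queries against the profiler tables filter on.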
There are two ways to analyze the trace file. First, the plshprof command-line utility converts the trace file into an HTML report. We can view the report by pointing our browser at the hprof_report.html file generated by the preceding command. The figure above shows the report.
Personally, I find the HTML report a bit hard to interpret and prefer to issue SQL against the profiler tables. In the above example, a hierarchical self-join is issued that exposes the call tree.
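One way such a call-tree query might look is sketched below. It is an approximation, not necessarily the query on the slide; the :run_id bind variable is assumed to hold the value returned by DBMS_HPROF.ANALYZE.

```sql
WITH call_tree AS (
  -- One row per caller/callee pair; top-level routines have no parent row.
  SELECT f.symbolid,
         pc.parentsymid,
         f.module || '.' || f.function AS routine,
         NVL(pc.calls, f.calls) AS calls,
         NVL(pc.subtree_elapsed_time, f.subtree_elapsed_time) AS subtree_time
    FROM dbmshp_function_info f
    LEFT OUTER JOIN dbmshp_parent_child_info pc
      ON pc.runid = f.runid
     AND pc.childsymid = f.symbolid
   WHERE f.runid = :run_id
)
SELECT LPAD(' ', 2 * (LEVEL - 1)) || routine AS routine,  -- indent by call depth
       calls,
       subtree_time
  FROM call_tree
 START WITH parentsymid IS NULL
CONNECT BY PRIOR symbolid = parentsymid;
```

The indentation makes it easy to see which caller is responsible for an expensive subroutine.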
Consider a scenario in which an application accepts input from the end user, reads some data in the database, decides what statement to execute next, retrieves a result, makes a decision, executes some SQL, and so on. If the application code is written entirely outside of the database, each of these steps would require a network round trip between the database and the application. The time taken to perform these network trips can easily dominate overall user response time. The right-hand diagram shows the sequence of interactions that would be required without a stored procedure approach. On the other hand, if a stored program is used to implement the funds transfer logic, only a single database interaction is required. The stored program takes responsibility for checking balances, withdrawal limits, and so on. The figure on the left illustrates stored procedure network traffic.
Network round trips can also become significant when an application is required to perform some kind of aggregate processing on large record sets in the database. If the application needs to retrieve millions of rows to calculate some sort of business metric that cannot easily be computed using native SQL (average time to complete an order, for example), a large number of round trips can result. In such a case, the network delay can again become the dominant factor in application response time. Performing the calculations in a stored program will reduce network overhead, which might reduce overall response time. The key determining factor will be the network latency.
The further the code is (in network terms) from the database, the more the network effects are magnified. You can’t get any closer to the database than being inside the database as PL/SQL.
For the majority of PL/SQL programs, the biggest gains will be obtained by tuning the SQL within the PL/SQL. Next most important is usually to ensure that PL/SQL handles data efficiently: by using array fetch and insert, and by using the right types of collections and best practices such as associative array lookups and NOCOPY parameters. Thirdly, one should look at the algorithmic efficiency of the PL/SQL code: loop structure, ordering of conditionals, avoiding recursion, and so on. Usually the least gain is obtained by tweaking parameters or data types. Two exceptions are:
• If PLSQL_OPTIMIZE_LEVEL is less than 2 (10g) or 3 (11g), then increasing it can sometimes lead to significant improvements
• When doing number crunching, efficient data types and native compilation are recommended
Most parameters have little effect, but PLSQL_OPTIMIZE_LEVEL has a major effect on PL/SQL code, similar to the effects of an optimizing compiler in other languages.
• Level 0: No optimization
• Level 1: Minor optimizations, not much reorganization
• Level 2: (the default) Significant reorganization including loop optimizations and automatic bulk collect
• Level 3: (11g only) Further optimizations, notably automatic in-lining of subroutines
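As a rough illustration of how the level can be changed and checked, consider the statements below. NIGHTLY_BATCH is a placeholder procedure name; the parameter and dictionary view are standard, but whether level 3 helps a given routine has to be measured.

```sql
-- Affects units compiled in this session from now on.
ALTER SESSION SET plsql_optimize_level = 3;

-- Recompile one existing unit at level 3 (placeholder name).
ALTER PROCEDURE nightly_batch COMPILE PLSQL_OPTIMIZE_LEVEL = 3;

-- Check which level each of your stored units was compiled with.
SELECT name, type, plsql_optimize_level
  FROM user_plsql_object_settings
 ORDER BY plsql_optimize_level, name;
```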
This script shows the amount of time routines spend in PL/SQL code compared with total execution time. The procedure shown spends only 3% of its time executing PL/SQL code: the vast bulk of time is spent by SQL embedded within the PL/SQL.
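The script itself is not reproduced in these notes. One way to get a similar breakdown, offered as an assumption about the approach rather than the original script, is to compare PLSQL_EXEC_TIME with ELAPSED_TIME in V$SQL (this needs SELECT access to the V$ views):

```sql
-- Share of each cached statement's elapsed time spent in the PL/SQL engine.
SELECT sql_id,
       SUBSTR(sql_text, 1, 60) AS sql_text,
       plsql_exec_time,
       elapsed_time,
       ROUND(plsql_exec_time * 100 / elapsed_time, 2) AS pct_plsql
  FROM v$sql
 WHERE plsql_exec_time > 0
   AND elapsed_time > 0
 ORDER BY plsql_exec_time DESC;
```

A low PCT_PLSQL for a PL/SQL call suggests that tuning the embedded SQL will pay off more than tuning the procedural code.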
If you are a TOAD or SQL*Navigator user, consider the Xpert editions or the Development suites. These include the SQL Optimizer, which can help you find and tune SQL.
PL/SQL can fetch rows from the database one at a time, all at once, or in batches. The more rows fetched in a batch, the lower the CPU and logical read overhead, but the greater the memory requirements. The fourth way is not to retrieve the data at all: can you do all your work in SQL without pulling the rows into PL/SQL?
Prior to 10g this is inefficient:
• Excessive loop iterations
• Increases logical reads (rows in the same block fetched separately)
In other languages, this pattern causes excessive network traffic, but luckily in PL/SQL this is not an issue.
• Selects all data in a single operation
• Large result sets might take longer as memory grows
• Other concurrent sessions may have limited memory for sorts, etc.
• Out of memory errors are possible
If you are connected via MTS and have Auto Shared Memory Management (ASMM) or Automatic Memory Management (AMM), you can actually use up all of the PGA and SGA memory.
Considered best:
• Never more than p_array_size elements in the collection
• Best throughput, acceptable memory utilization
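The batched approach can be sketched as below. The query against ALL_OBJECTS and the batch size of 100 are arbitrary choices for illustration; only the name p_array_size comes from the notes above.

```sql
DECLARE
  CURSOR c_objects IS
    SELECT object_name FROM all_objects;

  TYPE name_list_t IS TABLE OF all_objects.object_name%TYPE;
  l_names      name_list_t;
  p_array_size CONSTANT PLS_INTEGER := 100;  -- rows fetched per batch
  l_total_len  PLS_INTEGER := 0;
BEGIN
  OPEN c_objects;
  LOOP
    -- Each fetch replaces the collection, so it never holds more
    -- than p_array_size elements.
    FETCH c_objects BULK COLLECT INTO l_names LIMIT p_array_size;
    EXIT WHEN l_names.COUNT = 0;

    FOR i IN 1 .. l_names.COUNT LOOP
      l_total_len := l_total_len + LENGTH(l_names(i));
    END LOOP;
  END LOOP;
  CLOSE c_objects;

  DBMS_OUTPUT.PUT_LINE('Total name length: ' || l_total_len);
END;
/
```

Testing l_names.COUNT rather than c_objects%NOTFOUND before processing avoids silently dropping the final, partially filled batch.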
Note that this example is also vulnerable to SQL injection. Bind variables enable SQL statements that are essentially identical, differing only in parameter values, to be parsed only once and then executed many times. Using bind variables reduces parse overhead (a CPU-intensive operation) and also reduces contention for latches and mutexes that protect shared SQL structures in the shared pool. In most programming languages, we have to go to special effort to use bind variables, and sometimes the convoluted code that results can be tedious to write and hard to maintain. However, in PL/SQL, bind variables are employed automatically: every PL/SQL variable inside a SQL statement is effectively a bind variable and, therefore, PL/SQL programs rarely suffer from parse overhead and associated latch/mutex contention issues that are all too common in languages such as PHP, Java, or C#. However, when we use dynamic SQL in PL/SQL, this automatic binding does not occur: Dynamically constructed SQL in PL/SQL is just
This implementation defines the bind variable placeholder as :columnValue in the dynamic SQL string. The actual value is provided by the USING clause. Although we still request the parse every time this routine is executed, Oracle will quite possibly find a matching SQL in the shared pool, providing only that the same table and column names have been used in a previous execution.
As well as this performance gain, we reduce the chance of mutex contention if other sessions are trying to execute similar routines.
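A minimal sketch of dynamic SQL with a bind variable follows. The function name count_matching and its parameters are hypothetical, and the DBMS_ASSERT calls are an addition not mentioned in the notes: table and column names cannot be bound, so they are validated before being concatenated.

```sql
DECLARE
  l_count NUMBER;

  FUNCTION count_matching (p_table  IN VARCHAR2,
                           p_column IN VARCHAR2,
                           p_value  IN VARCHAR2) RETURN NUMBER IS
    l_result NUMBER;
  BEGIN
    -- Identifiers are checked and concatenated; the value is bound
    -- through the :columnValue placeholder and the USING clause.
    EXECUTE IMMEDIATE
         'SELECT COUNT(*) FROM ' || DBMS_ASSERT.SIMPLE_SQL_NAME(p_table)
      || ' WHERE ' || DBMS_ASSERT.SIMPLE_SQL_NAME(p_column)
      || ' = :columnValue'
      INTO l_result
      USING p_value;
    RETURN l_result;
  END count_matching;
BEGIN
  l_count := count_matching('all_objects', 'object_type', 'TABLE');
  DBMS_OUTPUT.PUT_LINE('Matching rows: ' || l_count);
END;
/
```

Every call with the same table and column produces the same SQL text, so the cached cursor can be reused whatever value is passed.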
Users of other programming languages might be familiar with the concept of passing a parameter by value as opposed to by reference. When we pass a parameter by value, we create a copy of the parameter for the subroutine to use. When we pass the parameter by reference, the subroutine uses the actual variable passed; any changes made to the variable in the subroutine are visible to the calling routine. The NOCOPY directive instructs a PL/SQL function or procedure to use the parameter variable directly, by reference, rather than making a copy. This is an important optimization when passing large PL/SQL collections into a subroutine because otherwise the process of creating a copy can consume significant resources.
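The following sketch shows NOCOPY on a collection parameter. The type, procedure name, and collection size are invented for illustration. Note that NOCOPY is a request to the compiler, which may still pass by value in some situations.

```sql
DECLARE
  TYPE number_list_t IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  l_values number_list_t;

  -- With NOCOPY the procedure works on the caller's collection directly
  -- instead of on a copy made at call time and copied back on return.
  PROCEDURE double_values (p_values IN OUT NOCOPY number_list_t) IS
  BEGIN
    FOR i IN 1 .. p_values.COUNT LOOP
      p_values(i) := p_values(i) * 2;
    END LOOP;
  END double_values;
BEGIN
  FOR i IN 1 .. 100000 LOOP
    l_values(i) := i;
  END LOOP;

  double_values(l_values);
  DBMS_OUTPUT.PUT_LINE('Last element: ' || l_values(l_values.COUNT));
END;
/
```

One trade-off: if the procedure raises an exception partway through, changes already made to a NOCOPY parameter may remain visible to the caller.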
Prior to Oracle 9.2, when seeking a non-numeric value in such a cache, we might use two collections: one containing keys and the other containing values. For instance, in the following example, we scan through the G_CUST_NAMES table looking for a specific customer name and date of birth. If we find a match, we look in the corresponding index of the G_CUST_IDS table to find the CUSTOMER_ID.
Associative arrays offer a more efficient solution. An associative array might be indexed by a non-numeric variable, so we can look up the matching customer name directly. And the code is simpler.
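A small sketch of the associative array lookup is shown below. G_CUST_IDS is the name used in the notes; the key format (name and date of birth joined with a bar) and the sample customers are assumptions made for this example.

```sql
DECLARE
  -- Customer ids keyed directly by a string built from name and birth date.
  TYPE cust_id_list_t IS TABLE OF NUMBER INDEX BY VARCHAR2(100);
  g_cust_ids cust_id_list_t;
  l_key      VARCHAR2(100);
BEGIN
  -- Populate the cache (sample data only).
  g_cust_ids('SMITH, JOHN|1970-01-31') := 101;
  g_cust_ids('JONES, MARY|1982-06-15') := 102;

  -- A single keyed lookup replaces the scan through a separate names table.
  l_key := 'JONES, MARY|1982-06-15';
  IF g_cust_ids.EXISTS(l_key) THEN
    DBMS_OUTPUT.PUT_LINE('Customer id: ' || g_cust_ids(l_key));
  ELSE
    DBMS_OUTPUT.PUT_LINE('Customer not cached');
  END IF;
END;
/
```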
The LOOP–END LOOP clauses repeatedly execute statements within a loop. Because statements within a loop are executed many times, they can often consume a high proportion of overall execution time. A badly constructed loop can have a drastic effect on performance. The code above illustrates the principle of exiting a loop as early as possible. The code calculates the number of prime numbers less than the number provided as a parameter (P_NUM). It does this by attempting to divide each candidate number by every number smaller than itself. I originally wrote this code many years ago when comparing Java and PL/SQL performance. In the first version of this code, I omitted the EXIT statement that appears as a comment in line 16. The program worked but performed poorly because it kept looping when it did not need to. NB: PLSQL_OPTIMIZE_LEVEL cannot help here.
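A reconstruction of the idea, not the slide's exact code, might look like this. The value of p_num and the variable names are assumptions.

```sql
DECLARE
  p_num       CONSTANT PLS_INTEGER := 1000;  -- count primes below this value
  l_primes    PLS_INTEGER := 0;
  l_is_prime  BOOLEAN;
BEGIN
  FOR i IN 2 .. p_num - 1 LOOP
    l_is_prime := TRUE;
    FOR j IN 2 .. i - 1 LOOP
      IF MOD(i, j) = 0 THEN
        l_is_prime := FALSE;
        EXIT;  -- a divisor was found, so further iterations are wasted work
      END IF;
    END LOOP;
    IF l_is_prime THEN
      l_primes := l_primes + 1;
    END IF;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Primes found: ' || l_primes);
END;
/
```

Removing the EXIT gives the same answer but forces the inner loop to run to completion for every composite number.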
Loop invariant expressions are those that don’t change with each iteration of the loop. For instance, in the following code, the expressions on lines 5 and 7 remain unchanged with each iteration of the loop that begins on line 3. Recalculating these expressions with each iteration of the loop is unnecessary and wastes CPU cycles.
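As an illustration with made-up names and arithmetic (the slide's own code is not reproduced here), the invariant term can be computed once before the loop:

```sql
DECLARE
  l_limit  CONSTANT PLS_INTEGER := 100000;
  l_scale  NUMBER;
  l_total  NUMBER := 0;
BEGIN
  -- SQRT(l_limit) does not depend on the loop counter, so it is
  -- evaluated once here rather than on every iteration.
  l_scale := SQRT(l_limit);

  FOR i IN 1 .. l_limit LOOP
    l_total := l_total + i / l_scale;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('Total: ' || ROUND(l_total, 2));
END;
/
```

Writing `l_total := l_total + i / SQRT(l_limit)` inside the loop would produce the same result while repeating the square root calculation 100,000 times.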
A recursive routine is one that invokes itself. Recursive routines often offer elegant solutions to complex programming problems but tend to consume large amounts of memory and to be less efficient than iterative (loop-based) alternatives.
Remember, the memory you use comes from a pool of memory shared between sessions. Other sessions might have reduced access to memory.
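To make the contrast concrete, here is a small sketch using factorial as a stand-in problem (the presentation's own example is not shown in these notes). Both functions return the same value; the recursive one needs a new call frame for every level of depth.

```sql
DECLARE
  -- Recursive form: one stack frame per level, so memory grows with p_n.
  FUNCTION factorial_recursive (p_n IN PLS_INTEGER) RETURN NUMBER IS
  BEGIN
    IF p_n <= 1 THEN
      RETURN 1;
    END IF;
    RETURN p_n * factorial_recursive(p_n - 1);
  END factorial_recursive;

  -- Iterative form: constant memory, a single loop.
  FUNCTION factorial_iterative (p_n IN PLS_INTEGER) RETURN NUMBER IS
    l_result NUMBER := 1;
  BEGIN
    FOR i IN 2 .. p_n LOOP
      l_result := l_result * i;
    END LOOP;
    RETURN l_result;
  END factorial_iterative;
BEGIN
  DBMS_OUTPUT.PUT_LINE('Recursive: ' || factorial_recursive(20));
  DBMS_OUTPUT.PUT_LINE('Iterative: ' || factorial_iterative(20));
END;
/
```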
So far, we have used the Oracle NUMBER data type when performing numeric computation. The NUMBER type is extremely flexible and capable of storing both high-precision and high-magnitude numbers. However, this flexibility comes at a cost when performing numeric computations: certain numeric calculations will be faster if a less flexible data type is chosen. In particular, the PLS_INTEGER and SIMPLE_INTEGER data types perform faster than the NUMBER type for computation. Both are signed 32-bit integers, which means that they can store numbers between –2,147,483,648 and 2,147,483,647. SIMPLE_INTEGER is the same as PLS_INTEGER but cannot be NULL, and overflows (attempts to store numbers larger than 2,147,483,647, for instance) will not cause exceptions to be raised. SIMPLE_INTEGER can offer a performance advantage when the PL/SQL package is compiled to native code.
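The difference in overflow behaviour can be demonstrated with a short block. Variable names are illustrative; SIMPLE_INTEGER requires Oracle 11g or later.

```sql
DECLARE
  l_simple SIMPLE_INTEGER := 2147483647;  -- largest 32-bit signed value
  l_pls    PLS_INTEGER    := 2147483647;
  l_one    PLS_INTEGER    := 1;
BEGIN
  -- SIMPLE_INTEGER wraps around silently on overflow.
  l_simple := l_simple + 1;
  DBMS_OUTPUT.PUT_LINE('SIMPLE_INTEGER after overflow: ' || TO_CHAR(l_simple));

  -- PLS_INTEGER raises an error instead.
  BEGIN
    l_pls := l_pls + l_one;
  EXCEPTION
    WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('PLS_INTEGER overflow: ' || SQLERRM);
  END;
END;
/
```

To try native compilation, `ALTER SESSION SET plsql_code_type = 'NATIVE'` before compiling a stored unit is one option; anonymous blocks like this one are not the place to measure its effect.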
Oracle enables you to create stored procedures in the Java language. Java stored procedures can outperform PL/SQL for number crunching, though the advantages of Java for computation have been steadily decreasing with each release of Oracle. When Java was first introduced, performance gains of anywhere between 10 and 100 times were achievable when rewriting computationally intensive PL/SQL routines in Java. However, improvements in PL/SQL language efficiency, including some of the optimizations outlined in this chapter, have closed the gap. As with all computational optimizations, using Java to optimize number crunching operations is not generally advisable for PL/SQL programs that are database-intensive. Efforts to optimize math operations for a PL/SQL program that does mainly database operations are probably misdirected. Furthermore, a lot of the advantages that Java had over PL/SQL in previous releases can be overcome in Oracle 11g by using efficient data types (SIMPLE_INTEGER, for example) and native compilation.
This example illustrates a drawback of function caching. The query is not strictly deterministic, and the cache should be flushed on a daily basis. Oracle 11g introduced the result set cache, which allows entire result sets to be cached in memory. SQL statements that perform expensive operations on relatively static tables can benefit tremendously from the result set cache. The function cache is a related facility that can benefit PL/SQL routines or SQL statements that call PL/SQL functions. Oracle 11g can store the results of a PL/SQL function in memory and, if the function is expensive to resolve, can retrieve the results for the function from memory more efficiently than by reexecuting the function. You might want to use the function cache in the following circumstances:
❏ You have a computationally expensive function that is deterministic: it will always return the same results given the same inputs.
❏ You have a database access routine encapsulated in a function that accesses tables that are infrequently updated.
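A minimal sketch of a result-cached function for the second case follows. The SALES_DEMO table and the TOTAL_SALES function are hypothetical and are not the example from the slide.

```sql
-- Hypothetical, infrequently updated table used only for this illustration.
CREATE TABLE sales_demo (
  sale_date DATE,
  amount    NUMBER
);

CREATE OR REPLACE FUNCTION total_sales (p_from IN DATE,
                                        p_to   IN DATE)
  RETURN NUMBER
  RESULT_CACHE
IS
  l_total NUMBER;
BEGIN
  -- The first call for a given (p_from, p_to) pair runs the query;
  -- later calls with the same arguments can be answered from the cache.
  SELECT NVL(SUM(amount), 0)
    INTO l_total
    FROM sales_demo
   WHERE sale_date >= p_from
     AND sale_date <  p_to;
  RETURN l_total;
END total_sales;
/
```

In 11g Release 1 a RELIES_ON (sales_demo) clause is needed for cached results to be invalidated when the table changes; from 11g Release 2 the dependency is tracked automatically.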
Einstein’s famous equation is encapsulated in the MASS_TO_ENERGY subroutine in lines 1–9; this subroutine is called multiple times from within the EMC2B procedure at line 17. This encapsulation represents good programming practice, especially if Einstein’s equation needs to be called from other routines. (Perhaps this package will be utilized in Larry Ellison’s upcoming intergalactic yacht.) However, the subroutine calls add overhead, and from a performance point of view, it would probably be better to include the equation directly within the calling routine, like this:
With 11g in-lining, you can write your code using modularity and encapsulation without paying a performance penalty because Oracle can automatically move relevant subroutines in-line to improve performance. The optimizer will perform some in-lining automatically if PLSQL_OPTIMIZE_LEVEL=3. If you want to perform in-lining when PLSQL_OPTIMIZE_LEVEL=2 (the default in Oracle 11g), or you want to increase the likelihood of in-lining when PLSQL_OPTIMIZE_LEVEL=3, you can use PRAGMA INLINE.
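A sketch of requesting in-lining explicitly with PRAGMA INLINE is shown below. MASS_TO_ENERGY is named after the subroutine in the notes, but this body, the constant, and the loop are reconstructed for illustration.

```sql
DECLARE
  c_light   CONSTANT NUMBER := 299792458;  -- speed of light in m/s
  l_energy  NUMBER := 0;

  FUNCTION mass_to_energy (p_mass IN NUMBER) RETURN NUMBER IS
  BEGIN
    RETURN p_mass * c_light * c_light;
  END mass_to_energy;
BEGIN
  FOR i IN 1 .. 1000 LOOP
    -- Ask the compiler to in-line the call in the statement that follows.
    PRAGMA INLINE (mass_to_energy, 'YES');
    l_energy := l_energy + mass_to_energy(i);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Total energy: ' || l_energy);
END;
/
```

The pragma applies only to the statement immediately after it, and passing 'NO' instead of 'YES' prevents in-lining for that statement.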
Some of these and other topics are covered in Chapter 12 of my book “Oracle Performance Survival Guide”.
It’s usually the SQL
• Most PL/SQL routines spend most of their time executing SELECT statements and DML
• SQL tuning is a big topic but:
– Measure SQL overhead of PL/SQL routines first
– Ensure best possible optimizer statistics
– Consider adequacy of indexing
– Learn how to use DBMS_XPLAN, SQL Trace, etc
– Exploit 10g/11g tuning facilities (if licensed)
– Don’t issue SQL when you don’t need to
SQL or PL/SQL?
cachedPlsql.sql at guyharrison.net/opsg
Reduce unnecessary looping
• Unnecessary loop iterations burn CPU
Chart, elapsed time (s): Poorly formed loop (value not shown); Well formed loop 3.96
Remove loop invariant terms
• Any term in a loop that does not vary should be extracted from the loop
• PLSQL_OPTIMIZE_LEVEL>1 does this automatically
Loop invariant performance improvements
Chart, elapsed time (s): Original loop 11.09; Optimized loop 5.87
Recursion (see: recursion)
• Recursive routines often offer elegant solutions*.
• However, deep recursion is memory-intensive and usually not scalable
* Known in Australia as “smart-ass solutions”
Number crunching (1)
Chart, elapsed time (s): bar labels and values not recoverable
Number crunching (2)
Chart, elapsed time (s):
• PL/SQL interpreted, NUMBER data type (value not shown)
• PL/SQL interpreted, PLS_INTEGER data type 14.48
• PL/SQL compiled, SIMPLE_INTEGER data type 0.74
• Java stored procedure 0.11
11g Function cache
• Suits deterministic but expensive functions OR
• Expensive table lookups on non-volatile tables
Function cache performance
• 100 executions, random date ranges 1-30 days
Chart, elapsed time (s): No function cache 5.21; Function cache 1.51