Adapting and adopting spm v04

Adapting and adopting SQL Plan Management (SPM) to achieve execution plan stability for sub-second queries on a high-rate OLTP mission-critical application

Published in: Technology

  1. 1. Adapting and adopting SQL Plan Management to achieve execution plan stability for sub-second queries on a high-rate OLTP mission-critical application Carlos Sierra
  2. 2. Wide spikes are usually the signature of a “plan flip”
  3. 3. A “plan flip” on one SQL statement caused wide spikes
  4. 4. One SQL statement with 3 distinct Execution Plans
  5. 5. Plans flip frequently for this SQL statement
  6. 6. Multiple plans co-exist Another SQL statement with multiple overlapping Execution Plans
  7. 7. Motivation • Plan stability is more valuable than plan flexibility when • Strict SLAs in the order of milliseconds • Simple queries execute dozens of times per second • Out-of-the-box “Automatic SPM Evolve Task” is great … but • It may accept sub-optimal execution plans on a non-typical application • e.g.: binds captured could be outdated in a matter of hours • Historical plan performance could be used to determine future SQL performance with some degree of confidence • We just need to implement an autonomous custom algorithm …
  8. 8. About the application and environment • Oracle 12c multi-tenant on Oracle Servers X5-2 with NVMe SSD • OLTP application with copies in 30+ databases and 700+ PDBs • Row-by-row web-based custom application • Transaction isolation implemented through application-enforced serialization • A few critical queries encapsulated as a “critical serial-path transaction” • Typical transaction executes in ~10ms, including up to 10 queries • A “plan flip” constantly risks breaching stringent millisecond SLAs
  9. 9. A typical query
      SELECT …
        FROM SYSTEMS
       WHERE (id, TxnID, 1) IN (
               SELECT id, TxnID,
                      ROW_NUMBER() OVER (PARTITION BY id ORDER BY TxnID DESC) rn
                 FROM SYSTEMS
                WHERE TxnID <= :1
             )
         AND Live = 'Y'
         AND ((compartmentId = :2))
       ORDER BY compartmentId ASC, id ASC
       FETCH FIRST :3 ROWS ONLY
  10. 10. A typical execution plan
  11. 11. A typical performance scorecard
  12. 12. Adaptive SQL Plan Management on 12c • Refer to this link for details https://oracle-base.com/articles/12c/adaptive-sql-plan-management-12cr1 • Evolution of SPBs is “on” by default • View dba_advisor_parameters • Filter task_name = 'SYS_AUTO_SPM_EVOLVE_TASK' • Columns parameter_name and parameter_value • Look for ACCEPT_PLANS parameter • Last evolution: DBMS_SPM.report_auto_evolve_task • Creation of SPBs is “off” by default
  13. 13. Automatic SPM Evolve Task • A plan is evaluated using the bind variable values captured at the time the test plan was created • When the plans are executed with outdated values, the “evolve task” can determine that the baseline plan performs poorly • This application has a fast-moving time window, so captured values go stale quickly
  14. 14. Custom SPM Implementation Objectives • Reduce the number of incidents where the execution of a new plan causes a performance regression of an SLA-related SQL statement • Create a SQL Plan Baseline (SPB), a.k.a. “pin a plan”, when such a plan has a proven record of consistently good performance (i.e. learn from history or lack of history) • Ignore SQL statements that are too young • If a SQL statement changes, then re-learn from history and “pin a plan” once it becomes mature again • Flag a plan as permanent once its SPB has also matured • Clean up unwanted plans
  15. 15. FPZ Algorithm • Pre-select SQL_ID/PHV candidates, mainly from shared pool • If there exists a valid SQL Plan Baseline (SPB) for candidate • Demote SPB if underperforms (disable it) • Promote SPB after proven performance (fix it) • Else (no SPB exists for candidate) • Further screen candidate • Create SPB if candidate is accepted • Log decision
  16. 16. FPZ Algorithm
  17. 17. Pre-select SQL_ID/PHV candidates • For the PHV, at least one child cursor is valid, shareable and not obsolete • Parent cursor’s first load time is more than 6 days ago • SQL (parent cursor) is mature • Cursor has been active within the last 24 hours • Parsing user and schema are neither SYS nor Oracle-managed • PDB is not CDB$ROOT or PDB$SEED • PHV and Executions are > 0 • Some others
  18. 18. Consider Plan candidates from AWR, only if 1. Plan is not in the Shared Pool • Plan was generated in the past (AWR) but is not currently in memory 2. There are other Plans for the SQL in the Shared Pool (with no SPB) • SQL is active and has no SPB 3. Focus is on one SQL and not an entire PDB or CDB • Algorithm skips AWR plans which are candidates from the Shared Pool (because AWR does not store the SPB name on SQLSTAT) Note: not having the SPB name on AWR SQLSTAT would cause the algorithm to re-create the SPB on every execution
  19. 19. What is a valid SQL Plan Baseline (SPB)? • Accepted • Enabled • Reproduced • Not necessarily Fixed
  20. 20. Disable SPB if it underperforms • Cursor’s average elapsed time per execution > 10x category’s threshold • Cursor’s average elapsed time per execution > 100x SPB average elapsed time per execution • Evaluate after N executions (as per candidate threshold)
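The demotion rule on this slide can be sketched as a small predicate. This is a minimal illustration, not the author's PL/SQL: the function and parameter names are hypothetical, and the slide does not say whether both elapsed-time conditions must hold, so the sketch assumes either one is enough.

```python
def should_disable_spb(cursor_avg_et_ms, category_threshold_ms,
                       spb_avg_et_ms, executions, min_executions):
    """Return True when a cursor running under an SPB qualifies for DISABLE.

    All names are hypothetical; thresholds follow the slide (10x and 100x).
    """
    # Evaluate only after N executions (N comes from the candidate threshold).
    if executions < min_executions:
        return False
    # Cursor average elapsed time is more than 10x the category threshold.
    over_category = cursor_avg_et_ms > 10 * category_threshold_ms
    # Cursor average elapsed time is more than 100x the SPB's recorded average.
    over_snapshot = cursor_avg_et_ms > 100 * spb_avg_et_ms
    # Assumption: either condition alone triggers the demotion.
    return over_category or over_snapshot
```

For example, with a 2.5 ms category threshold, a cursor averaging 30 ms would be demoted once it has enough executions, while one averaging 20 ms would be demoted only if the SPB's own recorded average was below 0.2 ms.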
  21. 21. SPB demotion to “DISABLE” Cursor Cache Plans with SPB SPBs that qualify for a “DISABLE” demotion Enabled Accepted Reproduced Not Fixed Avg ET > 10x Max Category Threshold Avg ET > 100x SPB snapshot
  22. 22. SPB evaluation and conditional promotion • If not “fixed” and “created” more than 14 days ago • Set “FIX” flag to YES • Plan is mature, in use and with acceptable performance • Note: once “fixed”, no new plans are added to the Plan History
  23. 23. SPB promotion to “FIX” Cursor Cache Plans with SPB SPBs that qualify for a “FIX” promotion Enabled Accepted Reproduced Not Fixed Age > 14d
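The promotion criteria above (Enabled, Accepted, Reproduced, Not Fixed, Age > 14d) could be checked along these lines. The `Spb` record and `should_fix_spb` are hypothetical names for illustration; the sketch does not verify that the plan is "in use and with acceptable performance", which the slides treat as a separate evaluation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Spb:
    """Hypothetical in-memory view of a SQL Plan Baseline's flags."""
    enabled: bool
    accepted: bool
    reproduced: bool
    fixed: bool
    created: datetime


def should_fix_spb(spb, now, min_age_days=14):
    """Return True when a valid, not yet fixed SPB is older than min_age_days."""
    valid = spb.enabled and spb.accepted and spb.reproduced
    old_enough = (now - spb.created) > timedelta(days=min_age_days)
    return valid and not spb.fixed and old_enough
```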
  24. 24. Further screen SQL_ID/PHV candidates • Plan has > “X” executions • > 10,000 for some categories • > 1,000 for other categories • Hint: Start SPM Automation with high-rate SQL only • Plan’s average execution time is < “X”ms • < 0.5ms for some categories • < 10ms for other categories • Proven acceptable “on average” performance based on cumulative metrics • Or lack of historical metrics which usually denote a light-weight SQL
  25. 25. SPB creation Cursor Cache and AWR SQLSTAT Plan candidates for SPB Plans that qualify for a SPB Executions > 2,500 Elapsed Time per Execution < 10s Age > 4d Executions > 25,000 Elapsed Time per Execution < 1.25ms Age > 6d
  26. 26. Categorizing SQL statements • Use Module and Action, and/or parse SQL text • Critical transaction (e.g.) • Commit path • Begin transaction • Garbage collection • Non-critical transaction (e.g.) • Scan read • Something else (i.e.) • Categorize as non-application and possibly reject candidate
  27. 27. Further screen SQL_ID/PHV candidates (cont.) • Plan has no AWR performance history (low database load); or • Plan has AWR recent performance history (60 days) such as • Execution time’s 90th Percentile < 2x cursor’s category and < 20x cursor’s avg • e.g. < 2.5ms and < 20x avg • Execution time’s 95th Percentile < 3x cursor’s category and < 30x cursor’s avg • e.g. < 3.75ms and < 30x avg • Execution time’s 97th Percentile < 4x cursor’s category and < 40x cursor’s avg • e.g. < 5ms and < 40x avg • Execution time’s 99th Percentile < 5x cursor’s category and < 50x cursor’s avg • e.g. < 6.25ms and < 50x avg
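The percentile screen on this slide amounts to four paired comparisons. A minimal sketch, assuming the percentiles have already been computed from AWR history and are passed in as a dictionary (the names `PERCENTILE_FACTORS` and `passes_percentile_screen` are hypothetical):

```python
# Percentile -> (factor over category threshold, factor over cursor average),
# taken from the slide: 90th < 2x and 20x, 95th < 3x and 30x, and so on.
PERCENTILE_FACTORS = {90: (2, 20), 95: (3, 30), 97: (4, 40), 99: (5, 50)}


def passes_percentile_screen(percentiles_ms, category_threshold_ms, avg_et_ms):
    """Return True when every percentile is under both of its limits.

    percentiles_ms maps 90/95/97/99 to elapsed time per execution in ms,
    or is None when the plan has no AWR performance history.
    """
    # A plan with no AWR history passes (the slide attributes this to low load).
    if percentiles_ms is None:
        return True
    for pctl, (category_factor, avg_factor) in PERCENTILE_FACTORS.items():
        value = percentiles_ms[pctl]
        if value >= category_factor * category_threshold_ms:
            return False
        if value >= avg_factor * avg_et_ms:
            return False
    return True
```

With a 1.25 ms category threshold, the limits work out to the 2.5, 3.75, 5 and 6.25 ms figures quoted on the slide.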
  28. 28. Algorithm’s aggressiveness style - sample
      Level  Meaning       > Executions  < Elapsed Time (ms)
      1      Conservative  25,000        0.25
      2                    20,000        0.50
      3      Moderate      15,000        0.75
      4                    10,000        1.00
      5      Aggressive     5,000        1.25
      Lvl  > Execs  < ET (ms) Avg  < ET (ms) 90th Pctl  < ET (ms) 95th Pctl  < ET (ms) 97th Pctl  < ET (ms) 99th Pctl
      1    25,000   0.25           0.50                 0.75                 1.00                 1.25
      2    20,000   0.50           1.00                 1.50                 2.00                 2.50
      3    15,000   0.75           1.50                 2.25                 3.00                 3.75
      4    10,000   1.00           2.00                 3.00                 4.00                 5.00
      5     5,000   1.25           2.50                 3.75                 5.00                 6.25
      Begin Tx Category Implementation
  29. 29. Sample category thresholds implementation
      Category  > Executions  > Executions  > Executions  < ET (ms)     < ET (ms)  < ET (ms)
                Conservative  Moderate      Aggressive    Conservative  Moderate   Aggressive
      CommitTx  25,000        15,000        5,000         0.5           1.5        2.5
      BeginTx   25,000        15,000        5,000         0.25          0.75       1.25
      Read       5,000         3,000        1,000         10            30         50
      GC         5,000         3,000        1,000         1,000         3,000      5,000
      Other      5,000         3,000        1,000         200           600        1,000
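The category thresholds above lend themselves to a lookup table. The following sketch encodes the sample values from the slide and applies the two basic screening tests (executions above the minimum, average elapsed time below the maximum). The dictionary layout and the function name `qualifies_for_spb` are assumptions for illustration only.

```python
# category -> level -> (minimum executions, maximum avg elapsed time in ms),
# copied from the sample thresholds on the slide.
THRESHOLDS = {
    "CommitTx": {"conservative": (25_000, 0.5), "moderate": (15_000, 1.5), "aggressive": (5_000, 2.5)},
    "BeginTx": {"conservative": (25_000, 0.25), "moderate": (15_000, 0.75), "aggressive": (5_000, 1.25)},
    "Read": {"conservative": (5_000, 10), "moderate": (3_000, 30), "aggressive": (1_000, 50)},
    "GC": {"conservative": (5_000, 1_000), "moderate": (3_000, 3_000), "aggressive": (1_000, 5_000)},
    "Other": {"conservative": (5_000, 200), "moderate": (3_000, 600), "aggressive": (1_000, 1_000)},
}


def qualifies_for_spb(category, level, executions, avg_et_ms):
    """Return True when a plan exceeds the execution count and beats the ET limit."""
    min_executions, max_et_ms = THRESHOLDS[category][level]
    return executions > min_executions and avg_et_ms < max_et_ms
```

A more aggressive level accepts plans with fewer executions and higher average elapsed time, so it creates more SPBs at the cost of weaker evidence per plan.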
  30. 30. Create SQL Plan Baseline (SPB) • Enabled • Accepted • But not “Fixed” • Sourced mostly from Cursor Cache, and some from AWR
  31. 31. Log decision • Update SPB “description” • Source SQL_ID • Source plan hash value (PHV) • Date when promoted to “Fixed” or demoted from “Fixed” • Write into log • Created SPB with selection metrics such as execution percentiles • Promoted and demoted SPBs, with criteria used • Rejected candidates and reason • Preserve logs for at least 1 month
  32. 32. FPZ Algorithm (recap) • Pre-select SQL_ID/PHV candidates, mainly from shared pool • If there exists a valid SQL Plan Baseline (SPB) for candidate • Demote SPB if underperforms (disable it) • Promote SPB after proven performance (fix it) • Else (no SPB exists for candidate) • Further screen candidate • Create SPB if candidate is accepted • Log decision
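The control flow in this recap can be sketched as one decision function plus a loop that logs every outcome. This is a hedged outline only: the candidate fields (`has_valid_spb`, `underperforms`, `proven`, `passes_screen`) and the action labels are hypothetical stand-ins for the checks described on the earlier slides, and the real implementation is a PL/SQL package.

```python
def fpz_decide(has_valid_spb, underperforms=False, proven=False, passes_screen=False):
    """Return the action for one SQL_ID/PHV candidate (labels are hypothetical)."""
    if has_valid_spb:
        if underperforms:
            return "DISABLE"   # demote the SPB
        if proven:
            return "FIX"       # promote the SPB after proven performance
        return "KEEP"          # valid SPB, nothing to change yet
    # No SPB exists for the candidate: screen it further.
    if passes_screen:
        return "CREATE"
    return "REJECT"


def run_fpz(candidates):
    """Decide on each pre-selected candidate and log every decision."""
    log = []
    for c in candidates:
        action = fpz_decide(c["has_valid_spb"], c.get("underperforms", False),
                            c.get("proven", False), c.get("passes_screen", False))
        log.append((c["sql_id"], c["phv"], action))
    return log
```

Checking for underperformance before promotion is a design choice of this sketch, so that a regressing SPB is never fixed.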
  33. 33. AWR Configuration • EXEC DBMS_SPM.CONFIGURE('plan_retention_weeks', 13); • EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(topnsql=>300); • ALTER SYSTEM SET "_awr_sql_child_limit" = 2000;
  34. 34. Additional considerations • Set Autopurge to NO for Plans on the blacklist • Manually (out of scope for automation) • What if there is no “proven consistent performance”? • What if average performance is higher than the target threshold? • What if predicate selectivity requires more than one execution plan? • What if the SQL produces different plans across databases?
  35. 35. FPZ Algorithm Automation • PL/SQL package • Can be executed from SQL*Plus or OEM calling a PL/SQL library • Executed connecting as CDB$ROOT • Set of configuration constants • How many SPBs to create and how many to promote? (or report only) • Report rejected candidates and non-promoted SPBs? • Evaluate particular application categories • Number of executions to consider a candidate, or to qualify for an SPB • Time per execution to qualify a candidate for SPM • Factors over average elapsed time for 90th, 95th, 97th and 99th percentiles • Days of AWR history to consider
  36. 36. Dry run results and sample output
      +------------------------------------------------------------
      |
      | Candidates                   : 2019
      | SPBs Qualified for Creation  : 977
      | SPBs Qualified for Promotion : 4
      | SPBs Created                 : 0
      | SPBs Promoted                : 0
      | Date and Time (end)          : 2017-10-22T14:33:42
      | Duration (secs)              : 102
      |
      +------------------------------------------------------------
  37. 37. Implementation results
  38. 38. Implementation results
  39. 39. Implementation results
  40. 40. Outlier sample
  41. 41. Outliers • SQL not considered by PL/SQL library • Candidates rejected for valid reasons (performance, executions, age, etc.) • Bug in the algorithm or PL/SQL library? • Algorithm too restrictive? • Short-lived small spikes • Execution bursts combined with frequent hard parses due to CBO statistics gathering • SQL has multiple optimal plans as per Adaptive Cursor Sharing (ACS) • Algorithm implements a subset
  42. 42. Closing remarks • Past performance may not be indicative of future results • Nevertheless: historical plan performance can be used to determine future SQL performance with some degree of confidence • Not every SQL statement gets an SPB • Some queries are still at risk of spikes • Lower rate of executions, performance above thresholds, new SQL, etc. • And not every plan becomes an SPB (think ACS) • The method presented reduces the frequency of “plan flips” • Consistent latency is more important than best performance
