
Drill Down the Most Underestimated Oracle Feature - Database Resource Manager


Database Resource Manager (DBRM) is a crucial feature for managing database load, yet real-world practice shows it is rarely used. This talk aims to change that and demystify the feature: it explains in detail how DBRM works in different scenarios, the CPU math behind it, and how to measure it in real time using Python and SQL. It also explores more complex features to understand their behaviour.
Special attention is given to its internals whenever possible.



  1. DATABASE RESOURCE MANAGER: Drill-down the most underestimated Oracle feature
  2. Me
     • Name: Luís Marques
     • http://lcmarques.com / @drune / lcarapinha@gmail.com
     Luís Marques - @drune - http://lcmarques.com
  3. Agenda: What are we going to talk about?
  4. About Database Resource Manager, with a lot of questions, charts, arrows, screenshots and a Python script
  5. Hand Raising: Is there a simple picture that summarizes Resource Manager CPU scheduling?
  6. Before Database Resource Manager
     [Diagram: CPU #1 and CPU #2 fed by a single OS run queue that holds Oracle user processes (P#n), Oracle background processes (PMON, SMON, LGWR, DBWR) and other OS processes]
     • Quantum defined by the OS
     • Priority can be changed by the OS
     • All Oracle user sessions have the same priority to be selected for CPU
  7. After Database Resource Manager
     [Diagram: user sessions (S#n) wait for selection in the DBRM internal queue, which is priority-aware according to the DBRM plan; selected sessions join background processes (PMON, LGWR) and other OS processes on the OS run queue in front of CPU #1 and CPU #2]
     • The OS scheduler decides between the processes in the run queue
  8. More about the DBRM scheduler
     • The DBRM scheduler is not database-workload agnostic
     • Priority-based round-robin algorithm
     • Fixed time-slice quantum of 100 ms given to each process (_dbrm_quantum)
     • More intelligent scheduling:
        • Aware of Oracle internal structures (e.g. mutexes, latches)
        • Has code to avoid problems like priority inversion
        • No CPU starvation of critical background processes
     • 2 background processes: VKRM and DBRM
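To make "priority-based round robin with a fixed quantum" concrete, here is a toy Python model. It is not Oracle's VKRM/DBRM code. Priorities are plain integers (lower means more important), each pick hands out one 100 ms slice (mirroring the _dbrm_quantum value quoted above), and an unfinished session rejoins the tail of its own queue. The function name is hypothetical.

```python
from collections import deque

QUANTUM_MS = 100  # illustrative; mirrors the 100 ms _dbrm_quantum quoted on the slide


def round_robin_by_priority(queues_by_priority, max_ms):
    """Toy priority round robin.

    queues_by_priority maps a priority (lower number = higher priority) to a
    list of (session_name, cpu_ms_needed) tuples. Returns the order in which
    sessions received a quantum until all work is done or max_ms elapses.
    """
    queues = {prio: deque(items) for prio, items in queues_by_priority.items()}
    timeline = []
    clock_ms = 0
    while clock_ms < max_ms:
        # Pick the highest-priority queue that still has waiting sessions.
        ready = [prio for prio in sorted(queues) if queues[prio]]
        if not ready:
            break
        queue = queues[ready[0]]
        session, remaining_ms = queue.popleft()
        slice_ms = min(QUANTUM_MS, remaining_ms)
        timeline.append(session)
        clock_ms += slice_ms
        # Unfinished work yields the CPU and rejoins the tail of its queue.
        if remaining_ms > slice_ms:
            queue.append((session, remaining_ms - slice_ms))
    return timeline


print(round_robin_by_priority({0: [("A", 200), ("B", 100)], 1: [("C", 100)]}, max_ms=1000))
```

In this toy, the lower-priority session C runs only after A and B finish. The tests later in the talk show that real DBRM instead divides CPU according to the plan directives.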
  9. Hand Raising: Interesting! How do you prove that there are internal queues, and how are the processes in them chosen to run on CPU?
  10. DBRM Scheduling (VKRM)
     • If a process must yield, the VKRM background process determines which process goes next on the OS run queue
     • perf Linux profiler output: kgskrunnext, the function responsible for the next process on the OS run queue?
  11. DBRM Scheduling (VKRM)
     • Suspending VKRM leaves all your sessions waiting for CPU indefinitely:
        SQL> ORADEBUG SETOSPID 16568
        Oracle pid: 10, Unix process pid: 16568, image: oracle@baco (VKRM)
        SQL> ORADEBUG SUSPEND
     • [Chart: between ORADEBUG SUSPEND and ORADEBUG RESUME, sessions spend 100% of their time waiting on resmgr: cpu quantum]
  12. DBRM Scheduling (CPU run queue)
     • vmstat data with DBRM disabled
     • As soon as the number of sessions increases, the OS run queue increases (41 sessions at the end, on 2 CPUs)
  13. DBRM Scheduling (CPU run queue)
     • Oracle maintains an internal queue for DBRM
     • vmstat data with DBRM active, increasing the number of sessions gradually
     • The OS run queue doesn't increase, even with 41 sessions on 2 CPUs
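The run-queue comparison on these two slides comes from vmstat's `r` column. The following is a minimal Python helper that pulls that column out of captured vmstat text. It assumes the usual Linux procps layout, with a header line whose first field is `r`. The function name is hypothetical.

```python
def run_queue_lengths(vmstat_text):
    """Return the 'r' (runnable processes) values from Linux vmstat output.

    Assumes procps-style output: a 'procs ---' banner, a column header line
    starting with 'r', then numeric sample lines. Repeated headers are fine.
    """
    header = None
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            header = fields  # column names, e.g. ['r', 'b', 'swpd', ...]
        elif header is not None and fields[0].isdigit():
            values.append(int(fields[header.index("r")]))
    return values
```

Feed it the saved output of, for example, `vmstat 1 60` taken while sessions ramp up, then compare `max(...)` of the result with and without a plan.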
  14. Hand Raising: Nice theory, but… I have a database with several schemas with different priorities. How do I handle resource management?
  15. presman: DBRM monitor script
     • DBRM monitoring tool written in Python 2.x and cx_Oracle
     • Runs on Windows, Linux and OS X
     • Usage: ./presman.py -m measure -o filename -c column_id -p
     • Available measures: CPU, SESSION_IO, PARALLEL, EMPHASIS
     • Download: http://lcmarques.com/presman-dbrm-monitor/
     • Available on GitHub: https://github.com/lcmarques/presman
  16. Use case: schema consolidation, Plan #1
     Consumer Group | L1  | L2  | L3   | UTILIZATION_LIMIT | SWITCH CRITERIA | SWITCH CONSUMER GROUP
     RISK           | 65% |     |      |                   | 12o Logical I/O | LOG_ONLY
     RSK_REPORT     |     | 50% |      |                   |                 |
     ADHOC          |     | 40% |      | 60%               | 120 seconds     | CANCEL_SQL
     OTHER_GROUPS   |     |     | 100% |                   |                 |
  17. Hand Raising: Hmm... but the sum of all allocations across all levels is way over 100%? How do I know the minimum CPU allocated per consumer group?
  18. Emphasis: the minimum CPU formula
     • This is the minimum CPU for all DBRM-managed sessions, not a minimum allocation of host CPU
     • Minimum share of CPU for a consumer group at level n, where mgmt_p_n is the value specified in its plan directive, percentages are taken as fractions, and each sum runs over all mgmt_p_k values at level k:
       $$\mathrm{MinCPU}_n = \mathrm{mgmt\_p}_n \times \prod_{k=1}^{n-1}\left(1 - \sum \mathrm{mgmt\_p}_k\right)$$
  19. Emphasis: the minimum CPU formula
     Consumer Group | Level   | Minimum CPU | Maximum CPU
     RISK           | mgmt_p1 | 65%         | 100%
     RSK_REPORT     | mgmt_p2 | 17.5%       | 100%
     ADHOC          | mgmt_p2 | 14%         | 60%
     OTHER_GROUPS   | mgmt_p3 | 3.5%        | 100%
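The emphasis formula is easy to check in a few lines of Python. This sketch assumes each level is given as a dict of mgmt_pN percentages, with level 1 first. The function name is hypothetical. It walks the levels and gives each group its percentage of whatever earlier levels left unallocated, which is the product term in the formula.

```python
def minimum_cpu_emphasis(levels):
    """Guaranteed minimum % of DBRM-managed CPU per consumer group.

    levels: list of {consumer_group: mgmt_pN percent}, level 1 first.
    """
    minimum = {}
    leftover = 100.0  # % of CPU not yet claimed by earlier levels
    for n, level in enumerate(levels, start=1):
        level_total = sum(level.values())
        if level_total > 100:
            raise ValueError(f"level {n} allocates {level_total}% (> 100%)")
        for group, pct in level.items():
            minimum[group] = leftover * pct / 100.0
        # Only the unallocated part of this level flows down to level n + 1.
        leftover *= (100.0 - level_total) / 100.0
    return minimum


plan_1 = [
    {"RISK": 65},
    {"RSK_REPORT": 50, "ADHOC": 40},
    {"OTHER_GROUPS": 100},
]
print(minimum_cpu_emphasis(plan_1))
```

For Plan #1 this reproduces the 65% / 17.5% / 14% / 3.5% minimums on the slide. Level 2 splits the 35% left by RISK, and level 3 gets the remaining 10% of that 35%.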
  20. Hand Raising: Great stuff! Let’s test the Resource Manager plan, OK?
  21. Test #1: UTILIZATION_LIMIT
     • ADHOC consumer group with UTILIZATION_LIMIT = 60%
     • CPU burner: burn_cpu_adhoc.sql
     • UTILIZATION_LIMIT is not a host CPU limit!
     • UTILIZATION_LIMIT applies to the Oracle user sessions managed by DBRM
     • [Screenshot: host CPU us ~66%, sys ~7%]
  22. Hand Raising: Hey, hey, so how do I measure it easily?
  23. Test #1: UTILIZATION_LIMIT
     • Combine v$rsrcmgrmetric and v$osstat and do some math:
       Oracle CPU % = (cpu_consumed_time_sec / (60 * CPU_count)) * 100
     • $ presman.py -m cpu -o oracle_cpu.csv -c 7 -p
     • [Chart: Oracle CPU in % by consumer group]
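The formula above, as a Python sketch. It assumes the per-group figure comes from v$rsrcmgrmetric and that each metric row covers a 60-second interval. Oracle's reference documents the cpu_consumed_time column in milliseconds, hence the conversion helper, but check it on your version. cpu_count is assumed to be the NUM_CPUS value from v$osstat. Both function names are hypothetical, not presman's.

```python
def oracle_cpu_pct(cpu_consumed_time_sec, cpu_count, interval_sec=60):
    """% of total CPU capacity used by one consumer group in one interval."""
    capacity_sec = interval_sec * cpu_count  # CPU-seconds available in the interval
    return cpu_consumed_time_sec * 100.0 / capacity_sec


def oracle_cpu_pct_from_ms(cpu_consumed_time_ms, cpu_count, interval_sec=60):
    """Same calculation for a value reported in milliseconds."""
    return oracle_cpu_pct(cpu_consumed_time_ms / 1000.0, cpu_count, interval_sec)


# 144 CPU-seconds consumed in a one-minute interval on 4 CPUs
print(oracle_cpu_pct(144, 4))
```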
  24. Hand Raising: That is easy! How do I test my plan’s CPU allocation?
  25. Test #2: Oracle CPU consumption
     • Step 0: Start presman to measure CPU by consumer group
        $ presman.py -m cpu -o oracle_cpu.csv -c 5
     • Step 1: Fire up 3 sessions in the ADHOC consumer group
     • ADHOC uses almost 100% of the CPU consumed by all consumer groups
  26. Test #2: Oracle CPU consumption
     • Step 2: Fire up 10 sessions in consumer group RISK
     • RISK has many more sessions and a higher priority
     • No UTILIZATION_LIMIT directive on the RISK consumer group
     • ADHOC’s CPU drops to almost 20% of all consumer-group CPU activity
  27. Test #2: Oracle CPU consumption
     • Step 3: Fire up 5 sessions in consumer group RSK_REPORT
     • ADHOC queries are cancelled by the CANCEL_SQL switch directive
     • RISK and RSK_REPORT are consuming almost every CPU cycle
  28. Test #2: Oracle CPU consumption
     • Step 4: Fire up 3 more sessions in consumer group ADHOC
     • Real-world test vs plan-directive CPU allocation:
     Consumer Group | Minimum CPU (plan) | CPU measured in test | Sessions
     RISK           | 65%                | 66.74%               | 10
     RSK_REPORT     | 17.5%              | 18.23%               | 5
     ADHOC          | 14%                | 14.81%               | 3 + 3
     OTHER_GROUPS   | 3.5%               | 0.22%                | No sessions
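The measured percentages are shares of the CPU consumed by all DBRM-managed consumer groups, not of host CPU. The following Python sketch turns per-group CPU figures into such shares and flags active groups that fell below their plan minimum. Both helper names are hypothetical. A group with no sessions, like OTHER_GROUPS in this test, has no demand, so it is left out of the check.

```python
def cpu_share_by_group(cpu_by_group):
    """Convert per-group CPU (any unit) into % of all consumer-group CPU."""
    total = sum(cpu_by_group.values())
    if total == 0:
        return {group: 0.0 for group in cpu_by_group}
    return {group: 100.0 * cpu / total for group, cpu in cpu_by_group.items()}


def below_minimum(observed_pct, minimum_pct, active_groups):
    """Active groups whose observed share is under their planned minimum."""
    return sorted(g for g in active_groups if observed_pct.get(g, 0.0) < minimum_pct[g])


observed = {"RISK": 66.74, "RSK_REPORT": 18.23, "ADHOC": 14.81, "OTHER_GROUPS": 0.22}
minimum = {"RISK": 65, "RSK_REPORT": 17.5, "ADHOC": 14, "OTHER_GROUPS": 3.5}
print(below_minimum(observed, minimum, active_groups={"RISK", "RSK_REPORT", "ADHOC"}))
```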
  29. Test #2: Oracle CPU consumption
     • presman historical CSV data file: output_cpu.csv
  30. Hand Raising: Clarified! With so many sessions on a 4-CPU database, you surely have throttling, right? My hand hurts…
  31. Test #3: Throttling by wait event
     • Throttling by Resource Manager can be monitored through the wait event resmgr:cpu quantum (wait class Scheduler)
     • Without Resource Manager, the time spent in resmgr:cpu quantum would instead be spent waiting on the operating system run queue
     • In 11g, the AWR report indicates high waits on the run queue only through the server load numbers
     • 12c AWR has more information on CPU wait
     • resmgr:cpu quantum doesn’t necessarily mean you have an overloaded CPU (e.g. the UTILIZATION_LIMIT directive also causes it)
  32. Test #3: Throttling by wait event
     • SQL> alter system set resource_manager_plan='';
     • CPU available = 4 CPUs x 10.04 min x 60 = 2409.6 sec
     • Consumed CPU = 2053.9 sec (85%)
     • % of CPU wait = 99.79% - 42.7% = 57.09% of DB time spent on the OS run queue
  33. Test #3: Throttling by wait event
     • SQL> alter system set resource_manager_plan='DBRM_PLAN';
     • CPU available = 4 CPUs x 9.03 min x 60 = 2167.2 sec
     • Consumed CPU = 1820.9 sec (84%)
     • 63% of DB time is spent waiting in the Resource Manager internal queue
     • % of CPU wait = 36.64% - 28.1% = only 8.54% of DB time spent on the OS run queue
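On these two slides, the CPU budget is the number of CPUs times the AWR elapsed time in seconds. The following Python sketch does that arithmetic. It assumes 10.04 and 9.03 are the AWR elapsed minutes. The function name is hypothetical.

```python
def cpu_budget(num_cpus, elapsed_min, consumed_cpu_sec):
    """Return (CPU-seconds available, % of them actually consumed)."""
    available_sec = num_cpus * elapsed_min * 60
    return available_sec, 100.0 * consumed_cpu_sec / available_sec


for label, elapsed, consumed in [("no plan", 10.04, 2053.9), ("DBRM_PLAN", 9.03, 1820.9)]:
    available, pct = cpu_budget(4, elapsed, consumed)
    print(f"{label}: {available:.1f} s available, {pct:.0f}% consumed")
```

Both runs consume a similar share of their CPU budget. What changes is where the sessions wait: on the OS run queue without a plan, and in resmgr:cpu quantum with one.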
  34. Hand Raising: Good! I’ve read that we can handle parallel execution. Handling all the parallel servers seems hard to me!
  35. The DW for reporting: Plan #2
     Consumer Group  | RATIO | PARALLEL_DEGREE_LIMIT | SWITCH_TIME | SWITCH_GROUP    | PARALLEL_SERVER_LIMIT | PARALLEL_QUEUE_TIMEOUT
     OTHER_GROUPS    | 10    | 0                     | 120 sec     | SHORT_REPORTING |                       |
     SHORT_REPORTING | 5     |                       | 900 sec     | LONG_REPORTING  | 50%                   |
     LONG_REPORTING  | 1     |                       |             |                 | 50%                   | 3600 sec
     • RATIO was used in create_plan()
     • Statements in OTHER_GROUPS have to execute serially
     • To limit the parallel servers used by a consumer group, use the parallel_server_limit directive
  36. Hand Raising: Hey hey... WAIT! Now you used plan directives with a thing called RATIO or SHARE! What is that?
  37. Ratio: the minimum CPU formula
     • Minimum share of CPU for consumer group n = the value specified in its plan directive divided by the sum of all ratios:
       $$\mathrm{MinCPU}_n = \frac{\mathrm{mgmt\_p}_n}{\sum_i \mathrm{mgmt\_p}_i}$$
     Consumer Group  | mgmt_p1 ratio | Ratio as emphasis
     OTHER_GROUPS    | 10            | 10 / 16 = 62.5%
     SHORT_REPORTING | 5             | 5 / 16 = 31.25%
     LONG_REPORTING  | 1             | 1 / 16 = 6.25%
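Unlike emphasis, a ratio plan has a single level, so each group's minimum is just its share of the total. As a Python sketch (the function name is hypothetical):

```python
def ratio_to_share(ratios):
    """Convert RATIO/SHARES plan values into a minimum % of CPU per group."""
    total = sum(ratios.values())
    return {group: 100.0 * value / total for group, value in ratios.items()}


plan_2 = {"OTHER_GROUPS": 10, "SHORT_REPORTING": 5, "LONG_REPORTING": 1}
print(ratio_to_share(plan_2))
```

Adding another group with its own ratio raises the total, which lowers every existing group's minimum.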
  38. Hand Raising: Can you go ahead with the plan testing? I’m interested in the parallel details!
  39. Test #1: PARALLEL_DEGREE_LIMIT without Auto DOP
     • parallel_degree_policy = MANUAL
     • OTHER_GROUPS with PARALLEL_DEGREE_LIMIT_P1 = 0 (DOP = 0)
     • $ burn_me.sh (1 session)
     • $ presman.py -m parallel
     • Result: 1 parallel statement, no parallel servers
  40. Test #1: PARALLEL_DEGREE_LIMIT without Auto DOP
     • Generating a PARALLEL plan when execution is serial is more expensive
     • A large difference between the DOP assumed at optimization (hard parse) time and the actual DOP at execution time might lead to suboptimal execution plans
  41. Test #1: PARALLEL_DEGREE_LIMIT with Auto DOP
     • Auto DOP is enabled via parallel_degree_policy = AUTO (or ADAPTIVE in 12c)
     • Only the new Auto DOP code path negotiates with DBRM
     • SQL> alter session set "_px_trace"="high",all;
     • $ burn_me.sh (1 session)
  42. Test #2: PARALLEL_SERVER_LIMIT
     • The PARALLEL_SERVER_LIMIT directive is a percentage of the parameter parallel_servers_target
     • It prevents a low-priority user or consumer group from getting all the parallel servers
     • When a consumer group reaches its percentage of parallel servers, further statements are queued
     • Auto DOP is enabled in order to enable parallel statement queuing
     Consumer Group  | PARALLEL_SERVERS_TARGET | PARALLEL_SERVER_LIMIT
     LONG_REPORTING  | 64                      | 50%
     SHORT_REPORTING | 64                      | 50%
  43. Test #2: PARALLEL_SERVER_LIMIT
     • $ burn_me.sql (19 sessions) in LONG_REPORTING
     • SQL> alter system set parallel_servers_target = 64;
     • $ presman.py -m parallel
     • Result: 16 statements running, 3 statements queued; 32 parallel servers = 50% of parallel_servers_target
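Here is a toy admission model that reproduces the 16-running / 3-queued split. It rests on three assumptions:
• Each statement needs a fixed number of parallel servers. The value 2 is inferred from 32 servers / 16 statements and is not stated on the slide.
• The group's cap is PARALLEL_SERVER_LIMIT percent of parallel_servers_target, rounded down.
• All statements arrive at once.

Oracle's real queuing decision has more inputs than this, and the function name is hypothetical.

```python
def admit_parallel(statements, servers_per_stmt, parallel_servers_target, server_limit_pct):
    """Return (running, queued, servers_in_use) for one consumer group."""
    cap = parallel_servers_target * server_limit_pct // 100  # servers this group may use
    running = min(statements, cap // servers_per_stmt)
    return running, statements - running, running * servers_per_stmt


print(admit_parallel(statements=19, servers_per_stmt=2,
                     parallel_servers_target=64, server_limit_pct=50))
```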
  44. Hand Raising: Clear! What about giving more or less priority to my parallel statements when they are queued?
  45. Test #3: Priority of the parallel statement queue
     [Diagram, flow of a parallel statement:
      • The statement is parsed and its Auto DOP is calculated.
      • Enough parallel servers and PARALLEL_SERVER_LIMIT not reached: the statement executes in parallel.
      • Not enough parallel servers, or the limit is reached: the statement joins a FIFO statement queue per consumer group.
      • Dequeuing priority is based on the RATIO/SHARES or EMPHASIS values of the consumer group; a dequeued statement then executes in parallel.]
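The dequeuing step in the diagram can be modelled as a weighted draw. Whenever servers free up, the consumer group whose queue head goes next is picked with probability proportional to its RATIO/SHARES value. This Python sketch illustrates that idea under the simplifying assumption that no queue ever runs empty. It is not Oracle's documented algorithm. `random.choices` with `weights` does the proportional pick, and the function name is hypothetical.

```python
import random


def simulate_dequeues(shares, picks, seed=0):
    """Count how often each group's queue head is picked, weighting by share."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    groups = list(shares)
    weights = [shares[g] for g in groups]
    counts = {g: 0 for g in groups}
    for _ in range(picks):
        chosen = rng.choices(groups, weights=weights)[0]
        counts[chosen] += 1
    return counts


counts = simulate_dequeues({"SHORT_REPORTING": 5, "LONG_REPORTING": 1}, picks=6000)
print(counts, counts["SHORT_REPORTING"] / counts["LONG_REPORTING"])
```

With ratios 5 and 1, SHORT_REPORTING wins roughly five draws for every one that LONG_REPORTING wins.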
  46. Test #3: Priority of the parallel statement queue
     • 35 sessions each for the SHORT_REPORTING and LONG_REPORTING consumer groups
     • $ burn_me_all_same_time.sh
     • $ presman.py -m parallel -o queue_time.csv -c 4
     • Step 1: 16 statements running and 19 queued for each consumer group
  47. Test #3: Priority of the parallel statement queue
     • Step 2: Dequeuing of parallel statements starts
     • Step 3: Dequeuing continues as soon as some statements finish
     • Step 4: Almost every statement is done; no statements are queued
  48. Test #3: Priority of the parallel statement queue
     • SHORT_REPORTING queue time: 7719385 milliseconds
     • LONG_REPORTING queue time: 11375129 milliseconds
     • SHORT_REPORTING’s queue time is about 67.9% of LONG_REPORTING’s, i.e. roughly 32% less queue time
     • SHORT_REPORTING has a ratio of 5, against 1 for LONG_REPORTING
     • SHORT_REPORTING is five times as likely as LONG_REPORTING to get a statement dequeued
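The two queue-time totals give the comparison directly. A quick Python check (the helper name is hypothetical):

```python
def queue_time_comparison(short_ms, long_ms):
    """Return (short as % of long, % reduction for short vs long)."""
    ratio_pct = 100.0 * short_ms / long_ms
    return ratio_pct, 100.0 - ratio_pct


ratio_pct, reduction_pct = queue_time_comparison(7719385, 11375129)
print(f"{ratio_pct:.1f}% of LONG_REPORTING's queue time, {reduction_pct:.1f}% less")
```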
  49. Hand Raising: What if I have some critical reports that need to bypass the queue?
  50. Critical parallel statement queues
     • Oracle 12c introduced parallel_stmt_critical on plan directives
     • Allows one value: BYPASS_QUEUE
     • Sessions start immediately and do not wait in the queue
     • The parallel_max_servers init parameter is the hard threshold, and critical statements can run with a lower number of PX servers
       BEGIN
         -- assumes REPORTS_PLAN and CRITICAL_REPORT already exist
         dbms_resource_manager.create_pending_area;
         dbms_resource_manager.create_plan_directive(
           plan                   => 'REPORTS_PLAN',
           group_or_subplan       => 'CRITICAL_REPORT',
           comment                => 'CRITICAL Reporting Queries',
           parallel_stmt_critical => 'BYPASS_QUEUE');
         dbms_resource_manager.submit_pending_area;
       END;
       /
  51. Q & A: I bet we don’t have time for it
  52. Want to know more?
     • Dump the state of DBRM with:
        SQL> oradebug setmypid
        SQL> oradebug dump DBSCHEDULER 1
     • Trace wait events with the 12c interface:
        SQL> alter session set events 'wait_event["resmgr:cpu quantum"] trace("%s\n", shortstack())';
        SQL> exec DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => true, binds => false, plan_stat => 'NEVER');
