DB12c: All You Need to Know About the Resource Manager

MARIS ELSINS
Harmony 16

Maris Elsins
Lead Database Consultant
At Pythian since 2011
Located in Riga, Latvia
Oracle [Apps] DBA since 2005
Speaker at conferences since 2007
@MarisElsins elsins@pythian.com
http://bit.ly/getMOSPatch

ABOUT PYTHIAN
Pythian currently manages more than 11,400 systems.
Pythian currently employs more than 400 people in 200 cities in 35 countries.
Pythian was founded in 1997.
Global Leader In IT Transformation And Operational Excellence
Unparalleled Expertise
• Top 5% in databases, applications, infrastructure, Big Data, Cloud, Data Science, and DevOps
Unmatched Certifications
• 9 Oracle ACEs, 4 Oracle ACE Directors, 1 Oracle ACE Associate
• 6 Microsoft MVPs, 1 Microsoft Certified Master
• 5 Google Platform Qualified Developers
• 1 Cloudera Champion of Big Data
• 1 MongoDB Certified DBA Associate Level
• 1 DataStax Certified Partner, 1 MVP
Broad Technical Experience
• Oracle, Microsoft, MySQL, Oracle EBS, Hadoop, Cassandra, MongoDB, virtualization, configuration management, monitoring, trending, and more.

AGENDA
• Features of the Resource Manager
• The new 12c-stuff
• Consolidations using Oracle Multitenant
• Overhead of the RM

Introduction of the Resource Manager

THE PROBLEM
• Problems start when there’s not enough CPU for
everyone
• CPU starvation can be hard to recover from
(the snowball effect)
• Troubleshooting an ongoing problem is difficult to do
• OS doesn’t care enough about DB-specific resources
– Undo
– Locks
– Parallelism

PROBLEM SCENARIOS
• Running reports causes too much load on the OLTP system.
• One session allocates all parallel query slaves, so other sessions don’t get any
• Application support team runs heavy queries to analyze the data, leaving fewer resources for
online transactions
• Wide search criteria cause “hangs” in the search form
• 3 of 8 CPU cores are idle, my query runs without parallel execution,
I could use the idle CPUs to provide results faster
• Users don’t log out and leave idle sessions
• My batch process requires DOP=8 to complete in time, but it’s downgraded to smaller DOP
if enough parallel slaves are not available
• My query is very important. Its I/O requests have to be prioritized!
• Sessions with incomplete transactions have locked some rows and other sessions are
stuck.

THE SOLUTION
• Resource Manager
– Included in Oracle EE license
– Prioritization of sessions based on defined rules
– Guaranteed amount of resources for each type of session (consumer group)
– (optional) upper bound of resources for each type of session
• Prioritization is achieved by changing the process states to running/sleeping
– DBRM (resource plan management) / VKRM (CPU scheduling)
– Utilizes Semaphores (wake up sleeping processes)
– CPU quantum (_dbrm_quantum)
• Resource manager does not solve the «lack of CPU resources» problem, it just
controls the execution queue
• Resource manager uses some resources too, the last part of the presentation
will estimate the overhead

BASIC FEATURES
Feature                                                 9.2   10.2  11.1  11.2  12.1
CPU resource allocation                                 yes   yes   yes   yes   yes
Limit of the degree of parallelism                      yes   yes   yes   yes   yes
Active session pool                                     yes   yes   yes   yes   yes
Automated change of consumer group (criteria below)     yes   yes   yes   yes   yes
Limit of estimated execution time                       yes   yes   yes   yes   yes
Limit size of undo used by uncommitted sessions         yes   yes   yes   yes   yes
Termination of idle sessions                            -     yes   yes   yes   yes
Termination of idle blocking sessions                   -     yes   yes   yes   yes
L0 70% CPU _ORACLE_BACKGROUND_GROUP_ (hidden
  consumer group for background processes)              -     -     yes   yes   yes (at 90%)
Instance caging (CPU_COUNT + resource plan)             -     -     -     yes   yes
Max CPU utilization limit                               -     -     -     yes   yes
Parallel statement queue                                -     -     -     yes   yes
LOG_ONLY “switch group” for real-time SQL monitoring    -     -     -     -     yes
Simplified automated consumer group switching           -     -     -     -     yes

Automated change of consumer group if the session has used, or is estimated to use,
the defined amount of resources. Switch criteria by version:
9.2:  CPU, Est CPU
10.2: CPU, Est CPU
11.1: CPU, Est CPU, IO_MB, IO_REQ
11.2: CPU, Est CPU, IO_MB, IO_REQ
12.1: CPU, IO_MB, IO_REQ, Est CPU, LIO, Ela, Est Ela

12.1 DOCUMENTATION
“Resource Manager 12.1”
https://docs.oracle.com/database/121/ADMIN/dbrm.htm

THE BASIC CONCEPTS
• Consumer group
– Set of sessions having similar requirements
for server resources
– Resources are allocated to the consumer
group, not individual sessions
– DBA_RSRC_CONSUMER_GROUPS
• Directives
– Rules that define resource allocation to the
consumer group
– DBA_RSRC_PLAN_DIRECTIVES
• Resource plan
– Set of directives defining the distribution of
resources among consumer groups
– DBA_RSRC_PLANS

RESMGR:CPU QUANTUM
WHY IS MY SESSION NOT RUNNING?
SQL> select event, count(*) from v$session group by event order by 2 desc;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
resmgr:cpu quantum 25
rdbms ipc message 23
Space Manager: slave idle wait 16
SQL*Net message from client 9
EMON slave idle wait 5
DIAG idle wait 2
LGWR worker group idle 2
GCR sleep 2
Streams AQ: waiting for time management or cleanup tasks 1
VKTM Logical Idle Wait 1
AQPC idle 1
Streams AQ: qmn coordinator idle wait 1
VKRM Idle 1
PING 1
...
23 rows selected.

RESMGR:CPU QUANTUM
SQL> select event, status, count(*) from v$session
where event='resmgr:cpu quantum'
group by event, status order by 1,2;
EVENT STATUS COUNT(*)
------------------ -------- ----------
resmgr:cpu quantum ACTIVE 25

RESMGR:CPU QUANTUM
SQL> select event, status, state, count(*)
from v$session where event='resmgr:cpu quantum'
group by event, status, state order by 1,2,3;
EVENT STATUS STATE COUNT(*)
------------------ -------- ------------------- ----------
resmgr:cpu quantum ACTIVE WAITED KNOWN TIME 7
resmgr:cpu quantum ACTIVE WAITED SHORT TIME 16
resmgr:cpu quantum ACTIVE WAITING 2

RESMGR:CPU QUANTUM
• EVENT values are often misinterpreted:
– V$SESSION
– V$SESSION_WAIT
• A common mistake is to forget about V$SESSION.STATE
• Only if STATE = 'WAITING' is the session actually waiting
– EVENT shows what the session is waiting for
– STATUS can be ACTIVE or INACTIVE
• If STATE = 'WAITED % TIME' ...
– and STATUS = 'ACTIVE', the session is ON CPU
– and STATUS != 'ACTIVE', the session is not running
THIS IS TRUE FOR ALL WAIT EVENTS
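
The STATE/STATUS rules above can be written down as a small decision function. The following is a minimal Python sketch, not an Oracle API: the function name classify_session and the row layout are made up for illustration, and the sample counts are taken from the query output shown earlier.

```python
def classify_session(state, status, event):
    """Interpret one V$SESSION row using the STATE/STATUS rules from the slide."""
    if state == "WAITING":
        # Only in this state does EVENT describe a wait that is in progress.
        return f"waiting on '{event}'"
    if state.startswith("WAITED"):
        # EVENT is just the last completed wait; STATUS decides the rest.
        return "on CPU" if status == "ACTIVE" else "not running"
    return "unknown"


# (STATE, STATUS, COUNT) rows from the 'resmgr:cpu quantum' output above.
rows = [
    ("WAITED KNOWN TIME", "ACTIVE", 7),
    ("WAITED SHORT TIME", "ACTIVE", 16),
    ("WAITING", "ACTIVE", 2),
]

summary = {}
for state, status, count in rows:
    label = classify_session(state, status, "resmgr:cpu quantum")
    summary[label] = summary.get(label, 0) + count

print(summary)
```

Under these rules only 2 of the 25 sessions that show resmgr:cpu quantum are actually being held back by the Resource Manager; the other 23 are running.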

THE NEW 12C-STUFF
• Improvements to non-CDB RM
– Mostly to improve automated consumer group switching
• RM in 12c CDB
– CDB resource plans
– PDB resource plans

AUTOMATED CONSUMER GROUP SWITCHING
12C: MORE OPTIONS!
• Logical IO
• Elapsed time
• Estimated elapsed time
• Estimated CPU time
– The new algorithm replaces cost-based estimation
• Real-time SQL monitoring
– LOG_ONLY

ESTIMATED ELAPSED/CPU TIME – RECURSIVE STATEMENT
SELECT executions,
end_of_fetch_count,
elapsed_time / px_servers elapsed_time,
cpu_time / px_servers cpu_time,
buffer_gets / executions buffer_gets
FROM
(SELECT SUM(executions) AS executions,
sum (
CASE
WHEN px_servers_executions > 0
THEN px_servers_executions
ELSE executions
END) AS px_servers,
SUM(end_of_fetch_count) AS end_of_fetch_count,
SUM(elapsed_time) AS elapsed_time,
SUM(cpu_time) AS cpu_time,
SUM(buffer_gets) AS buffer_gets
FROM gv$sql
WHERE executions > 0
AND sql_id = :1
AND parsing_schema_name = :2
)

ESTIMATED ELAPSED/CPU TIME – RECURSIVE STATEMENT
SELECT executions,
end_of_fetch_count,
elapsed_time / px_servers elapsed_time,
cpu_time / px_servers cpu_time,
buffer_gets / executions buffer_gets
FROM
(SELECT SUM(executions_delta) AS EXECUTIONS,
SUM(
CASE WHEN px_servers_execs_delta > 0 THEN px_servers_execs_delta ELSE
executions_delta
END) AS px_servers,
SUM(end_of_fetch_count_delta) AS end_of_fetch_count,
SUM(elapsed_time_delta) AS ELAPSED_TIME,
SUM(cpu_time_delta) AS CPU_TIME,
SUM(buffer_gets_delta) AS BUFFER_GETS
FROM DBA_HIST_SQLSTAT s,
V$DATABASE d,
DBA_HIST_SNAPSHOT sn
WHERE s.dbid = d.dbid
AND bitand(NVL(s.flag, 0), 1) = 0
AND sn.end_interval_time > (SELECT SYSTIMESTAMP AT TIME ZONE dbtimezone FROM
dual) - 7
AND s.sql_id = :1
AND s.snap_id = sn.snap_id
AND s.instance_number = sn.instance_number
AND s.dbid = sn.dbid
AND parsing_schema_name = :2)

REAL-TIME SQL MONITORING IMPROVEMENTS
LOG_ONLY – RESERVED CONSUMER GROUP NAME
• Simplifies analysis of consumer group switching? – Not much
• V$SQL_MONITOR
– RM_LAST_ACTION (e.g. LOG_ONLY)
– RM_LAST_ACTION_REASON (e.g. SWITCH_ELAPSED_TIME)
– RM_LAST_ACTION_TIME (e.g. 2015.11.26)
– RM_CONSUMER_GROUP (e.g. BATCH_GROUP)
• RM_* columns are not represented in reports, just in
V$SQL_MONITOR
• Historical SQL Monitor Reports – don’t include the RM_* info either
– DBA_HIST_REPORTS / DBA_HIST_REPORTS_DETAILS
– http://mauro-pagano.com/2015/05/04/historical-sql-monitor-reports-in-12c
– But at least you have the reports!

CONSUMER GROUP SWITCHING
SIMPLIFIED MANAGEMENT OF PRIVILEGES
• In pre-12c any kind of switching required explicit privilege
– DBMS_RESOURCE_MANAGER_PRIVS.GRANT_SWITCH_CONSUMER_GROUP
• 12.1 privileges included for:
– Consumer group mappings
– Condition based on SWITCH_GROUP
• What does it mean for DBAs?
– Removes redundant work
– Simplicity
– More flexibility as explicit grants can be avoided

Consolidation using Oracle Multitenant

CDB RESOURCE PLAN
• CDB resource plan
– Defines how resources are distributed between PDBs
– Shares – Minimum portion of resources allocated to the PDB
– Additional Limits
• Utilization_limit
• Parallel_server_limit (%)
• CDB Plan Directives (in DEFAULT_CDB_PLAN)
– ORA$DEFAULT_PDB_DIRECTIVE – default
• Shares=1, utilization_limit=100, parallel_server_limit=100
– ORA$AUTOTASK – for autotasks in root container
• Shares=1, utilization_limit=90, parallel_server_limit=100
• User-defined directives for exceptional PDBs
• *_limit parameters allow setting up “PDB caging”
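
How shares translate into a guaranteed portion is easier to see with numbers. The sketch below assumes the commonly described model in which a PDB's guaranteed portion is its shares divided by the sum of all shares, with utilization_limit acting as a separate upper bound. The PDB names and values are hypothetical, and the functions are plain arithmetic, not an Oracle API.

```python
def guaranteed_pct(shares):
    """Guaranteed CPU portion per PDB: own shares / sum of all shares."""
    total = sum(shares.values())
    return {pdb: 100.0 * s / total for pdb, s in shares.items()}


def cpu_bounds(shares, utilization_limit):
    """Return {pdb: (guaranteed %, upper bound %)}; the default limit is 100."""
    guaranteed = guaranteed_pct(shares)
    return {
        pdb: (guaranteed[pdb], float(utilization_limit.get(pdb, 100)))
        for pdb in shares
    }


# Hypothetical CDB plan: one important PDB, two ordinary ones, one of them caged.
shares = {"PDB_OLTP": 3, "PDB_DWH": 1, "PDB_TEST": 1}
limits = {"PDB_TEST": 50}

bounds = cpu_bounds(shares, limits)
for pdb, (low, high) in bounds.items():
    print(f"{pdb}: guaranteed {low:.0f}%, capped at {high:.0f}%")
```

In this model PDB_TEST is guaranteed 20% when all PDBs are busy and can never use more than 50%, even on an otherwise idle server.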

PDB RESOURCE PLAN
• Allows resources to be used in proportion to the allocated
shares
• Works just like a resource plan for non-CDB
• Few restrictions
– A PDB resource plan can't have sub-plans.
– A PDB resource plan can have a maximum of eight consumer
groups.
– A PDB resource plan cannot have a multi-level scheduling policy.
• So do we need to re-implement the resource plans
when we switch from non-CDB to CDB?
– Not always! It happens automatically, but how?

CONVERTING NON-CDB PLANS TO PDB PLANS
MULTI-LEVEL SCHEDULING POLICIES ARE NOT ALLOWED
• Automatically when the non-CDB is converted into PDB
– $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
– The original plan and plan directives are saved with STATUS=LEGACY
– A new plan is added with the same name and STATUS={null}
• Multilevel plan is converted into a single-level plan
• Algorithm is not documented, but appears to be simple enough
– Adjust allocated CPU% on each level
• Reduce each level to 75% proportionally
• Leave it as is if it’s already lower than 75%
– The “free portion” is passed to the lower level and split per calculated
percentages, the remaining portion is passed down
– The last level gets all remaining resources

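One possible reading of the bullets above, sketched in Python. This is an interpretation of the slide's description, not Oracle's algorithm (which is undocumented): the function name flatten_plan, the handling of the last level, and the sample plan are all assumptions made for illustration.

```python
def flatten_plan(levels, cap=75.0):
    """Flatten a multi-level plan into single-level percentages.

    levels: list of {group: cpu_pct} dicts, level 1 first.
    Assumed reading: every level except the last is scaled down
    proportionally to at most `cap` percent of what it receives, the rest is
    passed to the next level, and the last level takes all that is left.
    """
    remaining = 100.0  # portion of total CPU still to be handed out
    flat = {}
    for i, level in enumerate(levels):
        total = sum(level.values())
        if total == 0:
            continue  # nothing allocated here, everything is passed down
        is_last = i == len(levels) - 1
        target = 100.0 if is_last else min(total, cap)
        scale = target / total
        for group, pct in level.items():
            flat[group] = flat.get(group, 0.0) + remaining * pct * scale / 100.0
        remaining *= (100.0 - target) / 100.0
    return flat


# Hypothetical three-level plan.
plan = [
    {"SYS_GROUP": 100},
    {"OLTP": 80, "BATCH": 20},
    {"OTHER_GROUPS": 100},
]
flat = flatten_plan(plan)
print(flat)
```

With this reading, a level that allocates 100% keeps 75% of what it receives and hands the other 25% down, so the lower levels are never starved completely.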
CONVERTING NON-CDB TO PDB
EXAMPLE 1

EXAMPLE 2

EXAMPLE 3

NOTHING IS FOR FREE
• RM requires resources
– I’ve heard rumors: 1%, 5%, or even 10% of CPU?
• Testing needed!

MEASURING THE OVERHEAD
HOW AND WHAT DO WE TEST?
• HW – ODA V1 (12 Cores With HT => 24 Logical CPUs)
– Two 6-core 3.06 GHz Intel Xeon® X5675 processors
• DB versions
– 12.1.0.2 non-CDB
– 12.1.0.2 CDB (tests executed in 1 PDB)
– 11.2.0.4
• Checking:
– TEST1: Max Performance without RM
– TEST2: Max Performance with RM
– TEST3: Is the guaranteed resource allocation working?
– TEST4: Accuracy of the resource allocation
– TEST5: Overhead

MEASURING THE OVERHEAD
HOW AND WHAT DO WE TEST?
• SLOB in LIO testing mode
– 60 schemas, each 10000 blocks (80MB)
– Read-only (UPDATE_PCT=0)
– No think time (THINK_TM_FREQUENCY=0)
• A Few custom scripts
– Warm_cache.sql
– Wrapper to initiate SLOB (total of 441 runs)
– Modified runit.sh
• Switches consumer groups
• Triggers the status check
• Kills sessions
– Status check
– Response time of a non-DB script

TESTING SCRIPTS
STATUS.SQL
...
SELECT CURRENT_TIMESTAMP ts ,
NVL(RESOURCE_CONSUMER_GROUP,'{null}'),
COUNT(*) sessions,
SUM(ss.value) WORK_DONE
FROM v$session s,
v$sesstat ss
WHERE s.username LIKE 'USER%'
AND s.sid =ss.sid
AND ss.statistic#=(SELECT statistic# FROM v$statname WHERE name='consistent gets')
GROUP BY CURRENT_TIMESTAMP,
NVL(RESOURCE_CONSUMER_GROUP,'{null}')
ORDER BY 2
...

TESTING SCRIPTS
STATUS.SQL
DECLARE
TYPE t_progr IS TABLE OF NUMBER INDEX BY VARCHAR2(64);
pre_work t_progr;
pre_sess t_progr;
post_work t_progr;
post_sess t_progr;
pre_ts timestamp;
post_ts timestamp;
cursor c is
  select current_timestamp ts,
         nvl(RESOURCE_CONSUMER_GROUP,'{null}')||' / '||action RESOURCE_CONSUMER_GROUP,
         count(*) sessions,
         sum(ss.value) WORK_DONE
  from v$session s, v$sesstat ss
  where s.username like 'USER%'
  and s.sid=ss.sid
  and ss.statistic#=(select statistic# from v$statname where name='consistent gets')
  group by current_timestamp, nvl(RESOURCE_CONSUMER_GROUP,'{null}')||' / '||action
  order by 2;
c1 c%rowtype;
c2 c%rowtype;
l_key varchar2(100);
work_done number;
begin
for c1 in c loop
pre_ts:=c1.ts;
pre_work(c1.RESOURCE_CONSUMER_GROUP):=c1.WORK_DONE;
pre_sess(c1.RESOURCE_CONSUMER_GROUP):=c1.sessions;
end loop;
dbms_lock.sleep(30);
for c2 in c loop
post_ts:=c2.ts;
post_work(c2.RESOURCE_CONSUMER_GROUP):=c2.WORK_DONE;
post_sess(c2.RESOURCE_CONSUMER_GROUP):=c2.sessions;
end loop;
l_key := pre_work.first;
LOOP
EXIT WHEN l_key IS NULL;
work_done:=round((post_work(l_key)-pre_work(l_key))
  /(extract(minute from (post_ts-pre_ts))*60+extract(second from (post_ts-pre_ts))),3);
dbms_output.put_line(rpad(l_key,60,' ')||': '
  ||rpad(post_work(l_key),16,' ')||' - '||rpad(pre_work(l_key),16,' ')||' = '
  ||rpad(post_work(l_key)-pre_work(l_key)||' / '
    ||(extract(minute from (post_ts-pre_ts))*60+extract(second from (post_ts-pre_ts)))||'s',40,' ')
  ||' ==> '||work_done||' w/s (with '||post_sess(l_key)||' sessions) '
  ||round((work_done/post_sess(l_key)),3)||' w/s per session');
l_key := pre_work.next(l_key);
END LOOP;
end;
/
L2_GROUP1: 15582619 -681053 = 14901566 /180.46772s ==> 82571.919 w/s (with 12 sessions) 6880.993 w/s per session
L2_GROUP2: 129517874-6005013 = 123512861/180.46772s ==> 684404.175 w/s (with 12 sessions) 57033.681 w/s per session
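
The arithmetic behind each output line is simple: the difference in 'consistent gets' between the two samples, divided by the elapsed seconds, and then by the number of sessions. A minimal Python sketch of that calculation, using the numbers from the two sample lines above (the function name work_rate is made up for illustration):

```python
def work_rate(pre, post, elapsed_s, sessions):
    """Return (work per second, work per second per session), rounded to 3 places."""
    per_second = round((post - pre) / elapsed_s, 3)
    per_session = round(per_second / sessions, 3)
    return per_second, per_session


# Values copied from the sample output: (pre, post, elapsed seconds, sessions).
samples = {
    "L2_GROUP1": (681053, 15582619, 180.46772, 12),
    "L2_GROUP2": (6005013, 129517874, 180.46772, 12),
}

rates = {group: work_rate(*values) for group, values in samples.items()}
for group, (per_second, per_session) in rates.items():
    print(f"{group}: {per_second} w/s, {per_session} w/s per session")
```

The results should agree with the w/s figures printed by the PL/SQL block, which makes this a convenient cross-check when post-processing many test runs.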

TESTING SCRIPTS
RESPONSE.SH
$ cat ../response.sh
for i in {1..5000}
do
echo "sqrt($i)" | bc > /dev/null
done
$ time response.sh
real 0m4.886s
user 0m0.291s
sys 0m1.096s

TEST1
NO RESOURCE MANAGER
• Init parameters:
– CPU_COUNT=24
– RESOURCE_MANAGER_PLAN='FORCE:'
• CDB
– RESOURCE_MANAGER_PLAN='FORCE:' was set in all PDBs
and ROOT.
– Warning: having an RM plan enabled in one PDB caused the whole CDB
to be managed by the Resource Manager (even if no CDB plan
was set)

TEST1
NO RESOURCE MANAGER – TOTAL WORK
• Almost linear scaling up to 12 cores; HT adds ~25-30% per core.
• Performance: 11gR2 > 12c CDB > 12c non-CDB

TEST1
NO RESOURCE MANAGER – BURN_CPU.SQL V2
• OS script response is:
– 4 – 7 s for 1-23 sessions
– ~70 – 90 s for 24-48 sessions

OFFTOPIC – TEST1 (PURE PL/SQL TEST)
NO RESOURCE MANAGER – BURN_CPU.SQL V2
• PL/SQL on 11gR2 performs worse than on 12c

TEST2
SIMPLE RESOURCE PLAN
• The resource plan
– SYS_GROUP = 1% at L1
– OTHER_GROUPS = 1% at L1
– L2_GROUP1 = 1% at L1
• All sessions will be in L2_GROUP1

TEST2
• Very similar results to TEST1 (no RM)

TEST2
What is that spike?
• Even a very simple RM plan throttles sessions instead of letting them saturate the server
• The spike at exactly 24 active sessions appears because RM is not yet throttling sessions and all
logical CPUs are in use

TEST3
80%-15% RESOURCE PLAN
– SYS_GROUP = 5%
– OTHER_GROUPS = 0%
– L2_GROUP1 = 80%
– L2_GROUP2 = 15%
• 24 sessions will be started in L2_GROUP1
• 0-36 sessions will be started in L2_GROUP2
• The Goal
– Check if both consumer groups get the allocated resources

TEST3
80%-15% RESOURCE PLAN – 12C CDB

TEST3
80%-15% RESOURCE PLAN – 12C NON-CDB

TEST3
80%-15% RESOURCE PLAN – 11GR2

TEST4
ALLOCATION ACCURACY
– SYS_GROUP = 1% at L1
– L2_GROUP1 = 0% at L1
– L2_GROUP2 = 10% at L1
– L2_GROUP3 = 20% at L1
– L2_GROUP4 = 30% at L1
– L2_GROUP5 = 39% at L1
– OTHER_GROUPS = 0% at L1
• 12 sessions will be started in each L2_GROUP% group
• The Goal
– Check if all percentages are met
– 3 * 3 minutes, AVG
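
To judge accuracy, the measured work per group has to be compared with what the plan should give. A minimal Python sketch of the expected distribution follows. It assumes an idealized model in which the allocation of idle groups (here SYS_GROUP and OTHER_GROUPS) is redistributed proportionally among the active ones; real RM behavior may differ, and the helper name expected_shares is made up.

```python
def expected_shares(plan_pct, active_groups):
    """Expected CPU share (%) per active group, normalized over active groups."""
    active_total = sum(plan_pct[g] for g in active_groups)
    return {g: 100.0 * plan_pct[g] / active_total for g in active_groups}


# Level-1 percentages of the TEST4 plan.
plan_pct = {
    "SYS_GROUP": 1,
    "L2_GROUP1": 0,
    "L2_GROUP2": 10,
    "L2_GROUP3": 20,
    "L2_GROUP4": 30,
    "L2_GROUP5": 39,
    "OTHER_GROUPS": 0,
}

# Only the five L2_GROUP% groups run sessions during the test.
active = [g for g in plan_pct if g.startswith("L2_GROUP")]
expected = expected_shares(plan_pct, active)
for group, share in expected.items():
    print(f"{group}: {share:.2f}%")
```

In this model L2_GROUP1, with a 0% allocation, is expected to get essentially nothing while the other groups are saturating the CPUs, so any work it does complete is a deviation from the plan.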

TEST4
ALLOCATION ACCURACY – 12C CDB

TEST4
ALLOCATION ACCURACY – 12C NON-CDB

TEST4
ALLOCATION ACCURACY – 11GR2

RM OVERHEAD
COMPARING AVG (W/S) FOR 25-48 SESSIONS TEST1/TEST2

RM OVERHEAD - COMPARING PERFORMANCE

FINDINGS
• The basic overhead of RM is negligible (<2%)
– Outlier cases are possible (but rare)
• Session holding a “latch” is sent off-CPU
• Session holding a lock is sent off-CPU
– ... only if out of resources already
• OS responsiveness is useful – this alone is a good enough reason to use RM
– For Troubleshooting
– For keeping RAC alive
• Don’t create “fancy” RM plans – RM does not guarantee exact resource
distribution
• Be careful with RM on CDB/PDBs!
– Enabling it on 1 PDB enables it for the whole CDB
– Remember the scheduler windows: (RESOURCE_MANAGER_PLAN='FORCE:')

Time for Questions!
elsins@pythian.com
@MarisElsins
Lead Database Consultant,
Pythian
+44 (0) 20 3411 8378 ext 337
Maris Elsins
Pythian.com
@pythian
