From Tanel Poder's Troubleshooting Complex Performance Issues series - an example of Oracle SEG$ internal segment contention caused by direct path insert activity.
By Tanel Poder, a long-time computer performance geek and entrepreneur, at PoderC LLC
2. gluent.com 2
Intro: About me
• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek
• CEO & co-founder
• Expert Oracle Exadata book (2nd edition is out now!)
Instant promotion
3. gluent.com 3
Gluent: All Enterprise Data Available in Hadoop!
[Architecture diagram: Gluent links Oracle, MSSQL, Teradata, IBM DB2 and apps X/Y/Z with Hadoop and Big Data sources]
8. gluent.com 8
A Data Warehouse data loading/preparation
• A large Exadata / RAC / Oracle 11.2.0.3 reporting environment
• Lots of parallel CTAS and direct path loads
• High concurrency, high parallelism
• Throughput bad, all kinds of waits showing up:
  • Other, Cluster, Configuration, Application, Concurrency, User I/O, CPU
12. gluent.com 12
Wait event explanation
• buffer busy waits
  • Buffer is physically available in the local instance, but is pinned by some other local session (in an incompatible mode)
• gc buffer busy acquire
  • Someone has already requested the remote block into the local instance
• gc buffer busy release
  • Block is local, but has to be shipped out to a remote instance first (as someone in the remote instance had requested it first)
• enq: TX - allocate ITL entry
  • Can't change the block as all the block's ITL entries are held by others
  • Can't dynamically add more ITLs (no space in the block)
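When these events dominate, a quick breakdown of who waits on what can come from ASH. A minimal sketch (assumes Diagnostics Pack licensing; the 30-minute window is arbitrary):

```sql
-- Count ASH samples per wait event over the last 30 minutes,
-- across all RAC instances (requires Diagnostics Pack).
SELECT event, COUNT(*) samples
FROM   gv$active_session_history
WHERE  sample_time > SYSDATE - INTERVAL '30' MINUTE
AND    session_state = 'WAITING'
GROUP  BY event
ORDER  BY samples DESC;
```

Since ASH is sampled once per second per session, the sample counts roughly approximate seconds of wait time per event.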
15. gluent.com 15
Which object?
• Translate file#, block# into segment names/numbers:
• Assumes the blocks are in the buffer cache

SQL> SELECT objd data_object_id, COUNT(*)
       FROM v$bh
      WHERE file#=1
        AND block# IN
            ( 279634,279635,279629,279632,279638,279636,279613,279662,
              279628,279608,279653,279627,279642,279637,279643,279631,
              279646,279622,279582,279649
            )
      GROUP BY objd ORDER BY 2 DESC;

DATA_OBJECT_ID   COUNT(*)
-------------- ----------
             8        113

All blocks belong to a segment with data_object_id = 8
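If the blocks have already aged out of the buffer cache, the same translation can be done (more slowly) against the extent map. A sketch, reusing one of the hot block numbers from above:

```sql
-- Map a file#/block# to its owning segment via the extent map.
-- A full scan of DBA_EXTENTS can be slow on large databases.
SELECT owner, segment_name, segment_type
FROM   dba_extents
WHERE  file_id = 1
AND    279634 BETWEEN block_id AND block_id + blocks - 1;
```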
16. gluent.com 16
What is segment #8?
• Look up the object names:
• Using DBA_OBJECTS.DATA_OBJECT_ID -> OBJECT_ID

SQL> @doid 8

object_id owner     object_name       O_PARTITION   object_type
--------- --------- ----------------- ------------- -----------
        8 SYS       C_FILE#_BLOCK#                  CLUSTER
       14 SYS       SEG$                            TABLE
       13 SYS       UET$                            TABLE

Segment #8 is a cluster which contains SEG$ (DBA_SEGMENTS) and UET$ (DBA_EXTENTS), but UET$ isn't used anymore thanks to LMT tablespaces. That leaves SEG$.
17. gluent.com 17
Write contention on SEG$?
• SEG$ is modified when:
1. A new segment is created (table, index, partition, etc.)
2. An existing segment extends
3. A segment is dropped / moved
4. Parallel direct path loads (CTAS) can also create many temporary segments (one per PX slave)
   • …and merge these into the final segment at the end of loading

More about this later…
19. gluent.com 19
AWR 1
• No CPU starvation evident (checked all RAC nodes)

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg   % DB
Event                              Waits    Time(s)    (ms)   time Wait Class
---------------------------- ----------- ---------- ------ ------ ----------
DB CPU                                      165,199          37.0
gc buffer busy acquire         1,260,128     68,399     54   15.3 Cluster
enq: TX - allocate ITL entry     354,496     40,583    114    9.1 Configurat
direct path write temp         4,632,946     37,455      8    8.4 User I/O
gc buffer busy release           213,750     22,683    106    5.1 Cluster

Host CPU (CPUs: 160 Cores: 80 Sockets: 8)
~~~~~~~~         Load Average
                Begin       End     %User   %System      %WIO     %Idle
            --------- --------- --------- --------- --------- ---------
                 5.20     42.76      26.0       3.3       0.0      69.8
20. gluent.com 20
AWR 2
• AWR also listed a SEG$ insert as a top SQL:

    Elapsed                Elapsed Time
   Time (s)    Executions  per Exec (s)  %Total  %CPU   %IO  SQL Id
----------- ------------- ------------- ------ ----- ----- -------------
  100,669.1         4,444         22.65   22.6    .5    .0 g7mt7ptq286u7
insert into seg$ (file#,block#,type#,ts#,blocks,extents,minexts,maxexts,
extsize,extpct,user#,iniexts,lists,groups,cachehint,hwmincr, spare1,
scanhint, bitmapranges) values (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,
:13,:14,:15,:16,DECODE(:17,0,NULL,:17),:18,:19)

• AWR on another node (same insert into seg$):

SNAP_ID NODE SQL_ID         EXECS  AVG_ETIME  AVG_LIO  AVG_PIO
------- ---- ------------- ------ ---------- -------- --------
  24263    3 g7mt7ptq286u7    552     70.227    880.6       .0

Super-slow single row inserts into SEG$ ???
24. gluent.com 24
Checkpoint
1. Lots of parallel CTAS statements seem to wait for various RAC gc buffer busy events and enq: TX - ITL contention
2. CTAS statements create new segments; new segments cause inserts into SEG$
3. AWR and SQL Trace report super-long elapsed times for single-row SEG$ inserts
4. It's actually the recursive SEG$ inserts that wait for the gc buffer busy and enq: TX - ITL contention events
26. gluent.com 26
Multiple layers of locking & coherency

[Data block diagram: KCBH - block common header; KTBBH - transaction common header with ITL entries 1-4; KDBH - data header; row data]

• Level 0: VOS/service layer
  - cache buffers chains latch
• Level 1: Cache layer
  - buffer pin: local pin, global cache pin
• Level 2: Transaction layer
  - ITL entry
  - row lock byte

You must pin (and retrieve) a buffer before you can change or examine it.
What if you pin a buffer and find all its ITL slots busy (with no space to add more)?
The session will release the pin and start waiting on the enq: TX - ITL contention event, and will re-get/pin the buffer again when the TX - ITL wait is over!
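To see which segments are actually hitting this condition, the segment-level statistics expose an ITL wait counter. A minimal sketch (values accumulate since instance startup):

```sql
-- Which segments suffer ITL waits? (cumulative since instance startup)
SELECT owner, object_name, value itl_waits
FROM   v$segment_statistics
WHERE  statistic_name = 'ITL waits'
AND    value > 0
ORDER  BY value DESC;
```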
28. gluent.com 28
OBJ #9 and #14
• Look up the object names:
• Using DBA_OBJECTS.DATA_OBJECT_ID -> OBJECT_ID

SQL> @oid 9,14

owner           object_name           object_type
--------------- --------------------- ------------------
SYS             I_FILE#_BLOCK#        INDEX
SYS             SEG$                  TABLE

Object #9 is the cluster index for cluster C_FILE#_BLOCK# (oid #8 on a previous slide).

SQL> @bclass 4

CLASS              UNDO_SEGMENT_ID
------------------ ---------------
segment header

Block class #4 is a segment header block that also stores freelist information.
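The @bclass helper above is not an Oracle built-in; a minimal home-grown equivalent maps a block class# (as seen in wait event parameters or ASH p3) to a class name via V$WAITSTAT. This sketch assumes the view returns its rows in class# order, which it conventionally does:

```sql
-- Map a wait-parameter block class# to its name.
-- V$WAITSTAT lists block classes in class# order, so ROWNUM can
-- serve as the class number (1 = data block, 4 = segment header, ...).
SELECT class
FROM   (SELECT class, ROWNUM class# FROM v$waitstat)
WHERE  class# = 4;
```

The same lookup with class# = 13 returns "file header block", which shows up again in the later case study.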
30. gluent.com 30
Conclusion from gathered evidence
1. Lots of concurrent + highly parallel CTAS statements
2. Each PX slave created its own temporary segments (in spikes when a PX query started up)
3. Spikes of concurrent SEG$ inserts simultaneously on 4 RAC nodes
4. SEG$ cluster blocks ran out of ITL entries
5. This caused global cache/TX locking thrashing - long loops of unsuccessful attempts to pin and insert into SEG$ blocks
31. gluent.com 31
Also worth noting
1. Resource manager was a possible factor
   • Plenty of resmgr:cpu quantum waits
   • Lock holders and buffer pin holders might have gone to sleep - while holding the pins/locks under contention
2. Freelist-based block space allocation
   • SEG$ is a SYS-owned, freelist-managed index cluster
   • Many cluster table blocks are walked through when searching for space
3. ASH doesn't always show recursive SQL IDs
   • It attributes waits to the parent (top level) statement's SQL ID instead
   • The ASH p1, p2, p3 and current_obj# columns are useful for drilldown
32. gluent.com 32
How to fix it?
• Do it less!
• What?
  • Do not insert/update SEG$ entries so frequently!
• How?
  • Do not allow parallel direct load inserts to create a temporary loading segment for each slave (and inserted partition)!
• How?
  • Make sure that High Water Mark Brokering is used!
  • One temporary segment - thus one SEG$ insert/update/delete - per query (and inserted partition) = much less SEG$ contention.
• More in the following slides …
33. gluent.com 33
Why is High-Water-Mark Brokering needed?
• A historical problem with large uniform extent sizes:
1. Each data loading PX slave allocated its own extents (to its private temporary segment) when loading data
2. When the load was completed, the private temporary segments were merged into one final table segment
3. The last extent of each private segment possibly ended up not fully used
   • Some were almost full, some almost empty - half-empty on average
4. Wastage = 0.5 x extent_size x PX_slaves
   • The more PX slaves, the more wastage
   • The bigger the extent, the bigger the wastage
• References:
  • https://blogs.oracle.com/datawarehousing/entry/parallel_load_uniform_or_autoallocate
  • http://www.oracle.com/technetwork/database/bi-datawarehousing/twpdwbestpractices-for-loading-11g-404400.pdf

Problem solved by using High-Water-Mark Brokering! But…
34. gluent.com 34
Parallel Data Loading and Large Extents - space wastage?

SQL> @seg sales_parallel_ctas

 SEG_MB OWNER   SEGMENT_NAME              SEG_TABLESPACE_NAME
------- ------- ------------------------- --------------------
   9472 TANEL   SALES_PARALLEL_CTAS       TANEL_DEMO_LARGE
   8896 TANEL   SALES_PARALLEL_CTAS_FIX   TANEL_DEMO_LARGE

SQL> SELECT * FROM TABLE(space_tools.get_space_usage('TANEL', 'SALES_PARALLEL_CTAS','TABLE'));
---------------------------------------------------------------------------------------
Full blocks /MB        1134669   8865
Unformatted blocks/MB        0      0
Free Space 0-25%             0      0
Free Space 25-50%            0      0
Free Space 50-75%            0      0
Free Space 75-100%       72963    570

SQL> SELECT * FROM TABLE(space_tools.get_space_usage('TANEL', 'SALES_PARALLEL_CTAS_FIX','TABLE'));
----------------------------------------------------------------------------------------
Full blocks /MB        1134669   8865
Unformatted blocks/MB        0      0
Free Space 0-25%             0      0
Free Space 25-50%            0      0
Free Space 50-75%            0      0
Free Space 75-100%           0      0

The first table was created with a PX CTAS in a tablespace with 64MB uniform extents and is 6.4% bigger: 570 MB worth of blocks are unused in the segment (PX DOP 16 x 64MB x ~0.5 = 512MB).
The identical second table was loaded with INSERT /*+ APPEND */ at PX DOP 16 into the same tablespace: no empty blocks.
35. gluent.com 35
Parallel Data Loading and Large Extents - HWM brokering
• Instead of each PX slave working on its own separate segment…
1. Allocate and extend only one segment
2. Slaves allocate space within the single segment as needed
3. No "holes" in tables, no space wastage problem anymore!
4. Serialized via the HV - Direct Loader High Water Mark enqueue
• Parameter:
  • _insert_enable_hwm_brokered = true (default)
  • "during parallel inserts high water marks are brokered"
• Problem:
  • In Oracle 11.2 (including 11.2.0.4) this only works with INSERT APPEND
  • …and not for CTAS nor ALTER TABLE MOVE
  • (in one case it worked for CTAS, but only if the target table was partitioned)
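Hidden (underscore) parameters don't show up in V$PARAMETER, so verifying the current value means querying the X$ fixed tables. A sketch of the usual approach (must be run as SYS; unsupported internals, so treat as a diagnostic convenience only):

```sql
-- Show the current value of a hidden parameter (run as SYS).
SELECT p.ksppinm name, v.ksppstvl value
FROM   x$ksppi p, x$ksppcv v
WHERE  p.indx = v.indx
AND    p.ksppinm = '_insert_enable_hwm_brokered';
```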
36. gluent.com 36
Parallel Data Loading and Large Extents - HWM brokering

SQL> @fix 6941515

  BUGNO  VALUE DESCRIPTION                                                  IS_DEFAULT
------- ------ ------------------------------------------------------------ ----------
6941515      0 use high watermark brokering for insert into single segment           1

SQL> alter session set "_fix_control"='6941515:ON';

Session altered.

• Actually this is a bug, and Oracle fixed it a long time ago
• But bugfix 6941515 is not enabled by default!
• After enabling the bug-fix, parallel CTAS and parallel ALTER TABLE MOVE also use HWM brokering…
• Unfortunately it didn't work with ALTER TABLE MOVE when moving the whole table, if the table was partitioned.

DBMS_SPACE / space_tools helps to measure this!
Thanks to Alex Fatkulin for spotting this!
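The space_tools package used earlier is a custom helper; the underlying free-space breakdown for ASSM segments comes from plain DBMS_SPACE.SPACE_USAGE. A minimal sketch (the owner and segment names are placeholders from the demo above):

```sql
-- ASSM free-space breakdown for one segment via DBMS_SPACE.SPACE_USAGE.
-- TANEL / SALES_PARALLEL_CTAS are placeholder names for illustration.
SET SERVEROUTPUT ON
DECLARE
  v_unf  NUMBER; v_unfb  NUMBER;
  v_fs1  NUMBER; v_fs1b  NUMBER;   -- 0-25% free
  v_fs2  NUMBER; v_fs2b  NUMBER;   -- 25-50% free
  v_fs3  NUMBER; v_fs3b  NUMBER;   -- 50-75% free
  v_fs4  NUMBER; v_fs4b  NUMBER;   -- 75-100% free
  v_full NUMBER; v_fullb NUMBER;
BEGIN
  DBMS_SPACE.SPACE_USAGE('TANEL', 'SALES_PARALLEL_CTAS', 'TABLE',
    v_unf, v_unfb, v_fs1, v_fs1b, v_fs2, v_fs2b,
    v_fs3, v_fs3b, v_fs4, v_fs4b, v_full, v_fullb);
  DBMS_OUTPUT.PUT_LINE('Full blocks:         ' || v_full);
  DBMS_OUTPUT.PUT_LINE('75-100% free blocks: ' || v_fs4);
END;
/
```

Mostly-empty blocks above the segment HWM show up in the 75-100% free bucket, which is exactly the wastage measured on the earlier slide.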
38. gluent.com 38
Case Study: Data Loading Performance - Example 1
• Parallel Create Table As Select from Hadoop to Exadata
• Buffer busy waits dominating the response time profile (SQL Monitoring)

http://blog.tanelpoder.com/2013/11/06/diagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2/
39. gluent.com 39
Data Loading Performance: ashtop.sql

SQL> @ash/ashtop session_state,event,p2text,p2,p3text,p3 sql_id='3rtbs9vqukc71'
     "timestamp'2013-10-05 01:00:00'" "timestamp'2013-10-05 03:00:00'"

%This  SESSION EVENT                             P2TEXT    P2       P3TEXT   P3
------ ------- --------------------------------- --------- -------- -------- -------
   57% WAITING buffer busy waits                 block#           2 class#        13
   31% ON CPU                                    file#            0 size      524288
    1% WAITING external table read               file#            0 size      524288
    1% ON CPU                                    block#           2 class#        13
    0% ON CPU                                    consumer     12573               0
    0% WAITING cell smart file creation                           0               0
    0% WAITING DFS lock handle                   id1              3 id2            2
    0% ON CPU                                    file#           41 size          41
    0% WAITING cell single block physical read   diskhash#  4695794 bytes       8192
    0% WAITING control file parallel write       block#           1 requests       2
    0% WAITING control file parallel write       block#          41 requests       2
    0% WAITING change tracking file synchronous  blocks           1               0
    0% WAITING control file parallel write       block#          42 requests       2
    0% WAITING db file single write              block#           1 blocks         1
    0% ON CPU                                                     0               0

• Break the (buffer busy) wait events down by block#/class# - block #2?
40. gluent.com 40
Case Study: Data Loading Performance - Example 2
• Lots of serial sessions doing single row inserts - on multiple RAC nodes
• "buffer busy waits" and "gc buffer busy acquire" waits
  • file# = 6
  • block# = 2
  • class# = 13

SQL> @bclass 13

CLASS              UNDO_SEGMENT_ID
------------------ ---------------
file header block

SQL> select file#, block#, status
       from v$bh where class# = 13;

     FILE#     BLOCK# STATUS
---------- ---------- ----------
         5          2 xcur
         4          2 xcur
...

Block dump from disk:
buffer tsn: 7 rdba: 0x00000002 (1024/2)
scn: 0x0000.010b6f9b seq: 0x02 flg: 0x04 tail: 0x6f9b1d02
frmt: 0x02 chkval: 0xd587 type: 0x1d=KTFB Bitmapped File Space Header
Hex dump of block: st=0, typ_found=1

A single space allocation contention point per LMT file. Bigfile tablespaces have only one file!
42. gluent.com 42
Case Study: Parallel Data Loading Performance
• Reduce demand on the LMT bitmap blocks
  • By allocating bigger extents at a time
• Use large uniform extents for fast-growing tables(paces)
  • The customer went with a 64MB uniform extent size
• The autoallocate extent management is suboptimal for very large segments
  • As there'd be 1 LMT space management bit per 64kB, regardless of your INITIAL extent size at the segment level
  • The _partition_large_extents = TRUE setting doesn't change this either
• Large uniform extents are better for data loading and scanning!
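The recommendation translates into a tablespace definition along these lines. A sketch only - the tablespace name, datafile sizes and count are placeholders, and multiple (smallfile) datafiles are used deliberately, since the previous slide showed one space allocation contention point per LMT file:

```sql
-- Locally managed tablespace with 64MB uniform extents.
-- Several datafiles spread the per-file space allocation
-- contention point (names and sizes are placeholders).
CREATE TABLESPACE dw_load_data
  DATAFILE '+DATA' SIZE 32G,
           '+DATA' SIZE 32G,
           '+DATA' SIZE 32G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 64M
  SEGMENT SPACE MANAGEMENT AUTO;
```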