©2013 Enkitec
Profiling the logwriter and database writer
Frits Hoogland
Hotsos 2014, Dallas
`whoami`
• Frits Hoogland
• Working with Oracle products since 1996
• Blog: http://fritshoogland.wordpress.com
• Twitter: @fritshoogland
• Email: frits.hoogland@enkitec.com
• Oracle ACE Director
• OakTable Member
Goals & prerequisites
• Goal: learn about the typical behaviour of both lgwr and dbwr, both the visible side (wait events) and the inner workings.
• Prerequisites:
– Understanding of the (internal) execution of C programs.
– Understanding of Oracle tracing mechanisms.
– Understanding of the interaction of processes with the Oracle database.
Test system
• The tests and investigation were done in a VM:
– Host: Mac OS X 10.9 / VMware Fusion 6.0.2.
– VM: Oracle Linux x86_64 6u5 (UEK3 3.8.13).
– Oracle Grid 11.2.0.4 with ASM/External redundancy.
– Oracle database 11.2.0.4.
– Unless specified otherwise.
Logwriter, concepts guide
• From the concepts guide:
– The lgwr manages the redo log buffer.
– The lgwr writes all redo entries that have been copied into the buffer since the last time it wrote when:
• A user commits.
• A log switch occurs.
• Three seconds have passed since the last write.
• The buffer is 1/3 full or 1MB is filled.
• dbwr must write modified (‘dirty’) buffers.
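The write conditions listed above can be restated as a single predicate. This is only a sketch of the list from the concepts guide: the function name, its parameters and the way the thresholds are compared are assumptions made for illustration, not Oracle internals.

```python
ONE_MB = 1024 * 1024

def lgwr_should_write(commit_pending, log_switch, secs_since_last_write,
                      redo_in_buffer, buffer_size, dbwr_waiting):
    """Return True if any of the listed write conditions holds (hypothetical model)."""
    return (
        commit_pending                          # a user commits
        or log_switch                           # a log switch occurs
        or secs_since_last_write >= 3           # three seconds since the last write
        or redo_in_buffer * 3 >= buffer_size    # buffer one third full
        or redo_in_buffer >= ONE_MB             # 1MB of redo in the buffer
        or dbwr_waiting                         # dbwr must write dirty buffers
    )

# An idle instance one second after the last write: nothing to do.
idle = lgwr_should_write(False, False, 1.0, 0, 8 * ONE_MB, False)
# The same instance once three seconds have passed.
after_timeout = lgwr_should_write(False, False, 3.0, 0, 8 * ONE_MB, False)
print(idle, after_timeout)
```

Whether lgwr actually issues a write after the three-second timeout when the buffer is empty is a separate question; the idle traces that follow look at exactly that.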
Logwriter, idle
• The general behaviour of the log writer can easily be shown by putting a 10046/8 trace on lgwr:

SYS@v11204 AS SYSDBA> @who
…
130,1,@1 2243 oracle@ol65-ora11204-asm.local (LGWR)
…
SYS@v11204 AS SYSDBA> oradebug setospid 2243
Oracle pid: 11, Unix process pid: 2243, image: oracle@ol65-ora11204-asm.local (LGWR)
SYS@v11204 AS SYSDBA> oradebug unlimit
Statement processed.
SYS@v11204 AS SYSDBA> oradebug event 10046 trace name context forever, level 8;
Statement processed.
Logwriter, idle
• The 10046/8 trace shows:

*** 2013-12-18 14:12:32.479
WAIT #0: nam='rdbms ipc message' ela= 2999925 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372352479352

*** 2013-12-18 14:12:35.479
WAIT #0: nam='rdbms ipc message' ela= 3000075 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372355479531

*** 2013-12-18 14:12:38.479
WAIT #0: nam='rdbms ipc message' ela= 2999755 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372358479381

*** 2013-12-18 14:12:41.479
WAIT #0: nam='rdbms ipc message' ela= 3000021 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372361479499
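Trace files like the one above get long quickly, so a small script helps to total the waits per event. The sketch below is a hypothetical helper, not an Oracle tool; it assumes the `WAIT #n: nam='…' ela= …` layout shown in the trace, with `ela` in microseconds.

```python
import re
from collections import defaultdict

# Matches the event name and elapsed time of a 10046 WAIT line.
WAIT_RE = re.compile(r"WAIT #\d+: nam='(?P<name>[^']+)' ela= *(?P<ela>\d+)")

def summarize_waits(lines):
    """Return {event name: (number of waits, total elapsed microseconds)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            entry = totals[match.group("name")]
            entry[0] += 1
            entry[1] += int(match.group("ela"))
    return {name: tuple(values) for name, values in totals.items()}

sample = [
    "*** 2013-12-18 14:12:32.479",
    "WAIT #0: nam='rdbms ipc message' ela= 2999925 timeout=300 p2=0 p3=0 obj#=-1",
    "*** 2013-12-18 14:12:35.479",
    "WAIT #0: nam='rdbms ipc message' ela= 3000075 timeout=300 p2=0 p3=0 obj#=-1",
]
summary = summarize_waits(sample)
for name, (count, ela_us) in summary.items():
    print(f"{name}: {count} waits, {ela_us / 1e6:.3f} s")
```

For the two sample lines this reports two 'rdbms ipc message' waits adding up to 6.000 seconds, which matches the 3-second timeout (timeout=300 centiseconds).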
Logwriter, idle
• “rdbms ipc message” indicates a sleep/idle event
– There isn’t an indication lgwr writes something:

semtimedop(327683, {{15, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 84000}, ru_stime={0, 700000}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 84000}, ru_stime={0, 700000}, ...}) = 0
times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
semtimedop(327683, {{15, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
…etc…
Logwriter, idle
• It does read the ‘stat’ file of a certain process in the /proc filesystem:

open("/proc/2218/stat", O_RDONLY) = 21
read(21, "2218 (oracle) S 1 2218 2218 0 -1"..., 999) = 209
close(21)

• It does so every 20th time: (20*3) = 60 sec.
• The PID is that of PMON.
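To see what lgwr gets back from that read, the first fields of a /proc/&lt;pid&gt;/stat line can be split as follows. This is a minimal sketch: the function name is made up, and only the first four fields (pid, command name, state, parent pid) are decoded, following the field order documented in proc(5).

```python
def parse_proc_stat(text):
    """Split the leading fields of a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces,
    so split around the last ')' instead of on whitespace alone.
    """
    pid_part, rest = text.split(" (", 1)
    comm, rest = rest.rsplit(")", 1)
    fields = rest.split()
    return {
        "pid": int(pid_part),
        "comm": comm,
        "state": fields[0],      # e.g. S = sleeping, R = running
        "ppid": int(fields[1]),
    }

# The (truncated) line that strace showed for the PMON process.
sample = "2218 (oracle) S 1 2218 2218 0 -1"
info = parse_proc_stat(sample)
print(info)
```

The state field is the interesting part here: it tells whether the process still exists and whether it is sleeping or running.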
Logwriter, idle
• Recap:
– In an idle database.
– The lgwr sleeps on a semaphore for 3 seconds.
• Then wakes up, and sets up the semaphore/sleep again.
• Processes sleeping on a semaphore do not spend CPU.
– Every minute, lgwr reads pmon's process stats.
– lgwr doesn’t write if there’s no need.
• But what happens when we insert a row of data, and commit that?
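The sleep-until-posted-or-timeout pattern from the recap can be imitated in a few lines. This sketch uses a Python `threading.Semaphore` as a stand-in for the System V semaphore that lgwr sleeps on with semtimedop(); the function name and the shortened timeouts are assumptions for the demo.

```python
import threading

def idle_loop(sem, timeout_s, iterations):
    """Sleep on a semaphore; record whether each sleep was posted or timed out."""
    outcomes = []
    for _ in range(iterations):
        # acquire() returns False on timeout, comparable to semtimedop()
        # returning -1 with EAGAIN in the strace output.
        posted = sem.acquire(timeout=timeout_s)
        outcomes.append("posted" if posted else "timeout")
    return outcomes

sem = threading.Semaphore(0)
# Post the semaphore once after a short delay, as a foreground would on commit.
threading.Timer(0.05, sem.release).start()
outcomes = idle_loop(sem, timeout_s=0.2, iterations=2)
print(outcomes)
```

The first sleep ends early because the semaphore is posted; the second one runs into the timeout. A sleep that ends well before the timeout is the sign that somebody posted the process.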
Logwriter, commit
TS@//localhost/v11204 > insert into t values ( 1, 'aaaa', 'bbbb' );

1 row created.

TS@//localhost/v11204 > commit;

Commit complete.
Logwriter, commit - expected
(Timeline diagram: system calls of the foreground and the logwriter, in time order.)
logwriter:  semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})
foreground: commit;
foreground: semctl(458755, 15, SETVAL, 0x7fff00000001)
foreground: semtimedop(458755, {{33, -1, 0}}, 1, {0, 100000000})
logwriter:  io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
logwriter:  io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})
logwriter:  semctl(458755, 33, SETVAL, 0x1)
Logwriter, commit - actual
(Timeline diagram: system calls of the foreground and the logwriter, in time order.)
logwriter:  semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})
foreground: commit;
foreground: semctl(458755, 15, SETVAL, 0x7fff00000001)
logwriter:  io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
logwriter:  io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})
No ‘log file sync’ wait!
No semctl()
No semtimedop()
Logwriter, commit
• Investigation shows:
– The foreground scans log writer progress up to 3 times.
• kcrf_commit_force() > kcscur3()
– If its data* in the redo log buffer is not written:
• It notifies the lgwr that it is going to sleep on a semaphore.
• semtimedop() for 100ms, until posted by lgwr.
– If its data* has been written:
• No need to wait on it.
• No ‘log file sync’ wait.
Logwriter, commit
• Wait!!!
– This (no log file sync) turned out to be an edge case.
• I traced the kcrf_commit_force() and kcscur3() calls using breakpoints in gdb.
– In normal situations, the wait will appear.
• Depending on log writer and FG progress.
• The semtimedop() call in the FG can be absent.
– As a result, lgwr will not semctl().
Logwriter, commit - post-wait
(Timeline diagram: system calls and wait events of the foreground and the logwriter, in time order.)
logwriter:  semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})  [rdbms ipc message]
foreground: commit;
foreground: semctl(458755, 15, SETVAL, 0x7fff00000001)
logwriter:  io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
logwriter:  io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})  [log file parallel write]
foreground: kcrf_commit_force() > kcscur3()
foreground: semtimedop(458755, {{33, -1, 0}}, 1, {0, 100000000})  [log file sync]
Grayed means: optional
Adaptive log file sync
• Feature of Oracle 11.2
– Parameter '_use_adaptive_log_file_sync'
• Set to FALSE up to 11.2.0.2
• Set to TRUE starting from 11.2.0.3
• Third value ‘POLLING_ONLY’
– Makes Oracle adaptively switch between ‘post-wait’ and polling.
– The log writer writes a notification in its logfile if it switches between modes (if param = ‘TRUE’)
Logwriter, commit - polling
(Timeline diagram: system calls and wait events of the foreground and the logwriter, in time order.)
logwriter:  semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})  [rdbms ipc message]
foreground: commit;
foreground: semctl(458755, 15, SETVAL, 0x7fff00000001)
logwriter:  io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
logwriter:  io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})  [log file parallel write]
foreground: kcrf_commit_force > kcscur3
foreground: nanosleep({0, 9409000}, 0x7fff64725480)  [log file sync] (nanosecs; varies)
Logwriter, post-wait vs. polling
– No wait event ‘log file sync’ if:
• Lgwr was able to flush the committed data before the foreground has issued kcscur3() 2/3 times in kcrf_commit_force().
– If not, the foreground starts a ‘log file sync’ wait.
• If in “post-wait” mode (default), it will record its waiting state in the post-wait queue and sleep in semtimedop() for 100ms at a time, waiting to be posted by lgwr.
• If in “polling” mode, it will sleep in nanosleep() for a computed time*, then check lgwr progress: if the lgwr write has progressed beyond its committed data SCN, end the wait, else start sleeping in nanosleep() again.
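The difference between the two modes can be sketched with threads. Everything here is a toy model under stated assumptions: `RedoState`, `fake_lgwr` and both wait functions are hypothetical names, a `threading.Event` stands in for the semaphore post, and the sleep intervals are picked for the demo rather than taken from Oracle.

```python
import threading
import time

class RedoState:
    """Hypothetical stand-in for the log writer's on-disk progress."""
    def __init__(self):
        self.on_disk_scn = 0
        self.posted = threading.Event()

def wait_post_wait(state, commit_scn, slice_s=0.1):
    """Post-wait: sleep in slices until posted and the commit SCN is on disk."""
    sleeps = 0
    while state.on_disk_scn < commit_scn:
        state.posted.wait(timeout=slice_s)   # compare: semtimedop() for 100ms
        state.posted.clear()
        sleeps += 1
    return sleeps

def wait_polling(state, commit_scn, interval_s=0.01):
    """Polling: sleep for an interval, then check the writer's progress."""
    sleeps = 0
    while state.on_disk_scn < commit_scn:
        time.sleep(interval_s)               # compare: nanosleep()
        sleeps += 1
    return sleeps

def fake_lgwr(state, scn, delay_s):
    """Pretend to flush redo up to scn after delay_s, then post waiters."""
    time.sleep(delay_s)
    state.on_disk_scn = scn
    state.posted.set()

pw_state = RedoState()
threading.Thread(target=fake_lgwr, args=(pw_state, 42, 0.03)).start()
pw_sleeps = wait_post_wait(pw_state, 42)

poll_state = RedoState()
threading.Thread(target=fake_lgwr, args=(poll_state, 42, 0.03)).start()
poll_sleeps = wait_polling(poll_state, 42)
print("post-wait sleeps:", pw_sleeps, "polling sleeps:", poll_sleeps)
```

If the writer has already passed the commit SCN before either function is called, both return 0 without sleeping, which corresponds to the case where no ‘log file sync’ wait shows up at all.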
Logwriter
• The main task of lgwr is to flush data in the logbuffer to disk.
– The lgwr is idle when waiting on ‘rdbms ipc message’.
– There are two main* indicators that lgwr is busy:
• CPU time.
• Wait event ‘log file parallel write’.
• The lgwr needs to be able to get onto the CPU in order to do its processing!
Logwriter - idle
(Timeline diagram: the logwriter repeats semtimedop(458755, {{15, -1, 0}}, 1, {3, 0}); each sleep shows up as one ‘rdbms ipc message’ wait.)
Idle mode latch gets:
‘messages’
‘mostly latch-free SCN’
‘lgwr LWN SCN’
‘KTF sga latch’
‘redo allocation’
‘messages’
Logwriter - writing
Write mode latch gets and frees:
‘messages’
‘mostly latch-free SCN’
‘lgwr LWN SCN’
‘KTF sga latch’
‘redo allocation’
‘messages’
‘redo writing’
Logwriter - writing
(Timeline diagram: logwriter system calls while writing; the wait ‘log file parallel write’ follows the io_submit().)
io_submit(139981752844288, n, {{0x7f5008e23480, 0, 1, 0, 256}})
io_getevents(139981752844288, n, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {0, 0})
io_getevents(139981752844288, n, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})
With the Linux ‘strace’ utility, either the non-blocking syscall or the blocking syscall is visible.
• The event ‘log file parallel write’ is an indicator of IO wait time for the lgwr.
– NOT IO latency time!!
Logwriter - writing
(Timeline diagram: io_submit(), followed by a non-blocking and a blocking io_getevents().)
real IO time: 10ms
real IO time: 7ms
log file parallel write: 9.85ms
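One way to read these three numbers: the wait event runs from the moment the wait starts until the slowest IO has been reaped, so it is neither of the two IO latencies. The sketch below only illustrates that arithmetic. It assumes both IOs are submitted at the same moment and that the wait starts 0.15ms after submission; that offset is derived from the slide's numbers, not measured, and the function name is made up.

```python
def parallel_write_wait_ms(io_latencies_ms, wait_start_offset_ms):
    """Time from the start of the wait until the slowest IO completes."""
    return max(io_latencies_ms) - wait_start_offset_ms

# Two IOs taking 10ms and 7ms; the wait assumed to begin 0.15ms after submit.
wait_ms = parallel_write_wait_ms([10.0, 7.0], wait_start_offset_ms=0.15)
print(f"log file parallel write: {wait_ms:.2f} ms")
```

The 7ms IO does not appear in the result at all, which is why the event cannot be used as an IO latency figure.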
Logwriter wait events
• rdbms ipc message
– timeout: 300 (centiseconds; 3 seconds).
– process sleeping ~ 3 seconds on semaphore.
• log file parallel write
– files: number of log file members.
– blocks: total number of log blocks written.
– requests: ?
• I’ve seen this differ from the actual number of IO requests.
Logwriter wait events
• Let’s switch the database to synchronous IO.
– Some platforms have difficulty with AIO (HPUX!)
– Got to check if your config does use AIO.
• Found out by accident that ASM+NFS has no AIO by default.
– Good to understand what the absence of AIO means.
• If you can’t use AIO today, you are doing it WRONG!
log file parallel write
disk_asynch_io=FALSE (no AIO)

kslwtbctx
semtimedop - 458755 semid, timeout: $62 = {tv_sec = 3, tv_nsec = 0}
kslwtectx -- Previous wait time: 584208: rdbms ipc message

pwrite64 - fd, size - 256,512
pwrite64 - fd, size - 256,512

kslwtbctx
kslwtectx -- Previous wait time: 782: log file parallel write

kslwtbctx
semtimedop - 458755 semid, timeout: $63 = {tv_sec = 2, tv_nsec = 310000000}
kslwtectx -- Previous wait time: 2315982: rdbms ipc message

• timeout = 3s, wait time = 0.584208s => the semaphore is posted!
• Two sequential writes (two logfiles).
• The wait begins AFTER the write (!)
• It’s also suspiciously fast (0.8ms).
log file parallel write
disk_asynch_io=FALSE (no AIO)
Let’s add 100ms to the IOs (shell sleep 0.1)

kslwtbctx
semtimedop - 458755 semid, timeout: $3 = {tv_sec = 2, tv_nsec = 900000000}
kslwtectx -- Previous wait time: 2905568: rdbms ipc message

pwrite64 - fd, size - 256,512
>sleep 0.1
pwrite64 - fd, size - 256,512
>sleep 0.1

kslwtbctx
kslwtectx -- Previous wait time: 545: log file parallel write

• Two writes again. In the break a sleep of 100ms is added.
• This should make the timing at least 200’000.
• The timing is 545 (0.5ms): timing is off.
log file parallel write
• Conclusion:
– For at least Oracle version 11.2.
– When synchronous IO (pwrite64()) is issued.
• disk_asynch_io = FALSE (ASM)
• filesystemio_options != “setall” or “asynch”
– The wait event does not time the IO requests.
• How about the other log writer wait events?
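Why the 100ms sleeps never showed up in the wait time can be reproduced with two timers. This is a generic illustration, not Oracle code: `timed_after` mimics a wait that is only begun (kslwtbctx) after the IO call has returned, `timed_around` mimics a wait begun before the call, and `slow_write` is a made-up stand-in for pwrite64() plus the injected sleep.

```python
import time

def timed_after(io_call):
    """Start the wait timer only after the IO call has returned."""
    io_call()
    start = time.perf_counter()   # compare: kslwtbctx after pwrite64()
    end = time.perf_counter()     # compare: kslwtectx right behind it
    return end - start

def timed_around(io_call):
    """Start the wait timer before the IO call and stop it afterwards."""
    start = time.perf_counter()
    io_call()
    return time.perf_counter() - start

def slow_write():
    time.sleep(0.1)               # stands in for a write that takes 100ms

after_s = timed_after(slow_write)
around_s = timed_around(slow_write)
print(f"timer after the call:  {after_s * 1e6:.0f} us")
print(f"timer around the call: {around_s * 1e6:.0f} us")
```

The first timer reports a few microseconds at most, no matter how slow the write is; only the second one sees the 100ms.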
control file sequential read
disk_asynch_io=FALSE (no AIO)

kslwtbctx
pread64 - fd, size - 256,16384
>sleep 0.1
kslwtectx -- Previous wait time: 100323: control file sequential read

This event is correctly timed.
control file parallel write
disk_asynch_io=FALSE (no AIO)

pwrite64 - fd, size - 256,16384
>sleep 0.1
kslwtbctx
kslwtectx -- Previous wait time: 705: control file parallel write

This event is incorrectly timed!
log file single write
disk_asynch_io=FALSE (no AIO)

kslwtbctx
pwrite64 - fd, size - 256,512
>sleep 0.1
kslwtectx -- Previous wait time: 104594: log file single write

This event is correctly timed.
Logwriter wait events, logswitch
• Some of these waits typically show up during a logswitch.
– These are all the waits which are normally seen:
• os thread startup (semctl()-semtimedop())
• control file sequential read (pread64())
• control file parallel write (io_submit()-io_getevents())
• log file sequential read (pread64())
• log file single write (pwrite64())
• KSV master wait (semctl() post to dbwr)
• This is with AIO enabled!
Logwriter, timeout message
• Warning:

Warning: log write elapsed time 523ms, size 2760KB

• Printed in the logwriter tracefile (NOT alert.log)
• Threshold set with parameter:
– _side_channel_batch_timeout_ms (500ms)
Logwriter: disable logging
• The “forbidden switch”: _disable_logging
– Do not use this for anything other than tests!
• Everything is done the same, no magic
– Except the write by the lgwr to the logfiles
– No ‘log file parallel write’
– Redo/control/data files are synced with shut normal
• A way to test if lgwr IO influences db processing
Logwriter: exadata
• What does this look like on Exadata?
Logwriter: exadata
kslwtbctx
semtimedop - 3309577 semid, timeout: $24 = {tv_sec = 2, tv_nsec = 970000000}
kslwtectx -- Previous wait time: 2973630: rdbms ipc message

$25 = "oss_write"
$26 = "oss_write"
$27 = "oss_write"
$28 = "oss_write"

kslwtbctx
$29 = "oss_wait"
$30 = "oss_wait"
$31 = "oss_wait"
$32 = "oss_wait"
kslwtectx -- Previous wait time: 2956: log file parallel write

kslwtbctx
semtimedop - 3309577 semid, timeout: $33 = {tv_sec = 3, tv_nsec = 0}
kslwtectx -- Previous wait time: 3004075: rdbms ipc message

• The lgwr semaphore sleep.
• The writes are issued here (oss_write).
• There is no io_submit-like wait. This is not timed.
• The wait is log file parallel write, identical to non-Exadata.
• It seems to wait for all issued IOs.
Database writer
• From the Oracle 11.2 concepts guide:
– The DBWn process writes dirty buffers to disk under the following conditions:
• When a server process cannot find a clean reusable buffer after scanning a threshold of buffers, it signals DBWn to write. DBWn writes dirty buffers to disk asynchronously if possible while performing other processing.
• DBWn periodically writes buffers to advance the checkpoint, which is the position in the redo thread from which instance recovery begins. The log position of the checkpoint is determined by the oldest dirty buffer in the buffer cache.
Database writer, idle
• The 10046/8 trace shows:

*** 2013-12-31 00:45:51.088
WAIT #0: nam='rdbms ipc message' ela= 3006219 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447151086891

*** 2013-12-31 00:45:54.142
WAIT #0: nam='rdbms ipc message' ela= 3005237 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447154140873

*** 2013-12-31 00:45:57.197
WAIT #0: nam='rdbms ipc message' ela= 3005258 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447157195828

*** 2013-12-31 00:46:00.255
WAIT #0: nam='rdbms ipc message' ela= 3005716 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447160253960
Database writer, idle
• “rdbms ipc message” indicates a sleep/idle event
– There isn’t an indication dbw0 writes something:

semtimedop(983043, {{14, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 31000}, ru_stime={0, 89000}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 31000}, ru_stime={0, 89000}, ...}) = 0
times({tms_utime=3, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 431915044
times({tms_utime=3, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 431915044
times({tms_utime=3, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 431915044
semtimedop(983043, {{14, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
…etc…
Database writer, idle
• It does read the ‘stat’ file of a certain process in the /proc filesystem:

open("/proc/2218/stat", O_RDONLY) = 21
read(21, "2218 (oracle) S 1 2218 2218 0 -1"..., 999) = 209
close(21)

• It does so every 20th time: (20*3) = 60 sec.
• The PID is that of PMON.
Database writer, idle
• Recap:
– In an idle database.
– The dbwr sleeps on a semaphore for 3 seconds.
• Then wakes up, and sets up the semaphore/sleep again.
• Processes sleeping on a semaphore do not spend CPU.
– Every minute, dbwr reads pmon's process stats.
– dbwr doesn’t write if there’s no need.
Database writer, force write
• We can force the dbwr to write:
– Dirty some blocks (insert a row into a table).
– Force a thread checkpoint (alter system checkpoint).

* There are multiple ways, this is one of them.
Database writer, force write
10046/8 trace:

*** 2014-01-03 03:37:47.473
WAIT #0: nam='rdbms ipc message' ela= 3000957 timeout=300 p2=0 p3=0 obj#=-1 tim=1388716667473024

*** 2014-01-03 03:37:49.735
WAIT #0: nam='rdbms ipc message' ela= 2261867 timeout=300 p2=0 p3=0 obj#=-1 tim=1388716669735046
WAIT #0: nam='db file async I/O submit' ela= 0 requests=3 interrupt=0 timeout=0 obj#=-1 tim=1388716669735493
WAIT #0: nam='db file parallel write' ela= 21 requests=1 interrupt=0 timeout=2147483647 obj#=-1 tim=1388716669735566

*** 2014-01-03 03:37:50.465
WAIT #0: nam='rdbms ipc message' ela= 729110 timeout=73 p2=0 p3=0 obj#=-1 tim=1388716670464967

• elapsed time = 2.26 sec. So the dbwr is posted!
• db file async I/O submit?! It looks like the io_submit() call is instrumented for the dbwr! But what does ‘requests=3’ mean for a single row update checkpoint?
• And the write, via the event ‘db file parallel write’.
dbwr, sql_trace + strace
• Let’s take a look at the Oracle wait events, together with the actual system calls.
• That is:
– Setting a 10046/8 event for trace and waits.
– Executing strace with ‘-e write=all -e all’
dbwr, sql_trace + strace
io_submit(140195085938688, 3, {{0x7f81b622ab10, 0, 1, 0, 256}, {0x7f81b622a8a0, 0,
1, 0, 256}, {0x7f81b622a630, 0, 1, 0, 256}}) = 3
write(13, "WAIT #0: nam='db file async I/O "..., 108) = 108
| 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db |
| 00010 20 66 69 6c 65 20 61 73 79 6e 63 20 49 2f 4f 20 file as ync I/O |
| 00020 73 75 62 6d 69 74 27 20 65 6c 61 3d 20 31 20 72 submit' ela= 1 r |
| 00030 65 71 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 equests= 3 interr |
| 00040 75 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 upt=0 ti meout=0 |
| 00050 6f 62 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 38 obj#=-1 tim=1388 |
| 00060 39 37 37 36 35 31 38 30 34 32 36 31 97765180 4261 |
io_getevents(140195085938688, 1, 128, {{0x7f81b622ab10, 0x7f81b622ab10, 8192, 0},
{0x7f81b622a8a0, 0x7f81b622a8a0, 8192, 0}, {0x7f81b622a630, 0x7f81b622a630, 8192,
0}}, {600, 0}) = 3
write(13, "WAIT #0: nam='db file parallel w"..., 116) = 116
| 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db |
| 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w |
| 00020 72 69 74 65 27 20 65 6c 61 3d 20 35 38 20 72 65 rite' el a= 58 re |
| 00030 71 75 65 73 74 73 3d 31 20 69 6e 74 65 72 72 75 quests=1 interru |
| 00040 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 32 31 34 pt=0 tim eout=214 |
| 00050 37 34 38 33 36 34 37 20 6f 62 6a 23 3d 2d 31 20 7483647 obj#=-1 |
| 00060 74 69 6d 3d 31 33 38 38 39 37 37 36 35 31 38 30 tim=1388 97765180 |
| 00070 34 35 37 39 4579 |
• The second argument of io_getevents() (1) is the MINIMAL number of requests to reap before the call is successful (min_nr, see man io_getevents).
• The timeout for io_getevents() is set to 600 seconds (struct timespec { sec, nsec }).
• Despite only needing 1 request, this call returned all 3.
• This information is NOT EXTERNALISED (!!)
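The hex column that strace prints with ‘-e write=all’ can be turned back into the trace text, which is handy when the ASCII column is hard to read. A minimal sketch using the first three rows of the dump above:

```python
# First three rows of the hex column from the strace dump of write(13, ...).
hex_rows = [
    "57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62",
    "20 66 69 6c 65 20 61 73 79 6e 63 20 49 2f 4f 20",
    "73 75 62 6d 69 74 27 20 65 6c 61 3d 20 31 20 72",
]
# bytes.fromhex() ignores the spaces between the byte values.
decoded = b"".join(bytes.fromhex(row) for row in hex_rows).decode("ascii")
print(decoded)
```

This prints the start of the wait line that dbwr wrote to its trace file: `WAIT #0: nam='db file async I/O submit' ela= 1 r`.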
dbwr, db file async I/O submit
• Let’s take a look at what the documentation says about “db file async I/O submit”:

(That’s right…nothing)
dbwr, db file async I/O submit
• My Oracle Support on “db file async I/O submit”:

'db file async I/O submit' when FILESYSTEMIO_OPTIONS=NONE [Article ID 1274737.1]
How To Address High Wait Times for the 'Direct Path Write Temp ' Wait Event [Article ID 1576956.1]

• Neither describes what this event is.
• The 1st note is only for filesystemio_options=NONE and describes the event not being tracked prior to version 11.2.0.2.
dbwr, db file async I/O submit
• So the question is:
– What DOES the event “db file async I/O submit” mean?
• The obvious answer is:
– Instrumentation of the io_submit() call.
• My answer is:
– Don’t know.
– But NOT the instrumentation of io_submit().
dbwr, db file async I/O submit
• This is a trace of the relevant C functions:

kslwtbctx
kslwtectx -- Previous wait time: 236317: rdbms ipc message

io_submit - 3,45e5a000 - nr,ctx
kslwtbctx
kslwtectx -- Previous wait time: 688: db file async I/O submit

kslwtbctx
io_getevents - 1,45e5a000 - minnr,ctx,timeout: $3 = {tv_sec = 600, tv_nsec = 0}
skgfr_return64 - 3 IOs returned
kslwtectx -- Previous wait time: 9604: db file parallel write

• Waiting on a semaphore to be posted.
• io_submit() for 3 IOs.
• The wait begins AFTER the io_submit()?
• io_getevents() is properly timed. min_nr=1, got 3 IOs.
dbwr, db file async I/O submit
• Trace with sleep 0.1 in the break on io_submit()

kslwtbctx
kslwtectx -- Previous wait time: 385794: rdbms ipc message

io_submit - 3,45e5a000 - nr,ctx
> sleep 0.1
kslwtbctx
kslwtectx -- Previous wait time: 428: db file async I/O submit

kslwtbctx
io_getevents - 1,45e5a000 - minnr,ctx,timeout: $37 = {tv_sec = 600, tv_nsec = 0}
skgfr_return64 - 3 IOs returned
kslwtectx -- Previous wait time: 8053: db file parallel write

• Waiting on a semaphore to be posted.
• io_submit() for 3 IOs + sleep of 100’000.
• Wait time too low. io_submit() is not timed.
dbwr, db file parallel write
• Let’s look at the “db file parallel write” event.
dbwr, db file parallel write
• Description from the Reference Guide (slide annotations in brackets):

db file parallel write
This event occurs in the DBWR. [Correct]
It indicates that the DBWR is performing a parallel write to files and blocks. [Correct…but only if AIO is enabled.]
When the last I/O has gone to disk, the wait ends. [Incorrect]
Wait Time: Wait until all of the I/Os are completed [Incorrect]
Parameter: Description
requests: This indicates the total number of I/O requests, which will be the same as blocks [Incorrect]
interrupt: [Empty?]
timeout: This indicates the timeout value in hundredths of a second to wait for the I/O completion. [Probably incorrect. Or does a timeout of 2’147’483’647 / 100 / 60 / 60 / 24 = 248.55 days make sense to *anybody*?]
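The 248.55 days follow directly from the timeout value in the trace when it is read as hundredths of a second, as the Reference Guide says. The value itself is the largest signed 32-bit integer:

```python
timeout_cs = 2147483647            # timeout= value from the wait line; equals 2**31 - 1
seconds = timeout_cs / 100         # hundredths of a second to seconds
days = seconds / 60 / 60 / 24
print(f"{seconds:.2f} s = {days:.2f} days")
```

A value of 2**31 - 1 looks more like "no timeout set" than like a deliberate 248-day limit, but that reading is a guess.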
dbwr, db file parallel write
• Recap of previous traced calls:

kslwtbctx
kslwtectx -- Previous wait time: 236317: rdbms ipc message

io_submit - 3,45e5a000 - nr,ctx
kslwtbctx
kslwtectx -- Previous wait time: 688: db file async I/O submit

kslwtbctx
io_getevents - 1,45e5a000 - minnr,ctx,timeout: $3 = {tv_sec = 600, tv_nsec = 0}
skgfr_return64 - 3 IOs returned
kslwtectx -- Previous wait time: 9604: db file parallel write

So… how about severely limiting OS IO capacity and seeing what happens?
dbwr, db file parallel write
• Database writer: severely limited IO (1 IOPS)

io_submit - 366,45e5a000 - nr,ctx
kslwtbctx
kslwtectx -- Previous wait time: 1070: db file async I/O submit

kslwtbctx
io_getevents - 100,45e5a000 - minnr,ctx,timeout: $7 = {tv_sec = 600, tv_nsec = 0}
skgfr_return64 - 100 IOs returned
kslwtectx -- Previous wait time: 109334845: db file parallel write

io_getevents - 128,45e5a000 - minnr,ctx,timeout: $8 = {tv_sec = 0, tv_nsec = 0}
io_getevents - 128,45e5a000 - minnr,ctx,timeout: $9 = {tv_sec = 0, tv_nsec = 0}

io_submit - 73,45e5a000 - nr,ctx
kslwtbctx
kslwtectx -- Previous wait time: 486: db file async I/O submit

• 366 IO requests are submitted onto the OS.
• But only 100 IOs are needed to satisfy io_getevents().
• Which it does in this case… leaving outstanding IOs.
• The dbwr starts issuing non-blocking calls to reap IOs!
• It seems to be always 2 if outstanding IOs remain.
• Minnr = # outstanding IOs, max 128.
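The numbers in this trace can be checked against the observation "Minnr = # outstanding IOs, max 128". The sketch below only restates that observation as arithmetic; the function name is made up, and the 128 ceiling is taken from the traced calls, not from any documented limit.

```python
MAX_MIN_NR = 128   # ceiling seen in the traced non-blocking io_getevents() calls

def nonblocking_min_nr(submitted, reaped):
    """min_nr for the non-blocking reap: outstanding IOs, capped at 128."""
    outstanding = submitted - reaped
    return min(outstanding, MAX_MIN_NR)

submitted, reaped = 366, 100
min_nr = nonblocking_min_nr(submitted, reaped)
blocking_share = reaped / submitted   # share of IOs the blocking call waited for
print(min_nr, f"{blocking_share:.0%}")
```

With 366 submitted and 100 reaped, 266 IOs are outstanding, so the cap applies and min_nr is 128, as in the trace. The blocking call waited for about 27% of the submitted IOs.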
dbwr, db file parallel write
• This got me thinking…
• The dbwr submits the IOs it needs to write.
• But it waits for a variable amount of IOs to finish.
– Wait event ‘db file parallel write’.
– Amount seems 33-25% of submitted IOs*
– After that, 2 tries to reap the remaining IOs*
– Then either submit again, DFPW until IOs reaped or back to sleeping on semaphore.
dbwr, db file parallel write
• This means ‘db file parallel write’ is not:
– A physical IO indicator.
– IO latency timing.
• I’ve come to the conclusion that the blocking io_getevents call for a number of IOs of the dbwr is an IO limiter.
• …and ‘db file parallel write’ is the timing of it.
dbwr, synchronous IO
• Let’s turn AIO off again.
– To simulate this, I’ve set disk_asynch_io to FALSE.
• And set a 10046/8 trace and strace on the dbwr.
• And issue the SQLs as before:
– insert into followed by commit
– alter system checkpoint
dbwr, synchronous IO
pwrite(256, "62420020726100hG1700020063R001000009U10[G170"...,
8192, 8053121024) = 8192
pwrite(256, "22420024603000hG1700014220T0030r0j
30020301717"..., 8192, 7443103744) = 8192
pwrite(256, "&2420024003000hG
170002437221600000000000000"..., 8192, 7443054592) = 8192
write(11, "WAIT #0: nam='db file parallel w"..., 107) = 107
| 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db |
| 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w |
| 00020 72 69 74 65 27 20 65 6c 61 3d 20 32 30 20 72 65 rite' el a= 20 re |
| 00030 71 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 75 quests=3 interru |
| 00040 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 6f pt=0 tim eout=0 o |
| 00050 62 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 39 38 bj#=-1 t im=13898 |
| 00060 30 32 32 38 31 34 35 36 37 34 30 02281456 740 |
write(11, "WAIT #0: nam='db file parallel w"..., 106) = 106
| 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db |
| 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w |
| 00020 72 69 74 65 27 20 65 6c 61 3d 20 31 20 72 65 71 rite' el a= 1 req |
| 00030 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 75 70 uests=3 interrup |
| 00040 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 6f 62 t=0 time out=0 ob |
| 00050 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 39 38 30 j#=-1 ti m=138980 |
| 00060 32 32 38 31 34 35 38 34 39 32 22814584 92 |
63
3 pwrite() calls. This is synchronous IO!
The db file parallel write wait event shows 3 requests!
But why a second db file parallel write wait event?
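The WAIT lines that show up in the strace output can be picked apart with a few lines of Python. This is a minimal sketch: the regular expression only covers the layout of the two lines shown above, and the function name is made up.

```python
import re

# Layout as seen in the strace output above:
# WAIT #0: nam='db file parallel write' ela= 20 requests=3 ... tim=...
WAIT_RE = re.compile(
    r"WAIT #(?P<cursor>\d+): nam='(?P<nam>[^']*)' ela= *(?P<ela>\d+) (?P<rest>.*)"
)


def parse_wait(line):
    """Return the fields of a 10046 WAIT line as a dict, or None if the
    line is not a WAIT line. Hypothetical helper, not an Oracle tool."""
    m = WAIT_RE.match(line.strip())
    if m is None:
        return None
    fields = {"cursor": int(m["cursor"]), "nam": m["nam"], "ela": int(m["ela"])}
    # The remaining fields are key=value pairs separated by spaces.
    for token in m["rest"].split():
        key, sep, value = token.partition("=")
        if not sep:
            continue
        try:
            fields[key] = int(value)
        except ValueError:
            fields[key] = value
    return fields


if __name__ == "__main__":
    line = ("WAIT #0: nam='db file parallel write' ela= 20 requests=3 "
            "interrupt=0 timeout=0 obj#=-1 tim=1389802281456740")
    wait = parse_wait(line)
    print(wait["nam"], "ela (usec):", wait["ela"], "requests:", wait["requests"])
```

Feeding it both WAIT lines makes the oddity easy to spot: the same event twice, both with requests=3, for a single batch of 3 pwrite() calls.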
dbwr, synchronous IO
• There’s no ‘db file async I/O submit’ wait anymore.
– Which is good, because SIO has no submit phase.
• The ‘db file parallel write’ waits seem suspicious.
– It seems like the wait for DFPW is issued twice.
– Further investigation shows that it does.
• My guess is that this is a bug in the sync. IO implementation.
• Let’s look a level deeper and see if there’s more to see.
64
dbwr, synchronous IO
kslwtbctx
semtimedop - 458755 semid, timeout: $18 = {tv_sec = 3, tv_nsec = 0}
kslwtectx -- Previous wait time: 1239214: rdbms ipc message
!
pwrite64 - fd, size - 256,8192
pwrite64 - fd, size - 256,8192
pwrite64 - fd, size - 256,8192
!
kslwtbctx
kslwtectx -- Previous wait time: 949: db file parallel write
!
kslwtbctx
kslwtectx -- Previous wait time: 650: db file parallel write
!
kslwtbctx
semtimedop - 458755 semid, timeout: $19 = {tv_sec = 1, tv_nsec = 620000000}
65
This is clearly the semaphore being posted: timeout=3s,
wait time = 1239,2ms
3 IO’s in serial using pwrite().
The only possibility if there isn’t AIO of course.
Two db file parallel write waits (which aren’t parallel), and
both waits begin AFTER the IO (!!)
After the IOs are done, the dbwr continues sleeping.
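Checking which calls fall inside a wait can be automated. The sketch below assumes the trace layout of these slides: kslwtbctx starts a wait, kslwtectx ends it, and everything else is a system call. The function name is made up, and a wait that is started but never ended in the trace is simply ignored.

```python
TRACE = """\
kslwtbctx
semtimedop - 458755 semid, timeout: $18 = {tv_sec = 3, tv_nsec = 0}
kslwtectx -- Previous wait time: 1239214: rdbms ipc message
!
pwrite64 - fd, size - 256,8192
pwrite64 - fd, size - 256,8192
pwrite64 - fd, size - 256,8192
!
kslwtbctx
kslwtectx -- Previous wait time: 949: db file parallel write
!
kslwtbctx
kslwtectx -- Previous wait time: 650: db file parallel write
""".splitlines()


def group_calls_by_wait(trace_lines):
    """Return (waits, untimed).
    waits   : list of (event, usec, [calls made inside the wait])
    untimed : calls made outside any wait"""
    waits, untimed = [], []
    in_wait, current = False, []
    for raw in trace_lines:
        line = raw.strip()
        if not line or line == "!":
            continue  # separator lines in the slide output
        if line.startswith("kslwtbctx"):
            in_wait, current = True, []
        elif line.startswith("kslwtectx"):
            # kslwtectx -- Previous wait time: <usec>: <event name>
            _, _, tail = line.partition("Previous wait time:")
            usec, _, event = tail.partition(":")
            waits.append((event.strip(), int(usec), current))
            in_wait, current = False, []
        else:
            call = line.split()[0]
            (current if in_wait else untimed).append(call)
    return waits, untimed


if __name__ == "__main__":
    waits, untimed = group_calls_by_wait(TRACE)
    for event, usec, calls in waits:
        print(f"{event}: {usec} usec, calls inside: {calls}")
    print("calls outside any wait:", untimed)
```

For this trace the only call inside a wait is semtimedop; the three pwrite64 calls end up in the untimed list.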
dbwr, synchronous IO
• Let’s do the same trick as done earlier:
– In gdb, add “shell sleep 0.1” to the pwrite call.
– This makes the execution of this call take 100ms longer.
– To see if there’s still some way Oracle times it properly.
66
dbwr, synchronous IO
kslwtbctx
semtimedop - 458755 semid, timeout: $23 = {tv_sec = 3, tv_nsec = 0}
kslwtectx -- Previous wait time: 92080: rdbms ipc message
!
pwrite64 - fd, size - 256,8192
> shell sleep 0.1
pwrite64 - fd, size - 256,8192
> shell sleep 0.1
pwrite64 - fd, size - 256,8192
> shell sleep 0.1
!
kslwtbctx
kslwtectx -- Previous wait time: 478: db file parallel write
!
kslwtbctx
kslwtectx -- Previous wait time: 495: db file parallel write
!
kslwtbctx
semtimedop - 458755 semid, timeout: $24 = {tv_sec = 2, tv_nsec = 460000000}
67
The 3 IOs again, each sleeps in pwrite() for 100ms (0.1s)
Yet the ‘db file parallel write’ wait shows a waiting time of 478;
which is 0.478ms: the timing is wrong.
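The effect of starting the wait after the IO can be reproduced with a fake clock. This is a toy model, not Oracle code: the class and function names are made up, the 100ms per write mirrors the injected "shell sleep 0.1", and the 478 microseconds is only a stand-in for whatever the real wait covers (which I don't know).

```python
class FakeClock:
    """Deterministic clock in microseconds, so the example needs no real sleeps."""

    def __init__(self):
        self.now_usec = 0

    def advance(self, usec):
        self.now_usec += usec


def slow_pwrite(clock):
    clock.advance(100_000)  # the injected 'shell sleep 0.1'


def whatever_is_timed(clock):
    clock.advance(478)  # stand-in value taken from the trace above


def wait_begun_after_io(clock, n_writes):
    """Order seen in the trace: the writes first, the wait window afterwards."""
    for _ in range(n_writes):
        slow_pwrite(clock)
    begin = clock.now_usec
    whatever_is_timed(clock)
    return clock.now_usec - begin


def wait_around_io(clock, n_writes):
    """Order one would expect: the wait window wrapped around the writes."""
    begin = clock.now_usec
    for _ in range(n_writes):
        slow_pwrite(clock)
    return clock.now_usec - begin


if __name__ == "__main__":
    print("wait begun after the IO:", wait_begun_after_io(FakeClock(), 3), "usec")
    print("wait around the IO     :", wait_around_io(FakeClock(), 3), "usec")
```

However slow the writes are made, the first variant never sees it, which matches the trace.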
dbwr, synchronous IO
• So, my conclusion on the wait events for the dbwr with synchronous IO:
– The events are not properly timed
– It seems like the wait for DFPW is issued twice.
– Further investigation shows that it does.
• My guess is that this is a bug in the sync. IO implementation.
68
dbwr: exadata
• What does this look like on Exadata?
69
dbwr: exadata
kslwtbctx
semtimedop - 3309577 semid, timeout: $389 = {tv_sec = 3, tv_nsec = 0}
kslwtectx -- Previous wait time: 1266041: rdbms ipc message
$390 = "oss_write"
$391 = "oss_write"
$392 = "oss_write"
$393 = "oss_write"
$394 = "oss_write"
$395 = "oss_write"
!
kslwtbctx
kslwtectx -- Previous wait time: 684: db file async I/O submit
!
kslwtbctx
$396 = "oss_wait"
$397 = "oss_wait"
kslwtectx -- Previous wait time: 2001: db file parallel write
$398 = "oss_wait"
$399 = "oss_wait"
$400 = "oss_wait"
$401 = "oss_wait"
!
semctl - 3309577,23,16 - semid, semnum, cmd
kslwtbctx
semtimedop - 3309577 semid, timeout: $402 = {tv_sec = 1, tv_nsec = 630000000}
kslwtectx -- Previous wait time: 1634299: rdbms ipc message
70
The dbwr semaphore sleep.
The writes are issued here.
This is not timed.
There is the db file async I/O submit.
Again, it doesn’t seem to time any of the typical IO calls!
And there we got the db file parallel write.
It does seem to always time two oss_wait() calls…
But it looks for more IOs to finish, much like the trailing io_getevents() calls.
I am quite sure oss_wait() is a blocking call…
Conclusion
• Logwriter:
– When idle, is sleeping on a semaphore/rdbms ipc message
– Gets posted with semctl() to do work.
– Only writes when it needs to do so.
– Version 11.2.0.3: two methods for posting FGs:
– Polling and post/wait.
– Post/wait is default, might switch to polling.
– Notification of switch is in log writer trace file.
– Polling/nanosleep() time is variable.
71
Conclusion
• Logwriter:
– Log file parallel write
– AIO: two io_getevents() calls.
– AIO: time waiting for all lgwr submitted IOs to finish.
– Not IO latency time!
– SIO: does not do parallel writes, but serial.
– SIO: does not time IO.
72
Conclusion
• Logwriter:
– Wait event IO timing:
– All the ‘* parallel read’ and ‘* parallel write’ events do not seem to time IO correctly with synchronous IO.
– All the events which cover single block IOs do use synchronous IO calls, even with asynchronous IO set.
– Logwriter writes a warning when IO time exceeds 500ms in the log writer trace file.
– _disable_logging *only* disables write to logs.
73
Conclusion
• Database writer:
– When idle, is sleeping on a semaphore/rdbms ipc message
– Gets posted with semctl() to do work.
– Only writes when it needs to do so.
– Since version 11.2.0.2, event ‘db file async I/O submit’:
– Is not shown with synchronous I/O.
– Shows the actual amount of IOs submitted.
– Does not time io_submit()
– Unknown what or if it times something.
74
Conclusion
• Database writer:
– Event ‘db file parallel write’:
– Shows the minimal number of IOs io_getevents() waits for.
– The number of requests it waits for varies, but mostly seems to be ~ 25-33% of submitted IOs.
– After the timed, blocking, io_getevents() call, it issues two non-blocking io_getevents() calls for the remaining non-reaped IOs, if any.
– My current idea is the blocking io_getevents() call is an IO throttle mechanism.
75
Conclusion
• Database writer:
– Event ‘db file parallel write’, with synchronous IO:
– pwrite64() calls are issued serially.
– These are not timed.
– The event is triggered twice.
– On exadata, two out of the total number of oss_wait() calls are timed with the event ‘db file parallel write’.
76
Q & A
77
Thanks & Links
78
• Enkitec
• Tanel Poder, Martin Bach, Klaas-Jan Jongsma, Jeremy Schneider, Karl Arao, Michael Fontana.
• http://www.pythian.com/blog/adaptive-log-file-sync-oracle-please-dont-do-that-again/
• http://files.e2sn.com/slides/Tanel_Poder_log_file_sync.pdf

WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Profiling the logwriter and database writer

  • 1. ©2013 Enkitec. Profiling the logwriter and database writer. Frits Hoogland. Hotsos 2014, Dallas.
    This is the font size used for showing screen output. Be sure this is readable for you.
    This is the font used to accentuate text/console output. Make sure this is readable for you too!
  • 2. `whoami`
    • Frits Hoogland
    • Working with Oracle products since 1996
    • Blog: http://fritshoogland.wordpress.com
    • Twitter: @fritshoogland
    • Email: frits.hoogland@enkitec.com
    • Oracle ACE Director
    • OakTable Member
  • 3. Goals & prerequisites
    • Goal: learn about the typical behaviour of both lgwr and dbwr, both the visible part (wait events) and the inner workings.
    • Prerequisites:
      – Understanding of the (internal) execution of C programs.
      – Understanding of Oracle tracing mechanisms.
      – Understanding of the interaction of processes with the Oracle database.
  • 4. Test system
    • The tests and investigation are done in a VM:
      – Host: Mac OS X 10.9 / VMware Fusion 6.0.2.
      – VM: Oracle Linux x86_64 6u5 (UEK3 3.8.13).
      – Oracle Grid 11.2.0.4 with ASM/External redundancy.
      – Oracle database 11.2.0.4.
      – Unless specified otherwise.
  • 5. Logwriter, concepts guide
    • From the concepts guide:
      – The lgwr manages the redo log buffer.
      – The lgwr writes all redo entries that have been copied into the buffer since the last time it wrote when:
        • User commits.
        • Logswitch.
        • Three seconds since last write.
        • Buffer 1/3 full or 1MB filled.
        • dbwr must write modified (‘dirty’) buffers.
  • 6. Logwriter, idle
    • The general behaviour of the log writer can easily be shown by putting a 10046/8 trace on lgwr:
      SYS@v11204 AS SYSDBA> @who
      …
      130,1,@1 2243 oracle@ol65-ora11204-asm.local (LGWR)
      …
      SYS@v11204 AS SYSDBA> oradebug setospid 2243
      Oracle pid: 11, Unix process pid: 2243, image: oracle@ol65-ora11204-asm.local (LGWR)
      SYS@v11204 AS SYSDBA> oradebug unlimit
      Statement processed.
      SYS@v11204 AS SYSDBA> oradebug event 10046 trace name context forever, level 8;
      Statement processed.
  • 7. Logwriter, idle
    • The 10046/8 trace shows:
      *** 2013-12-18 14:12:32.479
      WAIT #0: nam='rdbms ipc message' ela= 2999925 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372352479352
      *** 2013-12-18 14:12:35.479
      WAIT #0: nam='rdbms ipc message' ela= 3000075 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372355479531
      *** 2013-12-18 14:12:38.479
      WAIT #0: nam='rdbms ipc message' ela= 2999755 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372358479381
      *** 2013-12-18 14:12:41.479
      WAIT #0: nam='rdbms ipc message' ela= 3000021 timeout=300 p2=0 p3=0 obj#=-1 tim=1387372361479499
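The WAIT lines on this slide can be read mechanically. What follows is a minimal Python sketch, not part of the original deck, that splits one WAIT line into the event name, the elapsed time and the remaining key=value parameters. The function name `parse_wait` and the regular expression are illustrative assumptions; `ela` is taken to be in microseconds and the `timeout` of ‘rdbms ipc message’ in centiseconds, as the deck states for this event.

```python
import re

# Illustrative pattern for a 10046 WAIT line; an assumption, not an official format spec.
WAIT_RE = re.compile(r"WAIT #(\d+): nam='([^']+)' ela= *(\d+) (.*)")


def parse_wait(line):
    """Return a dict for one WAIT line, or None if the line is not a WAIT line."""
    match = WAIT_RE.match(line.strip())
    if match is None:
        return None
    cursor, name, ela, rest = match.groups()
    # Parameters are whitespace-separated key=value pairs; parameter names
    # that contain spaces are not handled by this simple split.
    params = dict(item.split("=", 1) for item in rest.split() if "=" in item)
    return {
        "cursor": int(cursor),
        "name": name,
        "ela_us": int(ela),              # elapsed time in microseconds
        "ela_s": int(ela) / 1_000_000,   # the same value in seconds
        "params": params,
    }


line = ("WAIT #0: nam='rdbms ipc message' ela= 2999925 timeout=300 "
        "p2=0 p3=0 obj#=-1 tim=1387372352479352")
wait = parse_wait(line)
# timeout is reported in centiseconds, so divide by 100 to get seconds
print(wait["name"], wait["ela_s"], int(wait["params"]["timeout"]) / 100)
```

Read this way, each idle wait is a sleep of roughly 3 seconds against a 3 second timeout.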
  • 8. Logwriter, idle
    • “rdbms ipc message” indicates a sleep/idle event.
      – There is no indication that lgwr writes anything:
      semtimedop(327683, {{15, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
      getrusage(RUSAGE_SELF, {ru_utime={0, 84000}, ru_stime={0, 700000}, ...}) = 0
      getrusage(RUSAGE_SELF, {ru_utime={0, 84000}, ru_stime={0, 700000}, ...}) = 0
      times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
      times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
      times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
      times({tms_utime=8, tms_stime=70, tms_cutime=0, tms_cstime=0}) = 431286151
      semtimedop(327683, {{15, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
      …etc…
  • 9. Logwriter, idle
    • It does look in the /proc filesystem at the ‘stat’ file of a certain process:
      open("/proc/2218/stat", O_RDONLY) = 21
      read(21, "2218 (oracle) S 1 2218 2218 0 -1"..., 999) = 209
      close(21)
    • It does so every 20th time: 20 * 3 = 60 seconds.
    • The PID is that of PMON.
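To make the content of that read() concrete, here is a small Python sketch (an illustration, not something from the deck) that parses the leading fields of a /proc/<pid>/stat line, using the truncated sample string from the strace output. The field order (pid, comm, state, ppid, pgrp, session) follows the proc(5) manual page; the helper name `parse_proc_stat` and the two constants are assumptions made for this example.

```python
def parse_proc_stat(text):
    """Parse the leading fields of a /proc/<pid>/stat line (see proc(5))."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last closing parenthesis.
    head, _, tail = text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": fields[0],      # 'S' means interruptible sleep
        "ppid": int(fields[1]),
        "pgrp": int(fields[2]),
        "session": int(fields[3]),
    }


sample = "2218 (oracle) S 1 2218 2218 0 -1"   # prefix shown in the strace output
info = parse_proc_stat(sample)

# Hypothetical names for the numbers on the slide: a check on every 20th
# wakeup, with a 3 second sleep per wakeup.
WAKEUPS_BETWEEN_CHECKS = 20
SLEEP_SECONDS = 3
check_interval_s = WAKEUPS_BETWEEN_CHECKS * SLEEP_SECONDS
print(info, check_interval_s)
```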
  • 10. Logwriter, idle
    • Recap:
      – In an idle database.
      – The lgwr sleeps on a semaphore for 3 seconds.
        • Then wakes up, and sets up the semaphore/sleep again.
        • Processes sleeping on a semaphore do not spend CPU.
      – Every minute, lgwr reads pmon's process stats.
      – lgwr doesn’t write if there’s no need.
    • But what happens when we insert a row of data, and commit that?
  • 11. Logwriter, commit
      TS@//localhost/v11204 > insert into t values ( 1, 'aaaa', 'bbbb' );
      1 row created.
      TS@//localhost/v11204 > commit;
      Commit complete.
  • 12. Logwriter, commit - expected
    [Timeline diagram: calls per process]
    foreground:
      commit;
      semctl(458755, 15, SETVAL, 0x7fff00000001)
      semtimedop(458755, {{33, -1, 0}}, 1, {0, 100000000})
    logwriter:
      semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})
      io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
      io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})
      semctl(458755, 33, SETVAL, 0x1)
  • 13. Logwriter, commit - actual
    [Timeline diagram: calls per process]
    foreground:
      commit;
      semctl(458755, 15, SETVAL, 0x7fff00000001)
      No semtimedop(): no ‘log file sync’ wait!
    logwriter:
      semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})
      io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
      io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})
      No semctl()
  • 14. Logwriter, commit
    • Investigation shows:
      – Foreground scans log writer progress up to 3 times.
        • kcrf_commit_force() > kcscur3()
      – If its data* in the redo log buffer is not written:
        • It notifies the lgwr that it is going to sleep on a semaphore.
        • semtimedop() for 100ms, until posted by lgwr.
      – If its data* has been written:
        • No need to wait on it.
        • No ‘log file sync’ wait.
  • 15. Logwriter, commit
    • Wait!!!
      – This (no log file sync) turned out to be an edge case.
        • I traced the kcrf_commit_force() and kcscur3() calls using breakpoints in gdb.
      – In normal situations, the wait will appear.
        • Depending on log writer and foreground (FG) progress.
        • The semtimedop() call in the FG can be absent.
          – As a result, lgwr will not semctl().
  • 16. Logwriter, commit - post-wait
    [Timeline diagram: calls and wait events per process; grayed means: optional]
    foreground:
      commit;
      semctl(458755, 15, SETVAL, 0x7fff00000001)
      kcrf_commit_force()
      kcscur3()
      semtimedop(458755, {{33, -1, 0}}, 1, {0, 100000000})   wait event: log file sync
    logwriter:
      semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})   wait event: rdbms ipc message
      io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
      io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})   wait event: log file parallel write
  • 17. Adaptive log file sync
    • Feature of Oracle 11.2
      – Parameter '_use_adaptive_log_file_sync'
        • Set to FALSE up to 11.2.0.2
        • Set to TRUE starting from 11.2.0.3
        • Third value ‘POLLING_ONLY’
      – Makes Oracle adaptively switch between ‘post-wait’ and polling.
      – The log writer writes a notification in its logfile if it switches between modes (if param = ‘TRUE’).
  • 18. Logwriter, commit - polling
    [Timeline diagram: calls and wait events per process]
    foreground:
      commit;
      semctl(458755, 15, SETVAL, 0x7fff00000001)
      kcrf_commit_force
      kcscur3
      nanosleep({0, 9409000}, 0x7fff64725480)   wait event: log file sync (nanosecs; varies)
    logwriter:
      semtimedop(458755, {{15, -1, 0}}, 1, {3, 0})   wait event: rdbms ipc message
      io_submit(139981752844288, 1, {{0x7f5008e23480, 0, 1, 0, 256}})
      io_getevents(139981752844288, 1, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})   wait event: log file parallel write
  • 19. Logwriter, post-wait vs. polling
      – No wait event ‘log file sync’ if:
        • Lgwr was able to flush the committed data before the foreground has issued kcscur3() 2/3 times in kcrf_commit_force().
      – If not, the foreground starts a ‘log file sync’ wait.
        • If in “post-wait” mode (default), it will record its waiting state in the post-wait queue and sleep in semtimedop() for 100ms at a time, waiting to be posted by lgwr.
        • If in “polling” mode, it will sleep in nanosleep() for a computed time*, then check lgwr progress. If the lgwr write has progressed beyond its committed data SCN: end wait, else start sleeping in nanosleep() again.
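The difference between the two modes can be sketched with a toy model. The Python below is only an analogy, not Oracle code: a `threading.Event` stands in for the foreground's semaphore, `write_up_to()` stands in for the lgwr write plus its post, and the class and function names are all hypothetical. The polling interval is fixed here, whereas the slide says Oracle computes it.

```python
import threading
import time


class LogWriterModel:
    """Toy stand-in for lgwr: tracks how far redo has been written (hypothetical)."""

    def __init__(self):
        self.written_scn = 0
        self.posted = threading.Event()   # stands in for the foreground's semaphore

    def write_up_to(self, scn):
        self.written_scn = scn
        self.posted.set()                 # "post" the waiter, like lgwr's semctl()


def commit_post_wait(lgwr, commit_scn, slice_s=0.1):
    """Sleep in slices of slice_s until posted; returns the number of sleeps."""
    sleeps = 0
    while lgwr.written_scn < commit_scn:
        lgwr.posted.wait(timeout=slice_s)  # analogous to semtimedop() with 100ms
        lgwr.posted.clear()
        sleeps += 1
    return sleeps


def commit_polling(lgwr, commit_scn, interval_s=0.01):
    """Sleep a fixed interval, then re-check progress; returns the number of sleeps."""
    sleeps = 0
    while lgwr.written_scn < commit_scn:
        time.sleep(interval_s)             # analogous to nanosleep()
        sleeps += 1
    return sleeps


def run(commit_fn, write_delay_s):
    """Commit at SCN 42 while the modelled lgwr finishes its write after a delay."""
    lgwr = LogWriterModel()
    timer = threading.Timer(write_delay_s, lgwr.write_up_to, args=(42,))
    timer.start()
    sleeps = commit_fn(lgwr, 42)
    timer.join()
    return sleeps


print(run(commit_post_wait, 0.05), run(commit_polling, 0.05))
```

If the write has already progressed past the commit SCN when the foreground checks, both functions return 0 sleeps, which corresponds to the case without a ‘log file sync’ wait.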
  • 20. Logwriter
    • The main task of lgwr is to flush data in the log buffer to disk.
      – The lgwr is idle when waiting on ‘rdbms ipc message’.
      – There are two main* indicators of lgwr being busy:
        • CPU time.
        • Wait event ‘log file parallel write’.
    • The lgwr needs to be able to get onto the CPU in order to process!
  • 21. Logwriter - idle
    [Timeline diagram, logwriter: semtimedop(458755, {{15, -1, 0}}, 1, {3, 0}) calls, each shown as an ‘rdbms ipc message’ wait]
  • 22. Logwriter - idle
    [Timeline diagram, logwriter: consecutive ‘rdbms ipc message’ waits]
  • 23. Logwriter - idle
    logwriter, idle mode latch gets:
      ‘messages’
      ‘mostly latch-free SCN’
      ‘lgwr LWN SCN’
      ‘KTF sga latch’
      ‘redo allocation’
      ‘messages’
  • 24. Logwriter - writing
    logwriter, write mode latch gets and frees:
      ‘messages’
      ‘mostly latch-free SCN’
      ‘lgwr LWN SCN’
      ‘KTF sga latch’
      ‘redo allocation’
      ‘messages’
      ‘redo writing’
  • 25. Logwriter - writing
    [Timeline diagram, logwriter; wait event: log file parallel write]
      io_submit(139981752844288, n, {{0x7f5008e23480, 0, 1, 0, 256}})
      io_getevents(139981752844288, n, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {0, 0})
      io_getevents(139981752844288, n, 128, {{0x7f5008e23480, 0x7f5008e23480, 3584, 0}}, {600, 0})
    With the Linux ‘strace’ utility, either the non-blocking syscall or the blocking syscall is visible.
  • 26. Logwriter - writing
    • The event ‘log file parallel write’ is an indicator of IO wait time for the lgwr.
      – NOT IO latency time!!
    [Timeline diagram: io_submit(), then io_getevents() twice; real IO time: 10ms; real IO time: 7ms; log file parallel write: 9.85ms]
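One way to read the numbers in that diagram is the toy calculation below. It is a sketch under stated assumptions, not a description of Oracle's instrumentation: all IOs are assumed to start together at submission, and the wait timer is assumed to start a little later, once io_submit() has returned. The 0.15 ms used here is simply the value that reconciles 10 ms with 9.85 ms; the function name is hypothetical.

```python
def parallel_write_wait_ms(io_latencies_ms, time_before_wait_ms):
    """Wait time seen by the writer when all IOs run concurrently.

    Assumes every IO starts at t=0 and the wait timer starts
    time_before_wait_ms later; both are modelling assumptions.
    """
    slowest = max(io_latencies_ms)
    return max(0.0, slowest - time_before_wait_ms)


# Two concurrent IOs of 10 ms and 7 ms, wait timer started 0.15 ms after submit.
wait_ms = parallel_write_wait_ms([10.0, 7.0], 0.15)
print(round(wait_ms, 2))
```

Under this reading the event reports neither the sum of the latencies (17 ms) nor the latency of either IO; it reports how long the writer had to wait for the slowest one.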
  • 27. Logwriter wait events
    • rdbms ipc message
      – timeout: 300 (centiseconds; 3 seconds).
      – process sleeping ~ 3 seconds on semaphore.
    • log file parallel write
      – files: number of log file members.
      – blocks: total number of log blocks written.
      – requests: ?
        • I’ve seen this differ from the actual number of IO requests.
  • 28. Logwriter wait events
    • Let’s switch the database to synchronous IO.
      – Some platforms have difficulty with AIO (HP-UX!)
      – Check whether your configuration actually uses AIO.
        • Found out by accident that ASM+NFS has no AIO by default.
      – Good to understand what the absence of AIO means.
    • If you can’t use AIO today, you are doing it WRONG!
  • 29. log file parallel write
    disk_asynch_io=FALSE (no AIO)
      kslwtbctx
      semtimedop - 458755 semid, timeout: $62 = {tv_sec = 3, tv_nsec = 0}
      kslwtectx -- Previous wait time: 584208: rdbms ipc message
      pwrite64 - fd, size - 256,512
      pwrite64 - fd, size - 256,512
      kslwtbctx
      kslwtectx -- Previous wait time: 782: log file parallel write
      kslwtbctx
      semtimedop - 458755 semid, timeout: $63 = {tv_sec = 2, tv_nsec = 310000000}
      kslwtectx -- Previous wait time: 2315982: rdbms ipc message
    Annotations:
      • timeout = 3s, wait time = 0.584208s => semaphore is posted!
      • Two sequential writes (two logfiles).
      • The wait begins AFTER the write (!)
      • It’s also suspiciously fast (0.8ms).
  • 30. log file parallel write
    disk_asynch_io=FALSE (no AIO)
    Let’s add 100ms to the IOs (shell sleep 0.1)
      kslwtbctx
      semtimedop - 458755 semid, timeout: $3 = {tv_sec = 2, tv_nsec = 900000000}
      kslwtectx -- Previous wait time: 2905568: rdbms ipc message
      pwrite64 - fd, size - 256,512
      >sleep 0.1
      pwrite64 - fd, size - 256,512
      >sleep 0.1
      kslwtbctx
      kslwtectx -- Previous wait time: 545: log file parallel write
    Annotations:
      • Two writes again. In the break a sleep of 100ms is added.
      • This should make the timing at least 200’000 µs.
      • The timing is 545 µs (0.5ms): timing is off.
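Why the added 200 ms never shows up follows from where the wait timer starts. The sketch below illustrates this with a deterministic fake clock instead of real sleeps. It is a model, not Oracle code: `FakeClock` and both functions are hypothetical, the comments map the steps to the kslwtbctx/kslwtectx calls seen in the trace, and the 545 µs of bookkeeping is borrowed from the trace purely as an example value.

```python
class FakeClock:
    """Deterministic clock in microseconds, so the example needs no real sleeping."""

    def __init__(self):
        self.now_us = 0

    def advance(self, us):
        self.now_us += us


def wait_timed_after_io(clock, io_times_us, bookkeeping_us):
    """Order seen in the trace: the writes finish first, then the wait is timed."""
    for io_us in io_times_us:       # pwrite64() calls complete first
        clock.advance(io_us)
    begin = clock.now_us            # kslwtbctx: wait begins
    clock.advance(bookkeeping_us)
    return clock.now_us - begin     # kslwtectx: wait ends


def wait_timed_around_io(clock, io_times_us, bookkeeping_us):
    """What one would expect: the wait brackets the writes."""
    begin = clock.now_us            # kslwtbctx: wait begins before the IO
    for io_us in io_times_us:
        clock.advance(io_us)
    clock.advance(bookkeeping_us)
    return clock.now_us - begin     # kslwtectx: wait ends


ios = [100_000, 100_000]            # two writes, each delayed by 100 ms
after = wait_timed_after_io(FakeClock(), ios, 545)
around = wait_timed_around_io(FakeClock(), ios, 545)
print(after, around)
```

Only the second ordering yields a wait of at least 200’000 µs; the first reproduces the shape of what the trace shows.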
  • 31. log file parallel write
    • Conclusion:
      – For at least Oracle version 11.2.
      – When synchronous IO (pwrite64()) is issued.
        • disk_asynch_io = FALSE (ASM)
        • filesystemio_options != “setall” or “asynch”
      – The wait event does not time the IO requests.
    • How about the other log writer wait events?
  • 32. control file sequential read
    disk_asynch_io=FALSE (no AIO)
      kslwtbctx
      pread64 - fd, size - 256,16384
      >sleep 0.1
      kslwtectx -- Previous wait time: 100323: control file sequential read
    This event is correctly timed.
  • 33. control file parallel write
    disk_asynch_io=FALSE (no AIO)
      pwrite64 - fd, size - 256,16384
      >sleep 0.1
      kslwtbctx
      kslwtectx -- Previous wait time: 705: control file parallel write
    This event is incorrectly timed!
  • 34. log file single write
    disk_asynch_io=FALSE (no AIO)
      kslwtbctx
      pwrite64 - fd, size - 256,512
      >sleep 0.1
      kslwtectx -- Previous wait time: 104594: log file single write
    This event is correctly timed.
  • 35. Logwriter wait events, logswitch
    • Some of these waits typically show up during a logswitch.
      – These are all the waits which are normally seen:
        • os thread startup (semctl()-semtimedop())
        • control file sequential read (pread64())
        • control file parallel write (io_submit()-io_getevents())
        • log file sequential read (pread64())
        • log file single write (pwrite64())
        • KSV master wait (semctl() post to dbwr)
    • This is with AIO enabled!
  • 36. Logwriter, timeout message
    • Warning:
      Warning: log write elapsed time 523ms, size 2760KB
    • Printed in the logwriter trace file (NOT the alert.log).
    • Threshold set with parameter:
      – _side_channel_batch_timeout_ms (500ms)
  • 37. Logwriter: disable logging
    • The “forbidden switch”: _disable_logging
      – Do not use this for anything other than tests!
    • Everything is done the same, no magic
      – Except the write by the lgwr to the logfiles
      – No ‘log file parallel write’
      – Redo/control/data files are synced with shutdown normal
    • A way to test if lgwr IO influences db processing
  • 38. Logwriter: exadata
    • What does this look like on Exadata?
  • 39. Logwriter: exadata
      kslwtbctx
      semtimedop - 3309577 semid, timeout: $24 = {tv_sec = 2, tv_nsec = 970000000}
      kslwtectx -- Previous wait time: 2973630: rdbms ipc message
      $25 = "oss_write"
      $26 = "oss_write"
      $27 = "oss_write"
      $28 = "oss_write"
      kslwtbctx
      $29 = "oss_wait"
      $30 = "oss_wait"
      $31 = "oss_wait"
      $32 = "oss_wait"
      kslwtectx -- Previous wait time: 2956: log file parallel write
      kslwtbctx
      semtimedop - 3309577 semid, timeout: $33 = {tv_sec = 3, tv_nsec = 0}
      kslwtectx -- Previous wait time: 3004075: rdbms ipc message
    Annotations:
      • The lgwr semaphore sleep.
      • The writes are issued here. There is no io_submit-like wait. This is not timed.
      • The wait is log file parallel write, identical to non-Exadata. It seems to wait for all issued IOs.
  • 40. Database writer
    • From the Oracle 11.2 concepts guide:
      – The DBWn process writes dirty buffers to disk under the following conditions:
        • When a server process cannot find a clean reusable buffer after scanning a threshold of buffers, it signals DBWn to write. DBWn writes dirty buffers to disk asynchronously if possible while performing other processing.
        • DBWn periodically writes buffers to advance the checkpoint, which is the position in the redo thread from which instance recovery begins. The log position of the checkpoint is determined by the oldest dirty buffer in the buffer cache.
  • 41. Database writer, idle
    • The 10046/8 trace shows:
      *** 2013-12-31 00:45:51.088
      WAIT #0: nam='rdbms ipc message' ela= 3006219 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447151086891
      *** 2013-12-31 00:45:54.142
      WAIT #0: nam='rdbms ipc message' ela= 3005237 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447154140873
      *** 2013-12-31 00:45:57.197
      WAIT #0: nam='rdbms ipc message' ela= 3005258 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447157195828
      *** 2013-12-31 00:46:00.255
      WAIT #0: nam='rdbms ipc message' ela= 3005716 timeout=300 p2=0 p3=0 obj#=-1 tim=1388447160253960
  • 42. Database writer, idle
    • “rdbms ipc message” indicates a sleep/idle event.
      – There is no indication that dbw0 writes anything:
      semtimedop(983043, {{14, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
      getrusage(RUSAGE_SELF, {ru_utime={0, 31000}, ru_stime={0, 89000}, ...}) = 0
      getrusage(RUSAGE_SELF, {ru_utime={0, 31000}, ru_stime={0, 89000}, ...}) = 0
      times({tms_utime=3, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 431915044
      times({tms_utime=3, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 431915044
      times({tms_utime=3, tms_stime=8, tms_cutime=0, tms_cstime=0}) = 431915044
      semtimedop(983043, {{14, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
      …etc…
  • 43. Database writer, idle
    • It does look in the /proc filesystem at the ‘stat’ file of a certain process:
      open("/proc/2218/stat", O_RDONLY) = 21
      read(21, "2218 (oracle) S 1 2218 2218 0 -1"..., 999) = 209
      close(21)
    • It does so every 20th time: 20 * 3 = 60 seconds.
    • The PID is that of PMON.
  • 44. Database writer, idle
    • Recap:
      – In an idle database.
      – The dbwr sleeps on a semaphore for 3 seconds.
        • Then wakes up, and sets up the semaphore/sleep again.
        • Processes sleeping on a semaphore do not spend CPU.
      – Every minute, dbwr reads pmon's process stats.
      – dbwr doesn’t write if there’s no need.
  • 45. Database writer, force write
    • We can force the dbwr to write*:
      – Dirty some blocks (insert a row into a table).
      – Force a thread checkpoint (alter system checkpoint).
    * There are multiple ways, this is one of them.
  • 46. Database writer, force write
    10046/8 trace:
      *** 2014-01-03 03:37:47.473
      WAIT #0: nam='rdbms ipc message' ela= 3000957 timeout=300 p2=0 p3=0 obj#=-1 tim=1388716667473024
      *** 2014-01-03 03:37:49.735
      WAIT #0: nam='rdbms ipc message' ela= 2261867 timeout=300 p2=0 p3=0 obj#=-1 tim=1388716669735046
      WAIT #0: nam='db file async I/O submit' ela= 0 requests=3 interrupt=0 timeout=0 obj#=-1 tim=1388716669735493
      WAIT #0: nam='db file parallel write' ela= 21 requests=1 interrupt=0 timeout=2147483647 obj#=-1 tim=1388716669735566
      *** 2014-01-03 03:37:50.465
      WAIT #0: nam='rdbms ipc message' ela= 729110 timeout=73 p2=0 p3=0 obj#=-1 tim=1388716670464967
    Annotations:
      • Elapsed time = 2.26 sec. So the dbwr is posted!
      • db file async I/O submit?! It looks like the io_submit() call is instrumented for the dbwr! But what does ‘requests=3’ mean for a single row update checkpoint?
      • And the write, via the event ‘db file parallel write’.
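The "posted or timed out" reasoning on this slide can be written down as a small check: an ‘rdbms ipc message’ wait that ends clearly before its timeout was interrupted by a post. This is a hedged Python sketch; the function name and the 10 ms tolerance are assumptions added here (the tolerance absorbs timer jitter, since idle waits in the traces end up slightly below or above 3 seconds), not Oracle-defined values.

```python
def was_posted(ela_us, timeout_cs, tolerance_us=10_000):
    """True if the sleep ended clearly before its timeout, i.e. the process was posted.

    timeout_cs is in centiseconds (1 cs = 10,000 microseconds). The tolerance is an
    assumption to absorb timer jitter; it is not an Oracle-defined value.
    """
    timeout_us = timeout_cs * 10_000
    return ela_us < timeout_us - tolerance_us


# (ela in microseconds, timeout in centiseconds) for the three
# 'rdbms ipc message' waits in the trace above.
waits = [(3000957, 300), (2261867, 300), (729110, 73)]
flags = [was_posted(ela, timeout) for ela, timeout in waits]
print(flags)
```

Only the second wait (2.26 s against a 3 s timeout) is classified as posted; the third ran for practically its whole 73 centisecond timeout.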
  • 47. dbwr, sql_trace + strace
    • Let’s take a look at the Oracle wait events, together with the actual system calls.
    • That is:
      – Setting a 10046/8 event for trace and waits.
      – Execute strace with ‘-e write=all -e all’
  • 48. dbwr,  sql_trace  +  strace io_submit(140195085938688, 3, {{0x7f81b622ab10, 0, 1, 0, 256}, {0x7f81b622a8a0, 0, 1, 0, 256}, {0x7f81b622a630, 0, 1, 0, 256}}) = 3 write(13, "WAIT #0: nam='db file async I/O "..., 108) = 108 | 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db | | 00010 20 66 69 6c 65 20 61 73 79 6e 63 20 49 2f 4f 20 file as ync I/O | | 00020 73 75 62 6d 69 74 27 20 65 6c 61 3d 20 31 20 72 submit' ela= 1 r | | 00030 65 71 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 equests= 3 interr | | 00040 75 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 upt=0 ti meout=0 | | 00050 6f 62 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 38 obj#=-1 tim=1388 | | 00060 39 37 37 36 35 31 38 30 34 32 36 31 97765180 4261 | io_getevents(140195085938688, 1, 128, {{0x7f81b622ab10, 0x7f81b622ab10, 8192, 0}, {0x7f81b622a8a0, 0x7f81b622a8a0, 8192, 0}, {0x7f81b622a630, 0x7f81b622a630, 8192, 0}}, {600, 0}) = 3 write(13, "WAIT #0: nam='db file parallel w"..., 116) = 116 | 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db | | 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w | | 00020 72 69 74 65 27 20 65 6c 61 3d 20 35 38 20 72 65 rite' el a= 58 re | | 00030 71 75 65 73 74 73 3d 31 20 69 6e 74 65 72 72 75 quests=1 interru | | 00040 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 32 31 34 pt=0 tim eout=214 | | 00050 37 34 38 33 36 34 37 20 6f 62 6a 23 3d 2d 31 20 7483647 obj#=-1 | | 00060 74 69 6d 3d 31 33 38 38 39 37 37 36 35 31 38 30 tim=1388 97765180 | | 00070 34 35 37 39 4579 | 48
  • 50. dbwr,  sql_trace  +  strace io_submit(140195085938688, 3, {{0x7f81b622ab10, 0, 1, 0, 256}, {0x7f81b622a8a0, 0, 1, 0, 256}, {0x7f81b622a630, 0, 1, 0, 256}}) = 3 write(13, "WAIT #0: nam='db file async I/O "..., 108) = 108 | 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db | | 00010 20 66 69 6c 65 20 61 73 79 6e 63 20 49 2f 4f 20 file as ync I/O | | 00020 73 75 62 6d 69 74 27 20 65 6c 61 3d 20 31 20 72 submit' ela= 1 r | | 00030 65 71 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 equests= 3 interr | | 00040 75 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 upt=0 ti meout=0 | | 00050 6f 62 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 38 obj#=-1 tim=1388 | | 00060 39 37 37 36 35 31 38 30 34 32 36 31 97765180 4261 | io_getevents(140195085938688, 1, 128, {{0x7f81b622ab10, 0x7f81b622ab10, 8192, 0}, {0x7f81b622a8a0, 0x7f81b622a8a0, 8192, 0}, {0x7f81b622a630, 0x7f81b622a630, 8192, 0}}, {600, 0}) = 3 write(13, "WAIT #0: nam='db file parallel w"..., 116) = 116 | 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db | | 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w | | 00020 72 69 74 65 27 20 65 6c 61 3d 20 35 38 20 72 65 rite' el a= 58 re | | 00030 71 75 65 73 74 73 3d 31 20 69 6e 74 65 72 72 75 quests=1 interru | | 00040 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 32 31 34 pt=0 tim eout=214 | | 00050 37 34 38 33 36 34 37 20 6f 62 6a 23 3d 2d 31 20 7483647 obj#=-1 | | 00060 74 69 6d 3d 31 33 38 38 39 37 37 36 35 31 38 30 tim=1388 97765180 | | 00070 34 35 37 39 4579 | 50 This is the MINIMAL number of requests to reap before successful. (min_nr - see man io_getevents) ? ? ? The timeout for io_getevents() is set to 600 seconds. struct timespec { sec, nsec } Despite only needing 1 request, this call returned all 3. This information is NOT EXTERNALISED (!!)
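The min_nr behaviour annotated on this slide can be illustrated with a toy model. The Python below is a simplification of the semantics described in the io_getevents(2) manual page (block until at least min_nr events are available, return up to nr of them); it is not the kernel implementation, timeout handling is left out, and the function name and the completion times are invented for the example.

```python
def io_getevents_model(completion_times_us, min_nr, max_nr):
    """Toy model of io_getevents(ctx, min_nr, nr, events, timeout) semantics.

    Returns (return_time_us, events_returned). Timeout handling is omitted;
    this is a simplification, not the kernel implementation.
    """
    done = sorted(completion_times_us)
    if min_nr > len(done):
        raise ValueError("fewer IOs in flight than min_nr; the real call would run into its timeout")
    # The call can return as soon as the min_nr-th IO has completed ...
    return_time_us = done[min_nr - 1] if min_nr > 0 else 0
    # ... and hands back everything that has completed by then, up to max_nr.
    reaped = [t for t in done if t <= return_time_us][:max_nr]
    return return_time_us, len(reaped)


# Three IOs finishing together: min_nr=1 still returns all three.
print(io_getevents_model([40, 40, 40], min_nr=1, max_nr=128))
# Three IOs finishing at different times: min_nr=1 returns after the first only.
print(io_getevents_model([10, 50, 90], min_nr=1, max_nr=128))
```

In the second case a caller that needs all IOs has to call again, so a single wait does not necessarily cover every submitted request.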
  • 51. dbwr, db file async I/O submit
    • Let’s take a look at what the documentation says about “db file async I/O submit”:
      (That’s right…nothing)
  • 52. dbwr, db file async I/O submit
    • My Oracle Support on “db file async I/O submit”:
      'db file async I/O submit' when FILESYSTEMIO_OPTIONS=NONE [Article ID 1274737.1]
      How To Address High Wait Times for the 'Direct Path Write Temp ' Wait Event [Article ID 1576956.1]
    • Neither describes what this event is.
    • The first note applies only to filesystemio_options=NONE and describes the event not being tracked prior to version 11.2.0.2.
  • 53. dbwr, db file async I/O submit
    • So the question is:
      – What DOES the event “db file async I/O submit” mean?
    • The obvious answer is:
      – Instrumentation of the io_submit() call.
    • My answer is:
      – Don’t know.
      – But NOT the instrumentation of io_submit().
  • 54. dbwr, db file async I/O submit
    • This is a trace of the relevant C functions:
      kslwtbctx
      kslwtectx -- Previous wait time: 236317: rdbms ipc message
      io_submit - 3,45e5a000 - nr,ctx
      kslwtbctx
      kslwtectx -- Previous wait time: 688: db file async I/O submit
      kslwtbctx
      io_getevents - 1,45e5a000 - minnr,ctx,timeout: $3 = {tv_sec = 600, tv_nsec = 0}
      skgfr_return64 - 3 IOs returned
      kslwtectx -- Previous wait time: 9604: db file parallel write
    – rdbms ipc message: waiting on a semaphore to be posted.
    – io_submit() for 3 IOs.
    – The begin of the wait starts AFTER the io_submit()?
    – io_getevents() is properly timed.
    – min_nr=1, got 3 IOs.
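Whether a system call is covered by a wait event can be read mechanically from such a trace: a call is timed only if it sits between kslwtbctx (wait begin) and kslwtectx (wait end). The sketch below is an illustration under that assumption; `classify_calls` is a hypothetical helper, and the input is the trace from this slide with one call per line.

```python
def classify_calls(trace_lines):
    """Return (call, event) pairs; event is None if the call ran outside a wait."""
    results = []
    in_wait = False
    pending = []  # calls seen since the last kslwtbctx
    for line in trace_lines:
        line = line.strip()
        if not line:
            continue
        name = line.split()[0]
        if name == "kslwtbctx":        # wait begin
            in_wait, pending = True, []
        elif name == "kslwtectx":      # wait end, the event name is the last field
            event = line.rsplit(": ", 1)[1]
            results.extend((call, event) for call in pending)
            in_wait, pending = False, []
        elif in_wait:
            pending.append(name)
        else:
            results.append((name, None))
    return results


TRACE = """\
kslwtbctx
kslwtectx -- Previous wait time: 236317: rdbms ipc message
io_submit - 3,45e5a000 - nr,ctx
kslwtbctx
kslwtectx -- Previous wait time: 688: db file async I/O submit
kslwtbctx
io_getevents - 1,45e5a000 - minnr,ctx,timeout: $3 = {tv_sec = 600, tv_nsec = 0}
skgfr_return64 - 3 IOs returned
kslwtectx -- Previous wait time: 9604: db file parallel write
""".splitlines()

for call, event in classify_calls(TRACE):
    print(f"{call:15} {event or 'NOT inside a wait event'}")
```

For this trace, io_submit falls outside every begin/end pair, while io_getevents falls inside 'db file parallel write'.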
  • 55. dbwr, db file async I/O submit
    • Trace with sleep 0.1 in the break on io_submit():
      kslwtbctx
      kslwtectx -- Previous wait time: 385794: rdbms ipc message
      io_submit - 3,45e5a000 - nr,ctx
      > sleep 0.1
      kslwtbctx
      kslwtectx -- Previous wait time: 428: db file async I/O submit
      kslwtbctx
      io_getevents - 1,45e5a000 - minnr,ctx,timeout: $37 = {tv_sec = 600, tv_nsec = 0}
      skgfr_return64 - 3 IOs returned
      kslwtectx -- Previous wait time: 8053: db file parallel write
    – rdbms ipc message: waiting on a semaphore to be posted.
    – io_submit() for 3 IOs + sleep of 100’000 µs.
    – Wait time too low. io_submit() is not timed.
  • 56. dbwr, db file parallel write
    • Let’s look at the “db file parallel write” event.
  • 57. dbwr, db file parallel write
    • Description from the Reference Guide, with my verdict per statement:
      db file parallel write
      – This event occurs in the DBWR. → Correct.
      – It indicates that the DBWR is performing a parallel write to files and blocks. → Correct… but only if AIO is enabled.
      – When the last I/O has gone to disk, the wait ends. → Incorrect.
      – Wait Time: Wait until all of the I/Os are completed. → Incorrect.
      – Parameter requests: This indicates the total number of I/O requests, which will be the same as blocks. → Incorrect.
      – Parameter interrupt: (no description) → Empty?
      – Parameter timeout: This indicates the timeout value in hundredths of a second to wait for the I/O completion. → Probably incorrect. Or does a timeout of 2’147’483’647 /100/60/60/24 = 248.55 days make sense to *anybody*?
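The 248.55 days figure follows directly from reading the traced timeout value as hundredths of a second, as the Reference Guide describes it. A small sketch of that arithmetic; the constant names are made up for illustration:

```python
# timeout= value from the 'db file parallel write' WAIT line in the trace.
TIMEOUT_CENTISECONDS = 2_147_483_647   # equals 2**31 - 1

# The timeout actually passed to io_getevents() in the strace output.
IO_GETEVENTS_TIMEOUT_SECONDS = 600

timeout_seconds = TIMEOUT_CENTISECONDS / 100
timeout_days = timeout_seconds / (60 * 60 * 24)

print(f"documented interpretation: {timeout_days:.2f} days")
print(f"io_getevents() argument:   {IO_GETEVENTS_TIMEOUT_SECONDS} seconds")
```

The traced value is the largest signed 32-bit integer, which suggests a placeholder rather than a real timeout; the system call itself is given 600 seconds.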
  • 58. dbwr, db file parallel write
    • Recap of previous traced calls:
      kslwtbctx
      kslwtectx -- Previous wait time: 236317: rdbms ipc message
      io_submit - 3,45e5a000 - nr,ctx
      kslwtbctx
      kslwtectx -- Previous wait time: 688: db file async I/O submit
      kslwtbctx
      io_getevents - 1,45e5a000 - minnr,ctx,timeout: $3 = {tv_sec = 600, tv_nsec = 0}
      skgfr_return64 - 3 IOs returned
      kslwtectx -- Previous wait time: 9604: db file parallel write
    • So… how about severely limiting OS IO capacity to see what happens?
  • 59. dbwr, db file parallel write
    • Database writer with severely limited IO (1 IOPS):
      io_submit - 366,45e5a000 - nr,ctx
      kslwtbctx
      kslwtectx -- Previous wait time: 1070: db file async I/O submit
      kslwtbctx
      io_getevents - 100,45e5a000 - minnr,ctx,timeout: $7 = {tv_sec = 600, tv_nsec = 0}
      skgfr_return64 - 100 IOs returned
      kslwtectx -- Previous wait time: 109334845: db file parallel write
      io_getevents - 128,45e5a000 - minnr,ctx,timeout: $8 = {tv_sec = 0, tv_nsec = 0}
      io_getevents - 128,45e5a000 - minnr,ctx,timeout: $9 = {tv_sec = 0, tv_nsec = 0}
      io_submit - 73,45e5a000 - nr,ctx
      kslwtbctx
      kslwtectx -- Previous wait time: 486: db file async I/O submit
    – 366 IO requests are submitted onto the OS.
    – But only 100 IOs are needed to satisfy io_getevents().
    – Which it does in this case… leaving outstanding IOs.
    – The dbwr starts issuing non-blocking calls to reap IOs! It seems to be always 2 if outstanding IOs remain.
    – Minnr = # outstanding IOs, max 128.
  • 60. dbwr, db file parallel write
    • This got me thinking…
    • The dbwr submits the IOs it needs to write.
    • But it waits for a variable amount of IOs to finish.
      – Wait event ‘db file parallel write’.
      – Amount seems to be 25-33% of submitted IOs*
      – After that, 2 tries to reap the remaining IOs*
      – Then either submit again, DFPW until IOs are reaped, or back to sleeping on the semaphore.
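The numbers from the throttled trace (366 IOs submitted, a blocking io_getevents() with min_nr=100, then two non-blocking calls with min_nr=128) can be checked against these observations. This is a sketch of the observed pattern only, not of Oracle's actual logic; the helper name and the cap constant are assumptions taken from the slide annotation "Minnr = # outstanding IOs, max 128".

```python
MAX_MIN_NR = 128  # observed cap on min_nr for the non-blocking reap calls


def nonblocking_min_nr(outstanding_ios):
    """min_nr as observed for the trailing non-blocking io_getevents() calls."""
    return min(outstanding_ios, MAX_MIN_NR)


submitted = 366        # io_submit - 366,...
blocking_min_nr = 100  # io_getevents - 100,... (timed as 'db file parallel write')
reaped = 100           # skgfr_return64 - 100 IOs returned

fraction_waited_for = blocking_min_nr / submitted
outstanding = submitted - reaped

print(f"blocking call waits for {fraction_waited_for:.1%} of the submitted IOs")
print(f"{outstanding} IOs outstanding, non-blocking min_nr = {nonblocking_min_nr(outstanding)}")
```

With these figures the blocking call waits for a bit over a quarter of the submitted IOs, and the 266 IOs still outstanding are capped to a min_nr of 128, matching the trace.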
  • 61. dbwr, db file parallel write
    • This means ‘db file parallel write’ is not:
      – A physical IO indicator.
      – IO latency timing.
    • I’ve come to the conclusion that the blocking io_getevents call for a number of IOs of the dbwr is an IO limiter.
    • …and ‘db file parallel write’ is the timing of it.
  • 62. dbwr, synchronous IO
    • Let’s turn AIO off again.
      – To simulate this, I’ve set disk_asynch_io to FALSE.
    • And set a 10046/8 trace and strace on the dbwr.
    • And issue the SQLs as before:
      – insert into followed by commit
      – alter system checkpoint
  • 63. dbwr, synchronous IO
    pwrite(256, "62420020726100hG1700020063R001000009U10[G170"..., 8192, 8053121024) = 8192
    pwrite(256, "22420024603000hG1700014220T0030r0j 30020301717"..., 8192, 7443103744) = 8192
    pwrite(256, "&2420024003000hG 170002437221600000000000000"..., 8192, 7443054592) = 8192
    write(11, "WAIT #0: nam='db file parallel w"..., 107) = 107
     | 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db |
     | 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w |
     | 00020 72 69 74 65 27 20 65 6c 61 3d 20 32 30 20 72 65 rite' el a= 20 re |
     | 00030 71 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 75 quests=3 interru |
     | 00040 70 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 6f pt=0 tim eout=0 o |
     | 00050 62 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 39 38 bj#=-1 t im=13898 |
     | 00060 30 32 32 38 31 34 35 36 37 34 30 02281456 740 |
    write(11, "WAIT #0: nam='db file parallel w"..., 106) = 106
     | 00000 57 41 49 54 20 23 30 3a 20 6e 61 6d 3d 27 64 62 WAIT #0: nam='db |
     | 00010 20 66 69 6c 65 20 70 61 72 61 6c 6c 65 6c 20 77 file pa rallel w |
     | 00020 72 69 74 65 27 20 65 6c 61 3d 20 31 20 72 65 71 rite' el a= 1 req |
     | 00030 75 65 73 74 73 3d 33 20 69 6e 74 65 72 72 75 70 uests=3 interrup |
     | 00040 74 3d 30 20 74 69 6d 65 6f 75 74 3d 30 20 6f 62 t=0 time out=0 ob |
     | 00050 6a 23 3d 2d 31 20 74 69 6d 3d 31 33 38 39 38 30 j#=-1 ti m=138980 |
     | 00060 32 32 38 31 34 35 38 34 39 32 22814584 92 |
    – 3 pwrite() calls. This is synchronous IO!
    – The db file parallel write wait event shows 3 requests!
    – But why a second db file parallel write wait event?
  • 64. dbwr, synchronous IO
    • There’s no ‘db file async I/O submit’ wait anymore.
      – Which is good, because SIO has no submit phase.
    • The ‘db file parallel write’ waits seem suspicious.
      – It seems like the wait for DFPW is issued twice.
      – Further investigation shows that it is.
        • My guess is that this is a bug in the sync. IO implementation.
    • Let’s look a level deeper and see if there’s more to see.
  • 65. dbwr, synchronous IO
      kslwtbctx
      semtimedop - 458755 semid, timeout: $18 = {tv_sec = 3, tv_nsec = 0}
      kslwtectx -- Previous wait time: 1239214: rdbms ipc message
      pwrite64 - fd, size - 256,8192
      pwrite64 - fd, size - 256,8192
      pwrite64 - fd, size - 256,8192
      kslwtbctx
      kslwtectx -- Previous wait time: 949: db file parallel write
      kslwtbctx
      kslwtectx -- Previous wait time: 650: db file parallel write
      kslwtbctx
      semtimedop - 458755 semid, timeout: $19 = {tv_sec = 1, tv_nsec = 620000000}
    – This is clearly the semaphore being posted: timeout = 3 s, wait time = 1239.2 ms.
    – 3 IOs in serial using pwrite(). The only possibility if there isn’t AIO, of course.
    – Two db file parallel write waits (which aren’t parallel), both of which begin AFTER the IO (!!)
    – After the IOs are done, the dbwr continues sleeping.
  • 66. dbwr, synchronous IO
    • Let’s do the same trick as done earlier:
      – In gdb, add “shell sleep 0.1” to the pwrite call.
      – This makes the execution of this call take 100ms longer.
      – To see if there’s still some way Oracle times it properly.
  • 67. dbwr, synchronous IO
      kslwtbctx
      semtimedop - 458755 semid, timeout: $23 = {tv_sec = 3, tv_nsec = 0}
      kslwtectx -- Previous wait time: 92080: rdbms ipc message
      pwrite64 - fd, size - 256,8192
      > shell sleep 0.1
      pwrite64 - fd, size - 256,8192
      > shell sleep 0.1
      pwrite64 - fd, size - 256,8192
      > shell sleep 0.1
      kslwtbctx
      kslwtectx -- Previous wait time: 478: db file parallel write
      kslwtbctx
      kslwtectx -- Previous wait time: 495: db file parallel write
      kslwtbctx
      semtimedop - 458755 semid, timeout: $24 = {tv_sec = 2, tv_nsec = 460000000}
    – The 3 IOs again, each sleeps in pwrite() for 100ms (0.1s).
    – Yet the ‘db file parallel write’ wait shows a waiting time of 478, which is 0.478ms: the timing is wrong.
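The size of the mismatch can be made explicit with the values from this trace. A minimal sketch, using only the numbers shown above (wait times in microseconds, three injected sleeps of 0.1 s); the variable names are illustrative:

```python
INJECTED_SLEEP_US = 100_000      # "shell sleep 0.1" added to every pwrite64()
PWRITE_CALLS = 3                 # three pwrite64() calls in the trace
REPORTED_WAITS_US = [478, 495]   # both 'db file parallel write' waits

# Lower bound for the real time spent in the write calls.
minimum_io_time_us = INJECTED_SLEEP_US * PWRITE_CALLS
reported_us = sum(REPORTED_WAITS_US)
covered = reported_us / minimum_io_time_us

print(f"time spent in pwrite64(): at least {minimum_io_time_us} us")
print(f"time reported by the wait event: {reported_us} us ({covered:.2%})")
```

Even summing both waits, the event accounts for well under 1% of the time the writes are known to have taken, which is consistent with the waits starting after the IO has completed.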
  • 68. dbwr, synchronous IO
    • So, my conclusion on the wait events for the dbwr with synchronous IO:
      – The events are not properly timed.
      – It seems like the wait for DFPW is issued twice.
      – Further investigation shows that it is.
        • My guess is that this is a bug in the sync. IO implementation.
  • 69. dbwr: exadata
    • What does this look like on Exadata?
  • 70. dbwr: exadata
      kslwtbctx
      semtimedop - 3309577 semid, timeout: $389 = {tv_sec = 3, tv_nsec = 0}
      kslwtectx -- Previous wait time: 1266041: rdbms ipc message
      $390 = "oss_write"
      $391 = "oss_write"
      $392 = "oss_write"
      $393 = "oss_write"
      $394 = "oss_write"
      $395 = "oss_write"
      kslwtbctx
      kslwtectx -- Previous wait time: 684: db file async I/O submit
      kslwtbctx
      $396 = "oss_wait"
      $397 = "oss_wait"
      kslwtectx -- Previous wait time: 2001: db file parallel write
      $398 = "oss_wait"
      $399 = "oss_wait"
      $400 = "oss_wait"
      $401 = "oss_wait"
      semctl - 3309577,23,16 - semid, semnum, cmd
      kslwtbctx
      semtimedop - 3309577 semid, timeout: $402 = {tv_sec = 1, tv_nsec = 630000000}
      kslwtectx -- Previous wait time: 1634299: rdbms ipc message
    – The dbwr semaphore sleep.
    – The writes are issued here. This is not timed.
    – There is the db file async I/O submit. Again, it doesn’t seem to time any of the typical IO calls!
    – And there we have the db file parallel write. It does seem to always time two oss_wait() calls…
    – But it looks for more IOs to finish, like the trailing io_getevents() calls. I am quite sure oss_wait() is a blocking call…
  • 71. Conclusion
    • Logwriter:
      – When idle, is sleeping on a semaphore/rdbms ipc message.
      – Gets posted with semctl() to do work.
      – Only writes when it needs to do so.
      – Version 11.2.0.3: two methods for posting FGs:
        – Polling and post/wait.
        – Post/wait is default, might switch to polling.
        – Notification of switch is in log writer trace file.
        – Polling/nanosleep() time is variable.
  • 72. Conclusion
    • Logwriter:
      – Log file parallel write:
        – AIO: two io_getevents() calls.
        – AIO: time waiting for all lgwr submitted IOs to finish.
        – Not IO latency time!
        – SIO: does not do parallel writes, but serial.
        – SIO: does not time IO.
  • 73. Conclusion
    • Logwriter:
      – Wait event IO timing:
        – All the ‘* parallel read’ and ‘* parallel write’ events do not seem to time IO correctly with synchronous IO.
        – All the events which cover single block IOs do use synchronous IO calls, even with asynchronous IO set.
      – Logwriter writes a warning when IO time exceeds 500ms in the log writer trace file.
      – _disable_logging *only* disables write to logs.
  • 74. Conclusion
    • Database writer:
      – When idle, is sleeping on a semaphore/rdbms ipc message.
      – Gets posted with semctl() to do work.
      – Only writes when it needs to do so.
      – Since version 11.2.0.2, event ‘db file async I/O submit’:
        – Is not shown with synchronous I/O.
        – Shows the actual amount of IOs submitted.
        – Does not time io_submit().
        – Unknown what or if it times something.
  • 75. Conclusion
    • Database writer:
      – Event ‘db file parallel write’:
        – Shows the minimal number of IOs io_getevents() waits for.
        – The number of requests it waits for varies, but mostly seems to be ~25-33% of submitted IOs.
        – After the timed, blocking io_getevents() call, it issues two non-blocking io_getevents() calls for the remaining non-reaped IOs, if any.
        – My current idea is that the blocking io_getevents() call is an IO throttle mechanism.
  • 76. Conclusion
    • Database writer:
      – Event ‘db file parallel write’, with synchronous IO:
        – pwrite64() calls are issued serially.
        – These are not timed.
        – The event is triggered twice.
      – On Exadata, two out of the total number of oss_wait() calls are timed with the event ‘db file parallel write’.
  • 78. Thanks & Links
    • Enkitec
    • Tanel Poder, Martin Bach, Klaas-Jan Jongsma, Jeremy Schneider, Karl Arao, Michael Fontana.
    • http://www.pythian.com/blog/adaptive-log-file-sync-oracle-please-dont-do-that-again/
    • http://files.e2sn.com/slides/Tanel_Poder_log_file_sync.pdf