SQL Server Wait Events

   Mario Broodbakker
  mario@insight-tec.co.jp
Background
• DBA since 1987: started on mainframes, then worked with Oracle on AIX
  and on Windows
• From 1997: Performance & Benchmark specialist at Baan, followed by
  Compaq: Oracle on Windows, SQL Server and Informix benchmarks
• Compaq and HP: Oracle performance consultant on Unix and Windows
• 2006-2009 Windows Integrity Engineering: Windows Itanium (eh, Oracle)
  benchmarks in Redmond, WA, USA.
• From 2002: started investigating SQL Server 2000 internals, working on
  per-user-session wait event collection and wait event tracing. Later
  did the same for SQL Server 2005. See: www.sqlinternals.com
• Wait event presentation: DBForum Lalandia 2004 (SQL Server 2000)
• Published three articles on SQL Server wait stuff
    – www.simple-talk.com/sql/performance
• Returned to the Netherlands in 2010 as database specialist at PGGM
  (pension fund): finally SQL Server DBA!
• Currently with Insight Technology
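What are wait events?
• When SQL Server is not running (doing work), it is waiting
• When a wait event occurs: 'something' is keeping the current
  task waiting
• SQL Server lets you identify where the waiting happens:
 –   Data file or transaction log IO
 –   Network IO
 –   Locks & Latches
 –   CPU
 –   There are more than 480 wait types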
What are they used for?
• R = S + W : Response time = Service time +
  Wait time
• Response time is key for the end user. 'R' is usually
  online response time, but it can also stand for
  'batch' time
• Example: a response time of 4 seconds consists of
  0.2 s CPU time and 3.8 s IO time. Does it make
  sense to optimize CPU time? Buy faster CPUs?
  Build faster code?
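To make the arithmetic of the example explicit, here is a minimal T-SQL sketch that breaks the 4-second response time into its two components. It uses only the numbers from the slide; the column and alias names are illustrative, and the VALUES row constructor assumes SQL Server 2008 or later.

```sql
-- R = S + W for the slide's example: 0.2 s CPU (service) + 3.8 s IO (wait).
-- Column and alias names are illustrative only.
SELECT component,
       seconds,
       CAST(100.0 * seconds / SUM(seconds) OVER () AS decimal(5,1)) AS pct_of_response
FROM (VALUES ('CPU (service time)', 0.2),
             ('IO (wait time)',     3.8)) AS r(component, seconds);
```

CPU accounts for 5% of the response time here, so even doubling CPU speed would save only about 0.1 s of the 4 s.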
A little bit of history..
• Oracle wait events were documented in 1994
  (Anjo Kolk) and are well regarded. YAPP: 'Yet
  Another Performance Profiling' method paper.
• DBCC SQLPERF(WAITSTATS) was undocumented:
  Gert Drapers and Tom Davidson were the first to
  work on it
• Documented in BOL since SQL Server 2005
• Microsoft papers: Troubleshooting Performance
  Problems using Queues and Waits (SQL Server
  2005 and 2008): Davidson et al.
Where do wait events come from?
• SOS Scheduler
  – A 'work request' is processed: SQL batch or parallel
    query -> task
  – 1 task runs on 1 scheduler on 1 CPU until:
     • a blocking call occurs: disk IO, network IO -> a wait event
       is registered with its start time and type
     • the time quantum has elapsed: 4 ms (always?) (to prevent
       monopolizing the scheduler and CPU): SOS_SCHEDULER_YIELD
       (and SLEEP_TASK?)
  – The task is executed by a worker thread: OS thread or
    fiber (lightweight pooling)
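Task Flow
(from: SS2005 Practical Troubleshooting: Ken Henderson)

[Diagram: task states New Task, Pending, Runnable, Running and Done in
sequence, with Suspended and PreEmptive drawn below; the label
'Worker available' appears above the sequence]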
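The task states in the diagram can be observed on a live instance. The following is a sketch rather than anything taken from the slides: it counts tasks per scheduler and state using sys.dm_os_tasks and sys.dm_os_schedulers, and it assumes you only care about the schedulers that run user work (status 'VISIBLE ONLINE').

```sql
-- Count tasks per scheduler and task state (PENDING, RUNNABLE, RUNNING, SUSPENDED, ...).
-- Restricting to 'VISIBLE ONLINE' schedulers is an assumption to hide internal schedulers.
SELECT t.scheduler_id,
       t.task_state,
       COUNT(*) AS tasks
FROM sys.dm_os_tasks AS t
JOIN sys.dm_os_schedulers AS s
  ON s.scheduler_id = t.scheduler_id
WHERE s.status = 'VISIBLE ONLINE'
GROUP BY t.scheduler_id, t.task_state
ORDER BY t.scheduler_id, t.task_state;
```

Many RUNNABLE tasks on a scheduler suggest CPU pressure; many SUSPENDED tasks mean sessions are waiting on a resource.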
Wait time
• Wait time has two components:
  – Resource wait time
     • Time until the resource becomes available: from the
       'suspended' state until 'runnable'.
  – Signal wait time
     • Time between the resource becoming available and the task
       actually continuing: from 'runnable' until 'running'.
  – The wait time in the DMVs (dynamic management views) includes
    signal wait time.
• Timing granularity differs between SQL Server versions.
  From SS2005 SP3 it is about 1 ms (see the link for details)
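Because the DMV wait time includes signal wait time, the resource part has to be derived by subtraction. A minimal sketch against sys.dm_os_wait_stats; the output column name resource_wait_time_ms is my own, not a DMV column.

```sql
-- Split each wait type's total wait time into resource wait and signal wait.
-- resource_wait_time_ms is a derived, illustrative column name.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,
       wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0
ORDER BY resource_wait_time_ms DESC;
```

A wait type whose signal wait time is close to its total wait time (SOS_SCHEDULER_YIELD is the typical case) points at waiting for CPU rather than for a resource.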
Where to find waits
• sys.dm_os_wait_stats (DBCC SQLPERF(WAITSTATS)) (screenshot)
    – Accumulated since startup, or since the last
      DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)
    – Wait time, signal time (time: runnable -> running)
• sys.dm_os_waiting_tasks (screenshot)
• sys.dm_exec_requests, sysprocesses (screenshot)
• sys.dm_io_virtual_file_stats(db_id, file_id) (screenshot)
    – io_stall_read_ms, io_stall_write_ms and num_of_reads/num_of_writes.
    – 'Real' IO latency; pay attention to num_of_bytes_read/num_of_bytes_written.
      In most cases 64K per read or more! (see screenshot: virtual filestats summary)
• sys.dm_db_index_operational_stats(db_id, object_id, etc.)
  (screenshot)
• Not available in Profiler!
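As a sketch of how the virtual file stats translate into 'real' IO latency and IO size, the query below divides the stall times and byte counts by the number of IOs per file. The derived column names are illustrative; passing NULL, NULL returns all files of all databases.

```sql
-- Average latency and size per read/write for every database file.
-- NULLIF avoids division by zero for files that have not been read or written.
SELECT DB_NAME(database_id)                               AS database_name,
       file_id,
       num_of_reads,
       1.0 * io_stall_read_ms  / NULLIF(num_of_reads, 0)  AS avg_read_ms,
       num_of_bytes_read       / NULLIF(num_of_reads, 0)  AS avg_bytes_per_read,
       num_of_writes,
       1.0 * io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_ms,
       num_of_bytes_written    / NULLIF(num_of_writes, 0) AS avg_bytes_per_write
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
ORDER BY avg_read_ms DESC;
```

These counters are cumulative since the database came online, so for a recent picture take two snapshots and compare the differences.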
Where to find CPU time
• @@CPU_BUSY * CAST(@@TIMETICKS AS FLOAT)/1000,
  @@IO_BUSY * CAST(@@TIMETICKS AS FLOAT)/1000,
  @@IDLE * CAST(@@TIMETICKS AS FLOAT)/1000
  (in ms, accumulated for this SQL Server instance)

• ./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization
  ./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle
  from:
        select timestamp, convert(xml, record) as record
         from sys.dm_os_ring_buffers
         where ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
         and record like '%<SystemHealth>%'
  (see: exact query from Performance Dashboard in Slide notes)
• sys.dm_exec_sessions: cpu_time spent by this session
• sys.dm_exec_requests: cpu_time spent by this request (also:
  total_elapsed_time)
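The slide notes with the exact Performance Dashboard query are not part of this text, so the following is only a sketch of how the two XML paths above can be extracted from the ring buffer. The output column names are my own.

```sql
-- Latest CPU samples from the scheduler monitor ring buffer.
-- sql_cpu_pct / system_idle_pct are illustrative names for the two XML values.
SELECT TOP (10)
       x.[timestamp],
       x.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
       x.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')         AS system_idle_pct
FROM (SELECT [timestamp], CONVERT(xml, record) AS record
      FROM sys.dm_os_ring_buffers
      WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
        AND record LIKE '%<SystemHealth>%') AS x
ORDER BY x.[timestamp] DESC;
```

Whatever remains after subtracting both percentages from 100 is CPU used by other processes on the machine.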
select wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms,
       wait_time_ms / waiting_tasks_count as 'avg wait ms'
from sys.dm_os_wait_stats
where waiting_tasks_count > 0
order by wait_time_ms desc;

select session_id, exec_context_id, wait_duration_ms,
       wait_type, resource_description
from sys.dm_os_waiting_tasks
order by session_id asc;
Wait examples
• PAGEIOLATCH_xx
    – Waiting for IO of a page from disk. Note: this is not necessarily pure IO time: it
      depends on IO size. High wait times can indicate an IO problem, but not necessarily;
      also check virtual file stats or perfmon counters.
• PAGELATCH_xx
    – Waiting for access to a page in memory
    – _UP types, mostly for 'housekeeping' pages (PFS, GAM, SGAM)
• LATCH_xx (see the Latches slide)
• WRITELOG (and LOGBUFFER)
    – Waiting for a write to the transaction log (mostly after a commit). LOGBUFFER wait:
      waiting for free space in the log buffer
• LCK_M_xx
    – Row, key and page lock waits.
• ASYNC_NETWORK_IO
    – Network writes to the client: depends on client processing speed (and network latency)
Wait examples 2
•   SOS_SCHEDULER_YIELD
     – Waiting for the scheduler to become available
     – More detail on this at www.simple-talk.com
•   CXPACKET
     – Parallel query synchronization wait: part of the design, so not a problem in itself
     – Very good presentation: http://www.sqlworkshops.com webcast 2
•   SLEEP_TASK and IO_COMPLETION
     – SLEEP_TASK can be used as a 'scheduler yield' or as a 'normal sleep'.
       Together with IO_COMPLETION: likely hash joins and sort activity (spilling to tempdb)
     – Again really well explained: http://www.sqlworkshops.com webcast 3
•   CMEMTHREAD, RESOURCE_SEMAPHORE
     – Plan caching/recompile problems? Queries that use a lot of memory: big sorts/hash
       joins
•   Background waits: LAZYWRITER_SLEEP, SQLTRACE_BUFFER_FLUSH, LOGMGR_QUEUE (watch
    out for SLEEP_TASK! Multiple (mis?)use)
•   Preemptive waits
     – Outside of the SOS scheduler, for instance system calls or external stored procedures.
Latches
• Short-term synchronization objects
• sys.dm_os_latch_stats
  – Latch class 'BUFFER' is the total of the PAGE% latches
  – Some are documented in BOL
  – ACCESS_METHODS_xx: used when working with indexes and
    heaps (SCAN/KEY_RANGE_GENERATOR: parallel
    query)
  – LOG_MANAGER latch: used when the transaction log autogrows
  – For an example, see the 'SQL Broker trouble' slide
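When LATCH_xx shows up high in the wait stats, sys.dm_os_latch_stats tells you which latch class is behind it. A minimal sketch; avg_wait_ms is a derived, illustrative column.

```sql
-- Top latch classes by accumulated wait time.
-- avg_wait_ms is derived here; it is not a column of the DMV.
SELECT TOP (10)
       latch_class,
       waiting_requests_count,
       wait_time_ms,
       max_wait_time_ms,
       wait_time_ms / NULLIF(waiting_requests_count, 0) AS avg_wait_ms
FROM sys.dm_os_latch_stats
WHERE waiting_requests_count > 0
ORDER BY wait_time_ms DESC;
```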
Tools
• Performance Dashboard (screenshot): drill-down tool that
  shows the current state
• Management DW & Performance collector
   – SQL Server 2008, expandable DW for performance info.
• SQLSTAT2005 (codeplex), takes snapshots of important
  DMVs, shows PerfDashboard-like reports. (see example
  slide)
• No SQL Trace or Profiler! (sigh..)
• XEvents in SQL Server 2008
• Home-grown scripts: begin/end_waitstats (see the slide notes
  for an example)
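The author's begin/end_waitstats scripts live in the slide notes and are not reproduced in this text. The sketch below only illustrates the same idea: snapshot sys.dm_os_wait_stats, wait for an interval, then report the differences. The temp table name, the 30-second interval and the output column names are assumptions.

```sql
-- "begin": remember the cumulative counters (temp table name is illustrative).
IF OBJECT_ID('tempdb..#waits_begin') IS NOT NULL
    DROP TABLE #waits_begin;

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms,
       GETDATE() AS snap_time
INTO #waits_begin
FROM sys.dm_os_wait_stats;

-- Measurement interval; 30 seconds is an arbitrary choice.
WAITFOR DELAY '00:00:30';

-- "end": report only the waits that occurred during the interval.
SELECT e.wait_type,
       e.waiting_tasks_count - b.waiting_tasks_count       AS waits,
       e.wait_time_ms        - b.wait_time_ms              AS wait_time,
       e.signal_wait_time_ms - b.signal_wait_time_ms       AS sigwaittime,
       DATEDIFF(second, b.snap_time, GETDATE())            AS ela_sec
FROM sys.dm_os_wait_stats AS e
JOIN #waits_begin AS b
  ON b.wait_type = e.wait_type
WHERE e.waiting_tasks_count > b.waiting_tasks_count
ORDER BY wait_time DESC;
```

Working with differences avoids having to clear the instance-wide counters, which other monitoring tools may depend on.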
Performance Dashboard
Performance Dashboard 2
Begin/End waitstats output
wait_type                   waits  wait_time sigwaittime ela sec
------------------------ -------- ---------- ----------- -------
LCK_M_X                         6         94          15      30
LATCH_SH                      253        688          47      30
LATCH_EX                      371        985          94      30
PAGELATCH_SH                  208         47          15      30
PAGELATCH_EX                 7194       1484        1453      30
PAGEIOLATCH_SH               5742      63078         469      30
PAGEIOLATCH_EX               2951      29422         266      30
IO_COMPLETION                 341        953           0      30
ASYNC_NETWORK_IO            32203      27750        8687      30
SLEEP_BPOOL_FLUSH             139       1109           0      30
SLEEP_TASK                   2777        891         813      30
DTC                          1619     123266         937      30
BROKER_RECEIVE_WAITFOR        836      45938         328      30
SOS_SCHEDULER_YIELD          3547       4125        4125      30
WRITELOG                     6679     121015        3313      30
CMEMTHREAD                     90         31          32      30
CXPACKET                     1255       2422         312      30
TRANSACTION_MUTEX             237       1578         203      30
DTC_ABORT_REQUEST              30      87000           0      30
BROKER_TASK_STOP             4184     232547        2062      30

now                       cputime   iotime  idletime
----------------------- --------- -------- ---------
2011-02-03 20:48:00.433  39593.75     2125  75781.25
Wait stats snapshots stacked bar
But…
• You cannot see wait events per session, per SQL statement or per batch
• Which session or batch caused a wait event remains guesswork
• Unless you use 'dangerous' tools from sqlinternals.com (see
  example)
• Or use SQL Server 2008: XEvents! (demo); unfortunately no DMVs,
  only clumsy XML
• Many SQL Server operations are executed asynchronously compared with Oracle
  (PAGEIOLATCH waits vs filestats: Oracle: db file sequential read)
• Don’t forget about CPU time: it’s still part of response time!
• Despite the fact that wait events are extremely important, there is
  more to measure. But not much more..
• ..The best optimization is elimination: Only do what you need to do:
  keep questioning code and (business) processes
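The demo itself is not part of this text. As a rough sketch of the XEvents route in SQL Server 2008, the session below captures sqlos.wait_info events for a single session into a ring buffer. The event session name session_waits and the session id 51 are placeholders, and the result comes back as the XML the slide complains about.

```sql
-- Capture wait events for one session (51 is a placeholder session id).
CREATE EVENT SESSION session_waits ON SERVER
ADD EVENT sqlos.wait_info
    (ACTION (sqlserver.session_id, sqlserver.sql_text)
     WHERE sqlserver.session_id = 51)
ADD TARGET package0.ring_buffer
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO
ALTER EVENT SESSION session_waits ON SERVER STATE = START;
GO

-- ... run the workload in session 51, then read the collected events as XML ...
SELECT CAST(t.target_data AS xml) AS wait_events
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
  ON t.event_session_address = s.address
WHERE s.name = 'session_waits'
  AND t.target_name = 'ring_buffer';
GO

-- Clean up.
ALTER EVENT SESSION session_waits ON SERVER STATE = STOP;
DROP EVENT SESSION session_waits ON SERVER;
GO
```

wait_info fires very frequently on a busy session, so keep such a session short-lived and narrowly filtered.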
Real-world examples
• Insert loop, many commits
• Broker problems
• WRITELOG slowdown after a SAN migration
• Tempdb placed on a mirrored disk
• Batch response time breakdown: DB time compared with
  appserver time
• Demo SQL Server 2008 XEvents
Insert loop 10k rows, commit inside or outside (SQLInternals tools)
Commit inside loop, per row (or actually no commit, no transaction, in SSMS):

Spid  Ec  resource              time(ms)   count   sig        avg   perc
51    0   Elapsed time             10102       0     0          0    n/a
51    0   CPU                       1890      13     0   145,3846    19%
51    0   SOS_SCHEDULER_YIELD          0      12     0          0     0%
51    0   PAGEIOLATCH_SH             406      70     0        5,8     4%
51    0   PAGEIOLATCH_EX              15       5     0          3     0%
51    0   WRITELOG                  7531   10003   390  0,7528741    75%
51    0   Unaccounted for            260       0     0          0     3%


One commit outside of the loop, with begin transaction:

Spid  Ec  resource              time(ms)   count   sig        avg   perc
51    0   Elapsed time               911       0     0          0    n/a
51    0   CPU                        812       1     0        812    89%
51    0   SOS_SCHEDULER_YIELD         62     168    62  0,3690476     7%
51    0   ASYNC_NETWORK_IO            15      11     0   1,363636     2%
51    0   WRITELOG                     0       1     0          0     0%
51    0   Unaccounted for             22       0     0          0     2%
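The test script is not shown on the slide; the following sketch only reproduces the two variants being compared. The table dbo.insert_demo and its columns are hypothetical, and SET NOCOUNT ON is my addition to limit per-row network chatter.

```sql
-- Hypothetical test table.
CREATE TABLE dbo.insert_demo (id int NOT NULL PRIMARY KEY, payload char(100) NOT NULL);
GO

-- Variant 1: no explicit transaction, so every INSERT autocommits
-- and has to wait for its own log flush (WRITELOG).
SET NOCOUNT ON;
DECLARE @i int;
SET @i = 1;
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.insert_demo (id, payload) VALUES (@i, 'x');
    SET @i = @i + 1;
END;
GO

TRUNCATE TABLE dbo.insert_demo;
GO

-- Variant 2: one transaction around the loop, one commit at the end.
SET NOCOUNT ON;
DECLARE @i int;
SET @i = 1;
BEGIN TRANSACTION;
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.insert_demo (id, payload) VALUES (@i, 'x');
    SET @i = @i + 1;
END;
COMMIT TRANSACTION;
GO

DROP TABLE dbo.insert_demo;
GO
```

In the measurements above, the first variant shows 10003 WRITELOG waits for 10k rows and the second shows one, which accounts for most of the drop from about 10 s to under 1 s.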
SQL Broker trouble (1 minute snapshots):
wait_type               waits  wait_time  signaltime
--------------------- ------- ---------- -----------
LATCH_SH                    1     300000           0
PAGEIOLATCH_SH           8665      47968          94
PAGEIOLATCH_EX              1         16           0
ASYNC_NETWORK_IO          495        172          62
SLEEP_TASK               3674        438         438
SOS_SCHEDULER_YIELD      2561       1031        1032
WRITELOG                   38         62          15
CMEMTHREAD               8552        296         219

cputime     iotime   idletime
---------- ------- ----------
93718,75     17500    6093,75

(next slide: sysprocesses)
SQL Broker trouble 2 sysprocesses:
spid  kpid  blocked  waittype  waittime  lastwaittype  waitresource                                  cpu
----  ----  -------  --------  --------  ------------  -------------------------------------  ---------
17    2580        0  0x0000           0  CMEMTHREAD                                           479216906
18    2584        0  0x0000           0  CMEMTHREAD                                           463373718
19    2624       18  0x0022       61671  LATCH_SH      SERVICE_BROKER_TRANSMITTER (801C4264)       1546
23    2604       17  0x0022       75437  LATCH_SH      SERVICE_BROKER_TRANSMITTER (801C40EC)        110
25     668       18  0x0022      155140  LATCH_SH      SERVICE_BROKER_TRANSMITTER (801C4264)          3
27    3684       18  0x0022       84031  LATCH_SH      SERVICE_BROKER_TRANSMITTER (801C4264)         12

(sysprocesses.command='BRKR TASK')
Spids 17 and 18 are in a CPU loop while holding the SB transmitter latch: blocking 19, 23, 25 and 27
Problem: broker cannot send message due to certificate problems.
Txlog writes slowdown
[Chart: time in ms by day of year]
Sqlstat2005 8 hour day time
TEMPDB write times on mirrored disk
(Compare with previous slide: non-mirrored)
Session timing for a 10-minute batch: the DB sessions use only about 2 and 1 minutes!
Spid  EC  ResourceDescription   Time(ms)  Count  SignalTime(ms)  AvgTime(ms)  Perc
----  --  --------------------  --------  -----  --------------  -----------  ----
110   0   Elapsed time            116974      0               0            0   n/a
110   0   CPU                      40737   7914               0      5,14746   35%
110   0   LCK_M_RS_S               14390    122              31     117,9508   12%  *
110   0   LCK_M_S                  13171    108              46     121,9537   11%  *
110   0   PAGEIOLATCH_SH           32390   6186             296     5,236017   28%
110   0   SOS_SCHEDULER_YIELD       5171   6486            5171    0,7972556    4%
110   0   PAGELATCH_SH                31    255              31    0,1215686    0%
110   0   WRITELOG                   406    293              31     1,385666    0%
110   0   LCK_M_SCH_M                 46      4               0         11,5    0%
110   0   DTC                       2062    227              31       9,0837    2%
110   0   TRANSACTION_MUTEX          218    175              15     1,245714    0%
110   0   ASYNC_NETWORK_IO          4328   1327             421     3,261492    4%
110   0   LCK_M_X                    796     16               0        49,75    1%
110   0   PAGEIOLATCH_EX             156     35               0     4,457143    0%
110   0   PAGELATCH_EX               296    550             281    0,5381818    0%
110   0   Unaccounted for           2776      0               0            0    2%

72    0   Elapsed time             62532      0               0            0   n/a
72    0   CPU                       5996   6168               0    0,9721141   10%
72    0   PAGEIOLATCH_SH           25812   4577             343     5,639502   41%
72    0   SOS_SCHEDULER_YIELD        359    434             359     0,827189    1%
72    0   IO_COMPLETION               78     87               0    0,8965517    0%
72    0   ASYNC_NETWORK_IO          4281   1158             359     3,696891    7%
72    0   PAGELATCH_EX               765   1336             765    0,5726048    1%
72    0   DTC                       1093    101              46     10,82178    2%
72    0   TRANSACTION_MUTEX          156     71               0     2,197183    0%
72    0   PAGEIOLATCH_EX           18328   3470             140     5,281845   29%
72    0   LCK_M_RIn_NL               515      6               0     85,83334    1%
72    0   CMEMTHREAD                   0      1               0            0    0%
72    0   LCK_M_S                   3937      1               0         3937    6%
72    0   Unaccounted for           1212      0               0            0    2%

* The locking in the first session was resolved with read committed snapshots and isolation levels
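The slide does not show how read committed snapshot was enabled. As a sketch only, this is the usual database-level switch; demo_db is a placeholder database name, and the ALTER needs the database to have no other active connections before it can complete.

```sql
-- Placeholder database name; readers then see the last committed row version
-- instead of blocking on shared locks (LCK_M_S and similar waits).
ALTER DATABASE demo_db SET READ_COMMITTED_SNAPSHOT ON;
GO

-- Verify the setting.
SELECT name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'demo_db';
GO
```

Row versioning moves work into tempdb, so tempdb IO is worth watching after the change.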
Read io time summary
[Chart: count by ms/read]

[INSIGHT OUT 2011] A24 sql server wait events(mario broodbakker)
