[INSIGHT OUT 2011] A23 database io performance measuring planning(alex)
Transcript

  • 1. Database I/O Performance: Measuring and Planning. Alex Gorbachev, Insight-Out Database Symposium, Tokyo, 2011
  • 2. Alex Gorbachev • CTO, The Pythian Group • Blogger • OakTable Network member • Oracle ACE Director • BattleAgainstAnyGuess.com • President, Oracle RAC SIG © 2009/2010 Pythian
  • 3. Why Companies Trust Pythian • Recognized Leader: • Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Works with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments • Expertise: • One of the world's largest concentrations of dedicated, full-time DBA expertise • Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response © 2011 Pythian
  • 4. Why Measure I/O Performance? • Diagnostics & troubleshooting • Proof of impact • Capacity planning and monitoring • Platform validation / acceptance testing © 2009/2010 Pythian
  • 5. Instrumentation: Storage Stack vs Oracle Database ➡ Oracle DB call: 1. read block 2. read block 3. latch free 4. read block 5. enqueue 6. send result ➡ Storage I/O call: UNKNOWN. We can profile a DB call; we cannot profile an I/O call. © 2009/2010 Pythian
  • 6. Is Profiling an I/O Call Feasible? © 2009/2010 Pythian
  • 7. Direct Attached Storage Stack. Illustration from Guttina Srinivas' blog - http://guttinasrinivas.wordpress.com/ © 2009/2010 Pythian
  • 8. Simplified Enterprise Storage Stack. Sample IBM storage stack - http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html © 2009/2010 Pythian
  • 9. (image-only slide) © 2009/2010 Pythian
  • 10. Storage stack is too complex and heterogeneous to build an end-to-end I/O profile © 2009/2010 Pythian
  • 11. Sources of I/O Performance Measurements • Database as an application consuming I/O services - MUST HAVE • Drill-down into the rest of the I/O stack (ASM, operating system, storage arrays, ...) - Complementary © 2009/2010 Pythian
  • 12. How is I/O Measured in the Database? • I/O code paths (syscalls) are instrumented - I/O waits • timed_statistics=true • Additional statistics are collected • I/O size, amount, time spent • Granularity on different levels • Global, session, datafile, service, module/action • Stored in the SGA as cumulative counters - X$ tables • Externalized via V$ views • Snapshots taken by various tools like Statspack, AWR, Snapper, etc. © 2009/2010 Pythian
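The cumulative counters this slide describes only become meaningful as deltas between two snapshots, which is exactly what Statspack, AWR and Snapper do. A minimal sketch of that delta arithmetic (the counter names echo V$SYSSTAT, but the values are invented for illustration):

```python
def snapshot_delta(snap_a, snap_b):
    """Subtract two snapshots of cumulative statistics."""
    return {name: snap_b[name] - snap_a[name] for name in snap_a}

# two invented snapshots of cumulative counters, 60 seconds apart
snap_t0 = {"physical reads": 1_000_000, "physical read total bytes": 8_192_000_000}
snap_t1 = {"physical reads": 1_060_000, "physical read total bytes": 8_683_520_000}

interval_s = 60
delta = snapshot_delta(snap_t0, snap_t1)
iops = delta["physical reads"] / interval_s
mbps = delta["physical read total bytes"] / interval_s / 1024 / 1024
print(f"{iops:.0f} IOPS, {mbps:.2f} MB/s")  # prints: 1000 IOPS, 7.81 MB/s
```

Because the counters are cumulative since instance startup, a single reading tells you almost nothing; the tools named on the slide all reduce to this subtraction at different granularities.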
  • 13. WHAT Do We Measure? • Response time • Throughput / bandwidth • Skew & patterns • I/O measurements are almost always aggregate! © 2009/2010 Pythian
  • 14. Reproducible issue? 10046 trace: response time, skew & patterns © 2009/2010 Pythian
  • 15. Mr Tools - The Time-Saver © 2009/2010 Pythian
  • 16. Example Profile: 4+ hour batch job
    Wait Event / Syscall               DURATION         CALLS     MEAN        MIN         MAX
    ----------------------------- ---------------- --- -------- ----------- ----------- -----------
    db file sequential read        11861.295517  81.4%   201940    0.058737    0.000000    5.473023
    log file switch (checkpoint..   1941.262523  13.3%       49   39.617603    0.001443  211.405054
    PL/SQL lock timer                764.452061   5.2%      765    0.999284    0.000008    1.003142
    log buffer space                   0.149762   0.0%        8    0.018720    0.006973    0.030125
    undo segment extension             0.126689   0.0%       19    0.006668    0.001265    0.033682
    6 others                           0.201454   0.0%       14    0.014390    0.000004    0.059468
    ----------------------------- ---------------- --- -------- ----------- ----------- -----------
    TOTAL (11)                     14567.488006 100.0%   202795    0.071834    0.000000  211.405054
    © 2009/2010 Pythian
  • 17. I/O Response Time Histogram
    Matched event names: db file sequential read
    Options: group = name = db file sequential read where = 1
    RANGE {min <= e < max}            DURATION         CALLS     MEAN
    ----------------------- ---------------- --- -------- -----------
       0.000000    0.000001     0.000000   0.0%       14    0.000000
       0.000001    0.000010     0.000021   0.0%        8    0.000003
       0.000010    0.000100     0.008654   0.0%      180    0.000048
       0.000100    0.001000    41.040579   0.3%    86617    0.000474
       0.001000    0.010000   201.892556   1.7%    36305    0.005561
       0.010000    0.100000  1435.417470  12.1%    66754    0.021503
       0.100000    1.000000  3730.265905  31.4%     9059    0.411775
       1.000000   10.000000  6452.670332  54.4%     3003    2.148741
      10.000000  100.000000     0.000000   0.0%        0
     100.000000 1000.000000     0.000000   0.0%        0
    1000.000000    Infinity     0.000000   0.0%        0
    ----------------------- ---------------- --- -------- -----------
    TOTAL (8)               11861.295517 100.0%   201940    0.058737
    © 2009/2010 Pythian
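The power-of-ten ranges above are easy to recompute from raw wait durations. A hedged sketch of the bucketing (the wait times below are invented, not taken from this trace):

```python
import math
from collections import Counter

def bucket_exp(ela_s):
    """Integer n such that 10**n <= ela_s < 10**(n+1), mirroring the
    power-of-ten ranges the profile above is built on."""
    return math.floor(math.log10(ela_s))

# invented single-block read times, in seconds
waits = [0.0004, 0.0007, 0.006, 0.02, 0.45, 2.1]
hist = Counter(bucket_exp(w) for w in waits)
for n in sorted(hist):
    print(f"{10.0 ** n:>12.6f} {10.0 ** (n + 1):>12.6f} {hist[n]:>6}")
```

The point of the histogram stands out even in this toy: the buckets with the most calls are not the buckets where the time went.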
  • 18. Datafile Skew?
    Matched event names: db file sequential read
    Options: group = $p1 name = db file sequential read where = 1
    File ID         DURATION         CALLS     MEAN        MIN         MAX
          6   2383.052786  20.1%    40086    0.059449    0.000000    4.825304
         10   2131.333101  18.0%    21568    0.098819    0.000029    5.366355
         12   2065.204816  17.4%    35353    0.058417    0.000000    5.104831
          7   1870.332973  15.8%    32955    0.056754    0.000000    4.954959
         11   1711.504204  14.4%    39065    0.043812    0.000000    4.819981
          9   1659.888036  14.0%    23735    0.069934    0.000000    5.473023
         14     36.206148   0.3%     3141    0.011527    0.000063    4.442775
          8      3.532841   0.0%     5877    0.000601    0.000073    0.061977
         13      0.193044   0.0%      126    0.001532    0.000343    0.104574
          1      0.046855   0.0%       32    0.001464    0.000000    0.022407
          3      0.000713   0.0%        2    0.000357    0.000311    0.000402
    TOTAL (11) 11861.295517 100.0%  201940    0.058737    0.000000    5.473023
    © 2009/2010 Pythian
  • 19. Analyzing Datafile Chunks
    Matched event names: db file sequential read
    Options: group = $p1*1000000000+int($p2*8192/1024/1024) name = db file sequential read where = $ela>0.1
    File Chunk          DURATION         CALLS     MEAN        MIN         MAX
    ------------ ---------------- --- -------- ----------- ----------- -----------
     10000008570    175.587622   1.7%      120    1.463230    0.134717    4.373926
      6000000381    173.669439   1.7%      119    1.459407    0.107691    3.713161
     10000008566    157.199899   1.5%      102    1.541175    0.167078    4.366412
     10000008565    147.466754   1.4%       98    1.504763    0.128982    4.538604
      6000008641    139.614461   1.4%       90    1.551272    0.127778    4.799470
     10000008567    120.733972   1.2%       89    1.356561    0.100613    4.564558
      9000008223    107.619815   1.1%       73    1.474244    0.118106    5.473023
     10000008563     95.949235   0.9%       72    1.332628    0.115185    3.580435
      9000008224     90.483791   0.9%       79    1.145364    0.129597    5.468010
      6000006191     86.307121   0.8%       78    1.106502    0.102094    3.876378
     4329 others   8888.304128  87.3%    11142    0.797730    0.100035    5.366355
    ------------ ---------------- --- -------- ----------- ----------- -----------
    TOTAL (4339)  10182.936237 100.0%    12062    0.844216    0.100035    5.473023
    © 2009/2010 Pythian
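The grouping expression above packs two trace parameters into a single key: $p1 (file#) times 10^9, plus the 1 MB chunk index derived from $p2 (block#, with 8 KB blocks). The same arithmetic in Python; the block number below is hypothetical, chosen so the key lands in chunk 8570 of file 10 like the table's top row:

```python
BLOCK_SIZE = 8192  # 8 KB database blocks, as the expression above assumes

def chunk_key(file_no, block_no, chunk_mb=1):
    """file# * 1e9 + index of the chunk_mb-sized chunk holding block#,
    the same arithmetic as group = $p1*1000000000+int($p2*8192/1024/1024)."""
    chunk = (block_no * BLOCK_SIZE) // (chunk_mb * 1024 * 1024)
    return file_no * 1_000_000_000 + chunk

print(chunk_key(10, 1_097_000))      # 1 MB chunks  -> 10000008570
print(chunk_key(10, 1_097_000, 16))  # 16 MB chunks -> 10000000535
```

Passing chunk_mb=16 reproduces the coarser 16 MB grouping the next slide plays with, collapsing thousands of 1 MB chunks into a few hundred hot regions.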
  • 20. Playing with Chunk Size
    Matched event names: db file sequential read
    Options: group = $p1*1000000000+int($p2*8192/1024/1024/16) name = db file sequential read where = $ela>0.1
    File Chunk          DURATION         CALLS     MEAN        MIN         MAX
    ----------- ---------------- --- -------- ----------- ----------- -----------
    10000000535    846.934923   8.3%      633    1.337970    0.100168    4.564558
     7000000029    315.398085   3.1%      353    0.893479    0.103097    3.670991
     6000000023    280.162428   2.8%      330    0.848977    0.100183    3.713161
    12000000171    261.555298   2.6%      268    0.975953    0.103535    4.014043
    12000000170    193.130501   1.9%      166    1.163437    0.102184    3.937978
     9000000513    175.100649   1.7%      124    1.412102    0.118106    5.473023
     7000000157    173.111037   1.7%      160    1.081944    0.102949    4.237775
     6000000540    140.663440   1.4%       91    1.545752    0.127778    4.799470
     6000000386    130.590608   1.3%      172    0.759248    0.100873    3.876378
    11000000156    122.062914   1.2%      135    0.904170    0.100622    3.748086
     447 others   7544.226354  74.1%     9630    0.783409    0.100035    5.468010
    ----------- ---------------- --- -------- ----------- ----------- -----------
    TOTAL (457)  10182.936237 100.0%    12062    0.844216    0.100035    5.473023
    © 2009/2010 Pythian
  • 21. Time Periods Analysis. (Chart: one-minute average I/O response time in seconds, roughly 0 to 2.0 s, across ~244 minutes of the run.) © 2009/2010 Pythian
  • 22. 10046 Trace Is Expensive... NOT! • 10046 tracing overhead is insignificant • This sample 4+ hour batch: trace <30 MB with 300K+ lines • 10x compressed: 3 MB • 30 batches per night: <1 GB of traces • 10x compressed: 100 MB per night • One month of complete 10046 trace batch history is only 3 GB compressed © 2009/2010 Pythian
  • 23. Storing 3 GB of data on Amazon S3 costs less than $1 per month © 2009/2010 Pythian
  • 24. What Does 10046 Not Buy You? • Throughput • Doable, but needs quite a few traces to enable and process • No accounting for non-database workload • No visibility into how each I/O call translates into "real" I/Os • Real I/Os - requests done by the DB server OS? • Real I/Os - requests done by a SAN controller? • Real I/Os - requests served by a disk controller? • Caching impact © 2009/2010 Pythian
  • 25. Measuring Throughput • Database host: • AWR & Statspack • OS tools like sar, iostat, DTrace • Storage array: • storage vendor tools like EMC Symmetrix Performance Analyzer (SPA) © 2009/2010 Pythian
  • 26. Average values make sense only if events are perfectly randomly distributed - and so are response times © 2009/2010 Pythian
  • 27. Don't Be Trapped by Averages! • Averaging response times • Losing skew info • Losing I/O call attributes • Sizes, offsets, data blocks • Losing scope - what transaction is this I/O request for? • Reduced time granularity • Traditional Statspack & AWR snaps are hourly • sar data is captured every 5 (or 10?) minutes by default • SAN stats are usually aggregated as high as 1 hour (SPA - 5 minutes?) © 2009/2010 Pythian
  • 28. Choosing the Aggregation Interval • 24-hour running window • 95% of transactions should complete within 1 second • 99% of transactions should complete within 10 seconds • 10 seconds is the timeout, so 1% of transactions can fail and it's OK • 24 hours is 86,400 seconds => 1% is 864 seconds (14.4 min) • 1-hour intervals => a few minutes of hiccups won't be noticeable • 5-minute intervals => significant spikes of I/O response time will likely be noticeable • But we really want to get to intervals within the typical transaction response times © 2009/2010 Pythian
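The arithmetic behind the slide's numbers is worth making explicit, because it drives the interval choice:

```python
window_s = 24 * 3600      # 24-hour running window
timeout_tolerance = 0.01  # 1% of transactions may exceed the 10 s timeout

budget_s = window_s * timeout_tolerance
print(budget_s, budget_s / 60)  # 864.0 seconds = 14.4 minutes

# A 14.4-minute bad patch fits invisibly inside hourly snapshots,
# which is why the slide pushes toward intervals close to the
# transaction response time itself.
```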
  • 29. The random arrivals concept applies 100% to I/O calls. Detecting random-arrivals violations requires an averaging interval close to the response time © 2009/2010 Pythian
  • 30. Monitoring I/O Performance and SLAs • How do your transaction SLAs translate into I/O SLAs? • Percentile requirements • Commit to response time according to percentile requirements at pre-defined throughput and concurrency levels • *average* 2000 IOPS with up to 40 concurrent I/Os • 99% of I/Os < 10 ms, 99.9% of I/Os < 100 ms • 1-minute sliding window • Monitoring such SLAs - must average over 1 minute and collect a response time histogram © 2009/2010 Pythian
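Monitoring an SLA phrased this way means checking, per one-minute window, what fraction of I/Os beat each latency bound. A minimal sketch with an invented window of latencies:

```python
# invented one-minute window of single-block read latencies, in seconds
window = [0.004] * 990 + [0.050] * 9 + [0.200]

n = len(window)
within_10ms = sum(t < 0.010 for t in window) / n
within_100ms = sum(t < 0.100 for t in window) / n

# SLA from the slide: 99% of I/Os under 10 ms, 99.9% under 100 ms
sla_met = within_10ms >= 0.99 and within_100ms >= 0.999
print(within_10ms, within_100ms, sla_met)  # prints: 0.99 0.999 True
```

Note that the average of this window looks healthy either way; only the percentile check (or the histogram it is computed from) reveals how close the tail sits to the limit.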
  • 31. Importance of Response Time Histograms • Including histograms in the snapshots adds more color to the averaged measures • A histogram is an indicator of skew • They help in selecting the right measurement interval • Histograms can be built on any value - not just response times • Histogram of I/O throughput per 5-minute interval to analyze whether we have bursts of I/O activity • Histograms in Statspack reports appeared in 10g • Histograms in AWR reports appeared in 11g © 2009/2010 Pythian
  • 32. A Tool to Collect Short Interval Averages • Requirements: • 1-minute or shorter intervals • Collect system-level I/O waits and stats • Collect session-level I/O waits and stats • Collect I/O response time histograms (system and session) • Nice to have - per service/module/action granularity • Production collection example (6 years old) • Oracle 9i RAC, HP-UX, 64 cores • thousands of DB calls per second, thousands of I/O calls per second • *All* stats and waits with 1-5 minute snaps and at logoff • Tanel Poder's Snapper and Sesspack © 2009/2010 Pythian
  • 33. ASH Data for I/O Measurements? V$ACTIVE_SESSION_HISTORY & DBA_HIST_ACTIVE_SESS_HISTORY • TIME_WAITED => 11.2 documentation is misleading • DELTA_TIME • DELTA_READ_IO_REQUESTS/BYTES • DELTA_WRITE_IO_REQUESTS/BYTES33 © 2009/2010 Pythian
  • 34. ASH itself is misleading for I/O performance measurements: sampling tends to hide short waits, invalidating it for any response time analysis © 2009/2010 Pythian
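Why sampling hides short waits: a periodic sampler catches a wait with probability proportional to its duration, so samples are time-weighted while call counts are not. A back-of-envelope sketch (all numbers invented):

```python
interval = 1.0  # ASH-style sampling interval, seconds

short_n, short_t = 10_000, 0.002  # many fast single-block reads
long_n, long_t = 10, 1.5          # a few very slow ones

# Expected sample counts are proportional to total time waited:
exp_short = short_n * short_t / interval  # 20.0 expected samples
exp_long = long_n * long_t / interval     # 15.0 expected samples

call_share = short_n / (short_n + long_n)           # short waits: ~99.9% of calls
sample_share = exp_short / (exp_short + exp_long)   # but only ~57% of samples
```

The short waits are 99.9% of the calls yet barely half the samples, so any response time profile built from ASH is skewed toward the long tail, exactly the trap the slide warns about.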
  • 35. AWR Sources • DBA_HIST_EVENT_HISTOGRAM • DBA_HIST_FILEMETRIC_HISTORY * • DBA_HIST_FILESTATXS • DBA_HIST_IOSTAT_DETAIL/FILETYPE/FUNCTION • DBA_HIST_SERVICE_STAT • DBA_HIST_SESSMETRIC_HISTORY * • DBA_HIST_SQLSTAT • DBA_HIST_SYSTEM_EVENT • DBA_HIST_SYSSTAT • DBA_HIST_SYSMETRIC_HISTORY * * These views have granularity of 1 minute35 © 2009/2010 Pythian
  • 36. AWR Example - DBA_HIST_SYSMETRIC_HISTORY
    Useful metrics: Physical Reads Per Sec, Physical Writes Per Sec, I/O Requests per Second, I/O Megabytes per Second, Redo Generated Per Sec, Average Synchronous Single-Block Read Latency
    SELECT begin_time, ROUND(value, 1) v
      FROM dba_hist_sysmetric_history
     WHERE metric_name = 'Average Synchronous Single-Block Read Latency'
     ORDER BY 1;
    © 2009/2010 Pythian
  • 37. V$SESSION_WAIT_HISTORY? • The last 10 wait events for each active session • Column WAIT_TIME_MICRO - amount of time waited (in microseconds) © 2009/2010 Pythian
  • 38. Measuring at the OS Layer • The OS is not really transparent to I/O requests • Has I/O request queues • Utilizes various I/O schedulers that decide on request priority • ASYNC I/O • Filesystems and buffered I/O • Impact of CPU scheduling • Time spent in the OS layer becomes important as we move to SSD and flash storage • Difficult to directly associate OS stats with DB stats © 2009/2010 Pythian
  • 39. Measuring at the SAN Layer • Normally most of the I/O time is spent on physical disk, but... • Read cache impact • Write cache impact • Cache saturation situations • Abnormal situations like controller/switch failure • Quality of Service (QoS) • Flash-based storage shifts the balance of time again • The non-disk component of I/O response time becomes more prominent • Difficult to associate SAN stats with OS & DB stats • Virtualization kicks in © 2009/2010 Pythian
  • 40. Exadata Storage Cell Measurement • Replacement of the SAN layer • More than just stats per disk / controller etc. • The Storage Cell now performs more than just I/O functions • Much better accountability and association with the database • Database segment visibility in flash cache • IORM metrics - category, database, consumer groups • Flash Cache metrics • Cumulative and 1-minute aggregates • Some stats are passed back to the database • V$SYSSTAT, V$SQL, waits, XML cell stats in V$CELL_STATE © 2009/2010 Pythian
  • 41. Increased Importance of Low-Latency Network • With traditional HDD random access times of 5-10 ms ➡ communication overhead is minimal - less than 10% • FC storage latencies are in the few hundreds of microseconds • NFS-mounted storage adds less than 1 ms of latency • The IP stack is heavier on CPU => impact of the OS CPU scheduler • Flash read latency is an order of magnitude shorter ➡ suddenly an InfiniBand SAN becomes a necessity! • microsecond latencies © 2009/2010 Pythian
  • 42. Exadata: Flash + InfiniBand = Very Low Latency? • Let's check some Exadata 10046 traces...
    Matched event names: cell single block physical read
    Options: group = name = cell single block physical read where = 1
    RANGE {min <= e < max}     DURATION        CALLS     MEAN
      0.000000   0.000001     0.000000   0.0%      0
      0.000001   0.000010     0.000000   0.0%      0
      0.000010   0.000100     0.000000   0.0%      0
      0.000100   0.001000     0.191839  95.5%    310    0.000619
      0.001000   0.010000     0.008983   4.5%      3    0.002994
      0.010000   0.100000     0.000000   0.0%      0
    © 2009/2010 Pythian
  • 43. Exadata: Flash + InfiniBand = Very Low Latency? iostat output:
    Device:   rrqm/s  wrqm/s     r/s   w/s    rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
    sdn         0.50    0.00  188.50  0.00  1512.00    0.00    16.04     0.10   0.51   0.31   5.85
    sdo         1.50    0.00  170.50  0.00  1376.00    0.00    16.14     0.14   0.79   0.38   6.40
    sdp         2.50    0.00  157.00  0.00  1276.00    0.00    16.25     0.09   0.57   0.41   6.50
    sdq         0.50    0.00  173.50  0.00  1392.00    0.00    16.05     0.11   0.62   0.40   7.00
    sdr         0.50    0.00  166.50  0.00  1336.00    0.00    16.05     0.07   0.41   0.30   4.95
    sds         1.00    0.00  175.50  0.00  1412.00    0.00    16.09     0.08   0.43   0.32   5.60
    © 2009/2010 Pythian
  • 44. Measuring for Planning: Aggregation Interval 1. Choose a largish interval 2. Analyze the histograms - skewed inside the interval? 3. If yes, reduce the interval 4. Repeat steps 1-3 until... a) you either see no skew, or... b) the business stops caring about skew inside that interval © 2009/2010 Pythian
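The four steps reduce to a simple loop. A sketch where `is_skewed` stands in for "analyze the histograms" in step 2, and the skew model is a toy:

```python
def choose_interval(is_skewed, start_s=3600, floor_s=60):
    """Halve the aggregation interval until the histogram inside it shows
    no skew, or until we hit the floor the business stops caring below."""
    interval = start_s
    while interval > floor_s and is_skewed(interval):
        interval //= 2
    return interval

# toy model: skew is only visible at intervals longer than 10 minutes
print(choose_interval(lambda s: s > 600))  # 3600 -> 1800 -> 900 -> 450
```

The floor is the business decision from step 4b: below some interval, hiccups no longer matter even if the histogram is still skewed.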
  • 45. AWR Example - Reads & Writes (IOPS) © 2009/2010 Pythian
  • 46. AWR Example - Throughput (MBPS) © 2009/2010 Pythian
  • 47. AWR Example - Redo Generation (MBPS) © 2009/2010 Pythian
  • 48. Measuring for Planning: Distinguish Different Kinds of I/O • Random vs sequential I/O • If the underlying disks are spinning media • Small vs large I/Os • Throughput is then measured in either IOPS or MBPS • Reads vs writes • Sometimes can be generalized as what % are writes © 2009/2010 Pythian
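The small-vs-large distinction is just the I/O size bridging the two throughput units; a quick illustration with made-up rates:

```python
def implied_mbps(iops, io_size_bytes):
    """Bandwidth implied by an I/O rate at a given request size."""
    return iops * io_size_bytes / 1024 / 1024

small_random = implied_mbps(10_000, 8 * 1024)  # 10k x 8 KB reads  = 78.125 MB/s
large_seq = implied_mbps(100, 1024 * 1024)     # 100 x 1 MB reads  = 100.0 MB/s
```

A workload can be IOPS-bound at modest bandwidth, or bandwidth-bound at modest IOPS, which is why planning needs both numbers plus the read/write split.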
  • 49. Measuring for Planning: Business Function Granularity • Measure I/O at the right granularity • Ideally per business transaction / function • Practical: service, session, module/action, SQL • "System" I/O - LGWR, ARCH, DBWR, etc. • Indirect association with business transactions • Helps build more realistic capacity planning models © 2009/2010 Pythian
  • 50. Capacity planning measurements from the database view alone are enough © 2009/2010 Pythian
  • 51. Oracle Database CALIBRATE_IO DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat); • iops - max reads per second (random single-block) • lat - actual average single-block latency at the iops rate • mbps - max MB/s throughput (large reads) • Simplistic: read-only, needs a database, outputs max only, requires ASYNC I/O © 2009/2010 Pythian
  • 52. ORION - ORacle I/O Numbers • Free tool from Oracle simulating database-like IOs • No database required • Same I/O libs / code-path • Still requires ASYNC I/O • Very flexible • Large vs Small IOs; flexible sizes; mixed • Random vs Sequential I/O patterns; mixed • Configurable write I/O % • Can simulate ASM striping layout52 © 2009/2010 Pythian
  • 53. ORION Example 1: Scalability Anomaly. HP blades, HP Virtual Connect Flex10, big NetApp box, 100 disks © 2009/2010 Pythian
  • 54. ORION Example 1: Impact of Large I/Os. HP blades, HP Virtual Connect Flex10, big NetApp box, 100 disks © 2009/2010 Pythian
  • 55. ORION Example 1: Write I/O Impact. HP blades, HP Virtual Connect Flex10, big NetApp box, 100 disks © 2009/2010 Pythian
  • 56. ORION Example 2: Initial Run - Failed Expectations. NetApp NAS, 1 Gbit Ethernet, 42 disks. (Charts: IOPS and latency in ms across load levels, for read-only and read-write runs.) © 2009/2010 Pythian
  • 57. ORION Example 2: Tune-Up Results. Switched from Intel to Broadcom NICs. (Charts: IOPS and latency in ms across load levels after the change.) © 2009/2010 Pythian
  • 58. ORION Example 3: RAID5 © 2009/2010 Pythian
  • 59. ORION Example 3: RAID10 © 2009/2010 Pythian
  • 60. Presenting Measurements: Visualization is the Key © 2009/2010 Pythian
  • 61. Q&A • Email me - gorbachev@pythian.com • Read my blog - http://www.pythian.com • Follow me on Twitter - @AlexGorbachev • Join the Pythian fan club on Facebook & LinkedIn © 2009/2010 Pythian