NexentaStor Performance Tuning - OpenStorage Summit 2011

Speaker notes:
  • The true performance of NexentaStor systems can be difficult to predict. The key variables for getting the best performance from a NexentaStor system are the choice of hardware and the dataset configuration.

    In general, NexentaStor systems can perform better than proprietary storage systems. The reasons are simple: by using good, scalable storage software, Nexenta customers can leverage improvements in component technologies. For example, swapping a NexentaStor server's processor for a faster version, installing a new motherboard, or adding memory is as simple and straightforward as upgrading a compute server. NexentaStor licensing is based on total storage, not on the number or speed of processors or the amount of RAM. New or faster network interfaces can be added to improve client performance without incurring additional storage licensing costs.

    NexentaStor systems can therefore increase their performance cost-effectively and incrementally over their lifetime.
  • NexentaStor system building blocks include persistent storage devices. Many different block devices are supported, providing an easy migration path from legacy storage. New technologies are easily added to existing NexentaStor systems, protecting your investment against technology obsolescence.

    Obviously, not all hardware choices are fast. The cost, performance, and reliability of system components have a significant impact on overall system performance and dependability. This flexibility offers optimization options unparalleled in the industry.
  • The key to NexentaStor system optimization is the Hybrid Storage Pool (HSP). Main memory (DRAM) is used as an Adaptive Replacement Cache (ARC), efficiently storing both frequently used and recently used data.

    The separate intent log is used to optimize the latency of synchronous write workloads, such as NFS. The log satisfies synchronous write semantics while the transactional object store optimizes and allocates space on the main pool storage. The log does not need to be very large; size it according to the amount of data expected to be written in 30 seconds.

    A level-2 ARC (L2ARC) cache device can be used to cost-effectively grow the size of the ARC. For large, read-intensive workloads, the cost per gigabyte of SSDs is lower than that of DRAM, so excellent read performance can be achieved with modestly priced SSDs.

    The performance of the main pool becomes less critical when the system is configured with enough RAM and the log and L2ARC devices are fast.
  • The Working Set Size (WSS) describes the amount of space most commonly used by applications or workloads. Using WSS to describe performance is becoming increasingly useful as disk sizes grow. In many cases, systems are configured with tens or hundreds of terabytes of storage while only a small fraction of the space is in use at any given time; that fraction is the working set size.
Transcript:

    1. NexentaStor Performance Tuning. Richard Elling, Senior Director of Solutions Engineering, Nexenta
    2. Agenda
       • Read Performance Model
       • Device Performance Characterization
       • NexentaStor Tunables
    3. Read Performance Model
    4. NexentaStor Performance
       • Performance of NexentaStor systems is difficult to predict
       • Generally better than proprietary RAID systems
         – Proprietary systems tend to use wimpy CPUs with limited amounts of memory for cache
         – NexentaStor systems scale with the latest processor and memory technology
       • The best NexentaStor performance is achieved by choosing the best hardware configuration for the job, not by "tuning" NexentaStor software
    5. Good Hardware Choices
       • NexentaStor uses block devices
         – HDDs
         – SSDs
         – Anything that looks like a set of blocks (size must be greater than 64 MB; sorry, floppy disks are too small)
       • NexentaStor block drivers
         – Initiators: ATA, IDE, SATA, SAS, Parallel SCSI, iSCSI, FC, DDRdrive, USB, SD, CF, XD, MMC
         – Others: files, ramdisk, BD
    6. Hybrid Storage Pool: optimize performance and cost
       Main memory serves as the Adaptive Replacement Cache (ARC); the pool adds three device classes:

                                  Separate intent log   Main pool              Level 2 ARC
       Device                     write-optimized SSD   HDDs                   read-optimized SSD
       Size (GBytes)              1 - 10                large                  big
       Cost                       write iops/$          size/$                 size/$
       Use                        sync writes           persistent storage     secondary read cache
       Performance optimization   low-latency writes                           low-latency reads
       Need more speed?           stripe                more, faster devices   stripe
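       A back-of-the-envelope sketch of the log sizing guidance (the speaker notes say to size the log for roughly 30 seconds of expected synchronous writes). The write throughput below is a hypothetical example figure, not a measurement from the deck:

           # Size the separate intent log for ~one transaction-group
           # window of synchronous writes (about 30 seconds here).
           def slog_size_gb(sync_write_mb_per_s, txg_window_s=30):
               """Approximate slog capacity needed, in GB."""
               return sync_write_mb_per_s * txg_window_s / 1024

           # e.g. an NFS server sustaining ~300 MB/s of sync writes
           print(f"{slog_size_gb(300):.1f} GB")  # ~8.8 GB, within the 1-10 GB row above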
    7. Working Set Size
       • Average Working Set Size (WSS) is the amount of space needed to satisfy the immediate storage needs of applications, i.e. the frequently used space
       • Reduce WSS with:
         – Snapshots & clones (most effective)
         – Compression
         – Deduplication
    8. Performance Envelope [chart: expected maximum 4 KB random read IOPS (log scale, 1,000 to 10,000,000) versus working set size (0 to 1000 GB)]
    9. Performance Envelope [the same chart, annotated with the regions served by the ARC, the L2ARC, and the pool disks]
    10. Performance Envelope [the same chart, showing IOPS stepping down from ARC hit performance to L2ARC hit performance to pool performance as the working set grows past the ARC size and then the ARC + L2ARC size]
    11. [chart: expected 4 KB random read IOPS versus working set size for the small, medium, and large configurations below, with the 10 GbE wire-speed line for reference]

        Configuration                          Small     Medium    Large
        RAM size (GB)                          24        96        192
        100% ARC hit rate performance (IOPS)   600,000   900,000   1,300,000
        L2ARC size (GB)                        0         250       480
        L2ARC device small random read IOPS    0         30,000    60,000
        Pool small random read IOPS            1,400     3,600     8,000
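        The envelope in slides 8-11 can be approximated with a simple step model: reads are served at the speed of the fastest tier that still holds the working set. A minimal sketch using the Medium column above; real systems blend tiers and cache metadata, so treat this as the upper bound the charts draw, not a prediction:

            # Step model of the read performance envelope (slides 8-11).
            def expected_read_iops(wss_gb, ram_gb, arc_hit_iops,
                                   l2arc_gb, l2arc_iops, pool_iops):
                if wss_gb <= ram_gb:              # working set fits in ARC (DRAM)
                    return arc_hit_iops
                if wss_gb <= ram_gb + l2arc_gb:   # spills into L2ARC (SSD)
                    return l2arc_iops
                return pool_iops                  # spills onto the pool HDDs

            # "Medium" configuration from the table above
            for wss in (50, 300, 800):
                print(wss, expected_read_iops(wss, ram_gb=96, arc_hit_iops=900_000,
                                              l2arc_gb=250, l2arc_iops=30_000,
                                              pool_iops=3_600))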
    12. Real World Example

        Server  Time        NFSOPS  Read OPS  Read BW   Read lat  Write OPS  Write BW  Write lat
                                              (KB/sec)  (usec)               (KB/sec)  (usec)
        1       5:31:03 AM   9,699    6,780   125,163      271      2,865     29,432      242
        1       5:31:04 AM   9,263    6,464   111,200      297      2,682    142,730      496
        1       5:31:05 AM  11,703    7,969   131,949      258      3,535    206,254      551
        1       5:31:06 AM  14,751   11,030   184,239      179      3,581    219,542      705
        1       5:31:07 AM  14,318   10,916   183,431      158      3,246     88,383      353
        1       5:31:08 AM  11,396    7,334   114,184      318      3,973     39,423      351
        1       5:31:09 AM  10,766    7,152   123,791      274      3,518     34,355      235
        2       5:21:24 AM   4,138    2,352    45,295    2,525      1,598     16,193    2,122
        2       5:21:25 AM   6,050    2,366    55,238    1,211      3,209    175,509    1,193
        2       5:21:26 AM   8,902    2,958    85,980    1,907      5,735    281,881      996
        2       5:21:27 AM   3,456    1,669    34,443    2,212      1,526     46,251    2,291
        2       5:21:28 AM   3,463    1,790    35,542    5,307      1,571     17,157    4,052
        2       5:21:29 AM   3,306    1,711    29,829    3,641      1,462     40,895    2,532
        2       5:21:30 AM   3,697    2,111    41,909    1,921      1,478     31,911      877
    13. Device Performance Characterization
    14. Characterizing Device Performance
        • Modern storage devices vary widely in performance
        • "Datasheets don't lie" ... but the information is vague and unhelpful
        • Need a comprehensive device characterization suite
    15. SNIA SSS-PTS
        • SNIA recognizes the difficulty of comparing devices
        • Proposes the Solid State Storage Performance Test Specification (SSS-PTS)
        • Nexenta's implementation is in the NexentaStor.org repository
          – Uses the open-source vdbench
          – Results cannot be used for SSS-PTS publication, but are very useful for systems architects
        • Works great for HDDs, too
    16. SSS-PTS IOPS Measurement
        • Precondition and iterate until results are consistent
          – Helps eliminate out-of-the-box optimizations
        • Read/write ratios: 100:0, 95:5, 65:35, 50:50, 35:65, 5:95, 0:100
        • Block I/O sizes (KB): 0.5, 4, 8, 16, 32, 64, 128, 1024
        • Execute random I/O
        • Measure IOPS
    17. SSS-PTS Concurrent I/O Operations
        • The number of concurrent I/Os (threads) can be very important
          – NexentaStor architects need to choose the best concurrency value for the entire platform
          – Nexenta's tests add thread counts: 1, 2, 4, 8, 16, 32
        • For SSS-PTS results publication, vendors can choose which to report
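        A sketch of the full sweep slides 16-17 describe: every read/write mix, block size, and thread count combination. run_workload() is a hypothetical stub for whatever I/O generator drives the test (the deck uses vdbench):

            # Enumerate the SSS-PTS-style test matrix from slides 16-17.
            from itertools import product

            READ_PCT = [100, 95, 65, 50, 35, 5, 0]         # read share of the mix
            BLOCK_KB = [0.5, 4, 8, 16, 32, 64, 128, 1024]  # I/O sizes in KiB
            THREADS  = [1, 2, 4, 8, 16, 32]                # concurrent I/Os

            def run_workload(read_pct, block_kb, threads):
                """Hypothetical stub: run random I/O, return measured IOPS."""
                raise NotImplementedError

            for read_pct, block_kb, threads in product(READ_PCT, BLOCK_KB, THREADS):
                print(f"rdpct={read_pct} xfersize={block_kb}k threads={threads}")
                # iops = run_workload(read_pct, block_kb, threads)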
    18. Sample NexentaStor ZVol Test (IOPS)

        Block size                     Read:Write ratio
        (KiB)      100:0     95:5      65:35    50:50    35:65    5:95     0:100
        0.5        620,880   117,710   30,625   22,197   19,010   17,512   37,404
        4          603,126    96,684   19,284   15,584   13,869    9,957   19,247
        8          647,250   126,288   20,177   13,769   12,348    7,405    8,741
        16         338,106    48,965    9,598    7,413    5,313    4,437    4,423
        32         164,678    28,759    4,574    3,483    2,428    1,983    2,264
        64          84,688    11,496    2,166    1,503    1,172      829    1,076
        128         46,126     5,705      965      770      611      502      571
        1024         4,978       715      107       84       78       64       75

        Local test, closed course, professional driver. The test clearly shows the effects of caching and a single-HDD pool.
    19. ZVol Performance
    20. Another SSS-PTS Result
    21. Comparing Devices
    22. All IOPS are not Created Equal [lattice chart: average response time (ms) versus IOPS, paneled by I/O size (512 to 131072 bytes) and read percentage, with one curve per thread count]
    23. NexentaStor Tunables
    24. Choose Appropriate Components
        • The biggest tuning knob
        • Have the right components for the job
        • Choose reliable components
        • Leverage hybrid storage pool concepts
        • In general, go wide, then deep
    25. Recordsize and Block Size
        • The 2nd biggest tuning knob
          – Recordsize is the "max block size" for file systems
          – Block size is the "only block size" for block devices
        • For fixed-record-length workloads
          – Match recordsize/block size to avoid I/O amplification on bandwidth-constrained systems
          – Multiples can be OK; experiment and observe the trade-offs
          – Smaller block sizes require more metadata per unit of available storage
        • For variable workloads (e.g. files), a large recordsize is OK
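        A minimal sketch of the amplification this slide warns about, assuming the simple model that every application read touches whole records; the 8 KB figure is an illustrative workload, not from the deck:

            # I/O amplification when application I/O is smaller than the record.
            def read_amplification(app_io_bytes, recordsize_bytes):
                """Bytes moved from the pool per byte the application asked for."""
                records = -(-app_io_bytes // recordsize_bytes)  # ceiling division
                return records * recordsize_bytes / app_io_bytes

            # An 8 KB read against the 128 KB default recordsize
            print(read_amplification(8 * 1024, 128 * 1024))  # 16.0x the bandwidth
            # The same read with recordsize matched to the workload
            print(read_amplification(8 * 1024, 8 * 1024))    # 1.0x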
    26. I/O Concurrency
        • For current ZFS implementations, zfs_vdev_max_pending is a global, per-device setting
          – In older releases the default is 35
          – In current releases the default is 10
          – Consider changing it to match the devices to the workload
          – The setting can have availability implications
        • Room for improvement, stay tuned...
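        Why the queue depth matters can be seen from Little's law: achievable IOPS per device is bounded by outstanding I/Os divided by service time. A sketch with illustrative service times (not figures from the deck):

            # Little's law bound for a fixed per-device queue depth.
            def max_iops(queue_depth, service_time_ms):
                return queue_depth / (service_time_ms / 1000)

            print(max_iops(10, 5.0))  # HDD at ~5 ms:   ~2,000 IOPS per device
            print(max_iops(10, 0.1))  # SSD at ~100 us: ~100,000 IOPS per device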
    27. Prefetching
        • By default, intelligent prefetching is enabled
          – Adaptive algorithm: if prefetching seemed to work, prefetch more
          – Generally works well
        • For high-concurrency environments, consider disabling prefetching
          – Not tunable from the NexentaStor 3.x UI
        • Room for improvement, stay tuned...
    28. Compression
        • Compression turns big I/O into small I/O, when possible
          – The algorithms do not suffer from "compression growth"
          – Various algorithms are available
          – Enabled by default in NexentaStor 3.x
          – Amaze your friends: zeros compress to nothing
        • For high-performance environments, consider disabling compression
          – When bandwidth is over-provisioned
          – When space is inexpensive (in $/GB)
          – When low variance of latency is desired
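        A quick demonstration of the "zeros compress to nothing" point, using zlib as a stand-in for ZFS's gzip option (ZFS's native algorithms behave similarly on this input):

            # Compress one 128 KB record of zeros vs. incompressible data.
            import os
            import zlib

            block = 128 * 1024
            zeros = bytes(block)           # all-zero record
            noise = os.urandom(block)      # incompressible record

            print(len(zlib.compress(zeros)))  # a few hundred bytes at most
            print(len(zlib.compress(noise)))  # ~128 KB: no savings, only CPU cost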
    29. Deduplication
        • Deduplication turns large I/O into small I/O
          – It does not eliminate I/O!
          – Avoid using it on big, slow HDDs (IOPS-constrained)
        • In general, deduplication and high performance are not the best of friends
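        One reason dedup and IOPS-constrained pools mix poorly is the dedup table (DDT), which adds a lookup per write and consumes RAM or L2ARC. A rough sketch; the ~320 bytes per DDT entry is a commonly cited ZFS rule of thumb, not a figure from this deck:

            # Estimate the DDT memory footprint for a deduplicated pool.
            def ddt_ram_gb(pool_tb, avg_block_kb=64, bytes_per_entry=320):
                blocks = pool_tb * 1024**3 / avg_block_kb   # pool size in KB / block size
                return blocks * bytes_per_entry / 1024**3

            print(f"{ddt_ram_gb(10):.0f} GB")  # ~50 GB of DDT for a 10 TB pool of 64 KB blocks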
    30. Measure and Manage
        • Performance management is always a work in progress
        • Generalizations are becoming more difficult as workloads become more diverse
        • Experiment and measure prior to production
        • Measure and manage in production
        • Performance management has room for improvement, stay tuned...
    31. Questions? Richard.Elling@Nexenta.com
