4. NexentaStor Performance
• Performance of NexentaStor systems is difficult to predict
• Generally better than proprietary RAID systems
  – Proprietary systems tend to use wimpy CPUs with limited amounts of memory for cache
  – NexentaStor systems scale with the latest processor and memory technology
• Best NexentaStor performance is achieved by choosing the best hardware configuration for the job, not by “tuning” NexentaStor software
5. Good Hardware Choices
• NexentaStor uses block devices
  – HDDs
  – SSDs
  – Anything that looks like a set of blocks
    • Size must be greater than 64 MB
    • Sorry, floppy disks are too small
• NexentaStor block drivers
  – Initiators: ATA, IDE, SATA, SAS, Parallel SCSI, iSCSI, FC, DDRdrive, USB, SD, CF, XD, MMC
  – Others: files, ramdisk, BD
6. Hybrid Storage Pool
Optimize performance and cost
Adaptive Replacement Cache (ARC) in main memory

                  separate intent log          Main Pool            Level 2 ARC
Device            write-optimized device (SSD)  HDD                 read-optimized device (SSD)
Size (GBytes)     1 - 10                       large                big
Cost              write iops/$                 size/$               size/$
Use               sync writes                  persistent storage   secondary read cache
Performance       low-latency writes                                low-latency reads
optimization
Need more speed?  stripe more, faster devices
7. Working Set Size
• Average Working Set Size (WSS) is the amount of space needed to satisfy the immediate storage needs of applications, or the frequently used space
• Reduce WSS by
  – Snapshots & clones (most effective)
  – Compression
  – Deduplication
8. Performance Envelope
[Chart: 4KB random read IOPS (log scale, 1,000 to 10,000,000) vs. Working Set Size (0 - 1000 GB), showing the expected maximum performance envelope]
9. Performance Envelope
[Chart: same axes as the previous slide, annotated with the regions served by the ARC, the L2ARC, and the pool disks]
10. Performance Envelope
[Chart: same axes, annotated with the ARC hit performance, L2ARC hit performance, and pool performance levels, and with the ARC size and ARC + L2ARC size marked on the WSS axis]
11. Performance Envelope
[Chart: 4KB random read IOPS (0 - 1,500,000) vs. Working Set Size (0 - 1000 GB), comparing the Small, Medium, and Large configuration expected performance against 10 GbE wire speed]
Configuration                              Small     Medium    Large
RAM size (GB)                              24        96        192
100% ARC hit rate performance (IOPS)       600,000   900,000   1,300,000
L2ARC size (GB)                            0         250       480
L2ARC device small random read IOPS        0         30,000    60,000
Pool small random read IOPS                1,400     3,600     8,000
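The expected-performance curves above can be approximated with a simple blending model. The sketch below is my own illustration, not Nexenta's sizing tool: it assumes 4KB reads land uniformly at random across the working set, so each tier's hit rate is simply the fraction of the WSS that tier holds (ARC first, then L2ARC, then pool disks).

```python
def expected_read_iops(wss_gb, arc_gb, arc_iops, l2arc_gb, l2arc_iops, pool_iops):
    """Blend per-tier IOPS by the fraction of the working set each tier holds.

    Assumes uniformly random 4KB reads, so a tier's hit rate equals the
    fraction of the WSS that fits in it (ARC first, then L2ARC).
    """
    arc_frac = min(arc_gb, wss_gb) / wss_gb
    l2_frac = min(l2arc_gb, max(wss_gb - arc_gb, 0.0)) / wss_gb
    pool_frac = 1.0 - arc_frac - l2_frac
    # Mean service time is the hit-weighted sum of per-tier service times.
    mean_t = sum(frac / iops
                 for frac, iops in ((arc_frac, arc_iops),
                                    (l2_frac, l2arc_iops),
                                    (pool_frac, pool_iops))
                 if frac > 0.0)
    return 1.0 / mean_t

# Medium config from the table, with a 500 GB working set:
print(int(expected_read_iops(500, 96, 900_000, 250, 30_000, 3_600)))  # just under 10,000
```

The model shows why the envelope falls off so steeply: once any meaningful fraction of reads misses to pool disks, the slow tier dominates the average.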
12. Real World Example
Server  Time         NFSOPS   Read OPS   Read BW (KB/sec)   Read Latency (usec)   Write OPS   Write BW (KB/sec)   Write Latency (usec)
1 5:31:03 AM 9,699 6,780 125,163 271 2,865 29,432 242
1 5:31:04 AM 9,263 6,464 111,200 297 2,682 142,730 496
1 5:31:05 AM 11,703 7,969 131,949 258 3,535 206,254 551
1 5:31:06 AM 14,751 11,030 184,239 179 3,581 219,542 705
1 5:31:07 AM 14,318 10,916 183,431 158 3,246 88,383 353
1 5:31:08 AM 11,396 7,334 114,184 318 3,973 39,423 351
1 5:31:09 AM 10,766 7,152 123,791 274 3,518 34,355 235
2 5:21:24 AM 4,138 2,352 45,295 2,525 1,598 16,193 2,122
2 5:21:25 AM 6,050 2,366 55,238 1,211 3,209 175,509 1,193
2 5:21:26 AM 8,902 2,958 85,980 1,907 5,735 281,881 996
2 5:21:27 AM 3,456 1,669 34,443 2,212 1,526 46,251 2,291
2 5:21:28 AM 3,463 1,790 35,542 5,307 1,571 17,157 4,052
2 5:21:29 AM 3,306 1,711 29,829 3,641 1,462 40,895 2,532
2 5:21:30 AM 3,697 2,111 41,909 1,921 1,478 31,911 877
14. Characterizing Device Performance
• Modern storage devices vary widely in performance
• “Datasheets don’t lie” ... but the information is vague and unhelpful
• Need a comprehensive device characterization suite
15. SNIA SSS-PTS
• SNIA recognizes the difficulty of comparing devices
• Proposes the Solid State Storage Performance Test Specification (SSS-PTS)
• Nexenta’s implementation is in the NexentaStor.org repository
  – Uses open source vdbench
  – Results cannot be used for SSS-PTS publication, but are very useful for systems architects
• Works great for HDDs, too
16. SSS-PTS IOPS Measurement
• Precondition and iterate until results are consistent
  – Helps to eliminate out-of-the-box optimizations
• Read/write ratios
  – 100:0, 95:5, 65:35, 50:50, 35:65, 5:95, 0:100
• Block I/O sizes (KB)
  – 0.5, 4, 8, 16, 32, 64, 128, 1024
• Execute random I/O
• Measure IOPS
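The measurement matrix above is easy to sketch in code. This is a hypothetical outline of the parameter sweep, not Nexenta's actual vdbench driver; the names are my own.

```python
from itertools import product

# Read/write mixes and block sizes from the SSS-PTS IOPS test description.
READ_WRITE_MIX = [(100, 0), (95, 5), (65, 35), (50, 50), (35, 65), (5, 95), (0, 100)]
BLOCK_SIZES_KB = [0.5, 4, 8, 16, 32, 64, 128, 1024]

def test_matrix():
    """Yield every (read %, write %, block size) combination to measure."""
    for (read_pct, write_pct), bs_kb in product(READ_WRITE_MIX, BLOCK_SIZES_KB):
        yield read_pct, write_pct, bs_kb

runs = list(test_matrix())
print(len(runs))  # 7 mixes x 8 block sizes = 56 runs per iteration
```

Each iteration of the full matrix is repeated until successive results converge, which is what defeats the out-of-the-box optimizations mentioned above.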
17. SSS-PTS Concurrent I/O Operations
• The number of concurrent I/Os (or threads) can be very important
  – NexentaStor architects need to choose the best concurrency value for the entire platform
  – Nexenta tests add thread counts: 1, 2, 4, 8, 16, 32
• For SSS-PTS results publication, vendors can choose which to report
24. Choose Appropriate Components
• The biggest tuning knob
• Have the right components for the job
• Choose reliable components
• Leverage hybrid storage pool concepts
• In general, go wide, then deep
25. Recordsize and Block Size
• The 2nd biggest tuning knob
  – Recordsize is the “max block size” for file systems
  – Block size is the “only block size” for block devices
• For fixed-record-length workloads
  – Match the recordsize/block size to avoid I/O amplification on bandwidth-constrained systems
  – Multiples can be ok
• Experiment and observe the trade-offs
  – Smaller block sizes require more metadata per unit of available storage
• For variable workloads (e.g. files), a large recordsize is ok
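A back-of-the-envelope way to see the I/O amplification argument: when an application reads records smaller than the dataset recordsize, ZFS still reads whole records. The function below is my own illustration, and it assumes records are aligned to ZFS record boundaries (an unaligned record can touch one record more).

```python
import math

def read_amplification(app_record_bytes, zfs_recordsize_bytes):
    """Bytes read from disk per byte the application asked for, assuming
    aligned records and that ZFS reads whole records."""
    records = math.ceil(app_record_bytes / zfs_recordsize_bytes)
    return records * zfs_recordsize_bytes / app_record_bytes

# An 8 KB database page on a dataset with a 128 KB recordsize:
print(read_amplification(8 * 1024, 128 * 1024))   # 16.0 - 16x the bandwidth
# Matching the recordsize to the page size removes the amplification:
print(read_amplification(8 * 1024, 8 * 1024))     # 1.0
```

The same arithmetic shows why multiples can be ok: a 128 KB application record on an 8 KB recordsize reads exactly 16 records with no wasted bandwidth, at the cost of more metadata.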
26. I/O Concurrency
• For current ZFS implementations, zfs_vdev_max_pending is a global, per-device setting
  – In older releases, the default is 35
  – In current releases, the default is 10
  – Consider changing it to match the devices to the workload
  – The setting can have availability implications
• Room for improvement, stay tuned...
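Why the pending-I/O limit matters follows from Little's law: sustained throughput is bounded by the number of outstanding I/Os divided by per-I/O latency. A minimal illustration (my own, not a NexentaStor formula):

```python
def max_iops(outstanding_ios, service_time_ms):
    """Little's law upper bound: throughput = concurrency / latency."""
    return outstanding_ios * 1000.0 / service_time_ms

# An HDD at ~10 ms per random I/O, with 10 I/Os pending:
print(max_iops(10, 10.0))   # 1000.0 IOPS ceiling (queueing raises latency in practice)
# An SSD at ~0.1 ms benefits far more from a deep queue:
print(max_iops(32, 0.1))    # 320000.0
```

This is only an upper bound - deeper queues also increase per-I/O latency on a busy device - but it shows why the best zfs_vdev_max_pending value depends on the device type behind it.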
27. Prefetching
• By default, intelligent prefetching is enabled
  – Adaptive algorithm: if prefetching seemed to work, prefetch more
  – Generally works well
• For high-concurrency environments, consider disabling prefetching
  – Not tunable from the NexentaStor 3.x UI
• Room for improvement, stay tuned...
28. Compression
• Compression turns big I/O into small I/O, when possible
  – The algorithms do not suffer from “compression growth”
  – Various algorithms are available
  – Enabled by default in NexentaStor 3.x
  – Amaze your friends: zeros compress to nothing
• For high-performance environments, consider disabling compression
  – When bandwidth is over-provisioned
  – When space is inexpensive ($/GB)
  – When low latency variance is desired
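The “zeros compress to nothing” point is easy to demonstrate. ZFS uses its own algorithms (lzjb, gzip), but zlib in this sketch shows the same effect on a highly compressible record:

```python
import zlib

# A 128 KB record of zeros - the most compressible data there is.
zeros = bytes(128 * 1024)
compressed = zlib.compress(zeros)
print(len(zeros), '->', len(compressed))  # 131072 bytes shrink to a few hundred at most
```

This is the mechanism by which compression turns big I/O into small I/O: the pool reads and writes the compressed size, not the logical record size.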
29. Deduplication
• Deduplication turns large I/O into small I/O
  – It does not eliminate I/O!
  – Avoid it on big, slow HDDs (IOPS-constrained)
• In general, deduplication and high performance are not the best of friends
30. Measure and Manage
• Performance management is always a work in progress
• Generalizations are becoming more difficult as workloads become more diverse
• Experiment and measure prior to production
• Measure and manage in production
• Performance management has room for improvement, stay tuned...
The true performance of NexentaStor systems can be difficult to predict. The key variables for getting the best performance from a NexentaStor system are the choice of hardware and the dataset configuration.

In general, NexentaStor systems can perform better than proprietary storage systems. The reasons are simple: by using good, scalable storage software, Nexenta customers can leverage the improvements in component technologies. For example, changing a NexentaStor server’s processor for a faster version, swapping in a new motherboard, or adding memory is as simple and straightforward as upgrading a compute server. NexentaStor licensing is based on total storage, not the number or speed of the processors nor the amount of RAM. New or faster network interfaces can be added to improve client performance without incurring additional storage licensing costs.

NexentaStor systems can thus increase their performance cost-effectively, incrementally, and efficiently over their lifetime.
NexentaStor system building blocks include persistent storage devices. Many different block devices are supported, providing an easy migration path from legacy storage. New technologies are easily added to existing NexentaStor systems, protecting your investment against technology obsolescence.

Obviously, not all hardware choices are fast. The cost, performance, and reliability of system components have a significant impact on overall system performance and dependability. This flexibility offers optimization options unparalleled in the industry.
The key to the NexentaStor system’s optimization is the Hybrid Storage Pool (HSP). Main memory (DRAM) is used as an Adaptive Replacement Cache (ARC), efficiently storing both frequently used and recently used data.

The separate intent log is used to optimize the latency of synchronous write workloads, such as NFS. The log satisfies synchronous write semantics while the transactional object store optimizes and allocates space on the main pool storage. The log does not need to be very large: size it according to the amount of data expected to be written in 30 seconds.

A level-2 ARC (L2ARC), or cache device, can be used to cost-effectively grow the size of the ARC. For large, read-intensive workloads, the cost per gigabyte of SSDs is lower than that of main memory DRAM. Excellent read performance can be achieved using modestly priced SSDs.

The main pool performance is less critical when the system is configured with enough RAM and the log and L2ARC devices are fast.
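The 30-second sizing guidance for the intent log reduces to simple arithmetic. A rough sketch; the function name and the 2x safety factor are my assumptions, not NexentaStor guidance:

```python
def slog_size_gb(write_throughput_mb_s, txg_interval_s=30, safety_factor=2):
    """Rule-of-thumb intent log size: enough to absorb the data expected
    between transaction group commits (~30 s), with a safety margin.
    The 2x safety factor is an assumption for illustration."""
    return write_throughput_mb_s * txg_interval_s * safety_factor / 1024

# Sustained 100 MB/s of synchronous writes:
print(round(slog_size_gb(100), 1))  # ~5.9 GB, within the 1-10 GB guidance above
```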
The Working Set Size (WSS) describes the amount of space most commonly used by applications or workloads. Using WSS to describe performance is becoming increasingly useful as disk sizes increase. In many cases, systems are configured with tens or hundreds of terabytes of storage, while only a small fraction of the space is in use at any given time. This fraction is the working set.