Lustre client performance comparison and tuning (1.8.x to 2.x)



In this presentation from the Lustre User Group 2014, John Fragalla from Xyratex presents: Lustre Client Performance Comparison and Tuning (1.8.x to 2.x).

Watch the video presentation: http://wp.me/p3RLHQ-bJ3
Learn more: http://www.opensfs.org/lug-2014-agenda

Published in: Technology

Transcript of "Lustre client performance comparison and tuning (1.8.x to 2.x)"

  1. Lustre 1.8.9 - 2.x Client Performance Comparison and Tuning
     John Fragalla, HPC Principal Architect, Xyratex, A Seagate Company
     Lustre User Group Conference, April 8-10, 2014
  2. Agenda
     • Benchmark Setup
     • IOR Parameters and Settings
     • Lustre Clients and Methodology
     • Single Thread Performance
     • Throughput Performance Results and Data
     • Summary
  3. Benchmark Setup
     • Storage Architecture
       – A ClusterStor 6000 SSU with GridRAID OSTs
         • Rated storage performance ≥ 6 GB/s read or write with IOR
       – InfiniBand FDR interconnect
       – Xyratex Lustre 2.1.0.x4-74
     • Client Hardware
       – 8 clients, each configured with QDR IB, 48 GB memory, and 12 cores
       – Scientific Linux 6.5 with stock OFED
  4. IOR Parameters and Settings
     • Used mpirun to execute IOR with --byslot distribution
     • IOR parameters that were constant:
       – -F: file per process
       – -B: direct I/O operation
       – -t 64m: 64 MB transfer size per task
       – -b: 1024g for single thread and 512g for multiple tasks
       – -D: stonewall option; write for 4 minutes, read for 2 minutes
     • Lustre settings
       – Stripe count of 1
       – Stripe size of 1m
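The runs described above can be sketched as follows. This is a hypothetical reconstruction, not the exact command line from the slides: the rank count, hostfile, and Lustre mount point are illustrative stand-ins, and the separate write and read invocations reflect the 4-minute write / 2-minute read stonewalling.

```shell
# Write phase: file per process (-F), direct I/O (-B), 64 MB transfers,
# stonewalled at 4 minutes (-D 240); keep files (-k) for the read pass.
mpirun --byslot -np 96 -hostfile ./hosts \
  ior -F -B -t 64m -b 512g -D 240 -w -k -o /mnt/lustre/ior_test

# Read phase over the same files, stonewalled at 2 minutes (-D 120).
mpirun --byslot -np 96 -hostfile ./hosts \
  ior -F -B -t 64m -b 512g -D 120 -r -o /mnt/lustre/ior_test
```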
  5. Lustre Clients
     • Compared the following clients:
       – 1.8.9, 2.1.6, 2.4.3, and 2.5.1
     • Collected raw performance data using the following client settings, with and without checksums enabled:
       – max_rpcs_in_flight / max_dirty_mb
         • 32 / 128
         • 64 / 256
         • 128 / 512
         • 256 / 1024
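The two tunables named above are per-OSC client settings that can be changed at runtime with `lctl`. A minimal sketch, run as root on a client, using the largest combination from the slide:

```shell
# Apply the 256 / 1024 tuning combination on this client;
# "osc.*" matches every OSC device on the node.
lctl set_param osc.*.max_rpcs_in_flight=256
lctl set_param osc.*.max_dirty_mb=1024

# Read the values back to confirm they took effect.
lctl get_param osc.*.max_rpcs_in_flight osc.*.max_dirty_mb
```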
  6. Single Thread Write Performance
     [Chart: Single Thread Client Write Performance, Checksums Disabled. Y-axis: MB/s (0 to 1,200); X-axis: Max RPCs / Max Dirty MB (32/128, 64/256, 128/512, 256/1024); series: 1.8.9, 2.1.6, 2.4.3, and 2.5.1, all with CS=0.]
  7. Single Thread Read Performance
     [Chart: Single Thread Client Read Performance, Checksums Disabled. Y-axis: MB/s (0 to 1,400); X-axis: Max RPCs / Max Dirty MB (32/128, 64/256, 128/512, 256/1024); series: 1.8.9, 2.1.6, 2.4.3, and 2.5.1, all with CS=0.]
  8. Impact on Single Thread Performance when Checksums were Enabled

     Checksums Impact - Single Thread:

     Version                                        Reads     Writes
     1.8.9  Performance Impact Difference (MB/s)    217.80    336.84
            Percentage Impact Reduced               19%       35%
     2.1.6  Performance Impact Difference (MB/s)    37.70     27.81
            Percentage Impact Reduced               3%        6%
     2.4.3  Performance Impact Difference (MB/s)    45.51     9.14
            Percentage Impact Reduced               6%        1%
     2.5.1  Performance Impact Difference (MB/s)    44.91     6.50
            Percentage Impact Reduced               6%        1%
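The checksum state compared in these runs is also a per-OSC client setting. A sketch of checking and disabling it with `lctl`; the "CS=0" labels in the charts correspond to checksums=0:

```shell
# Show the current checksum setting for each OSC (1 = enabled, 0 = disabled).
lctl get_param osc.*.checksums

# Disable client data checksums, matching the CS=0 runs in these results.
lctl set_param osc.*.checksums=0
```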
  9. 1.8.9 Client Performance Results
     [Charts: 1.8.9 Writes (0 to 7,000 MB/s) and 1.8.9 Reads (0 to 8,000 MB/s), Checksums Disabled, versus nodes and tasks (1/4, 2/8, 4/16, 8/32, 8/64, 8/96); series: settings 32/128/0, 64/256/0, 128/512/0, and 256/1024/0. Key: max_rpcs_in_flight / max_dirty_mb / checksums.]
  10. 2.1.6 Client Performance Results
      [Charts: 2.1.6 Writes (0 to 4,500 MB/s) and 2.1.6 Reads (0 to 6,000 MB/s), Checksums Disabled, versus nodes and tasks (1/4, 2/8, 4/16, 8/32, 8/64, 8/96); series: settings 32/128/0, 64/256/0, 128/512/0, and 256/1024/0. Key: max_rpcs_in_flight / max_dirty_mb / checksums.]
  11. 2.4.3 Client Performance Results
      [Charts: 2.4.3 Writes (0 to 7,000 MB/s) and 2.4.3 Reads (0 to 8,000 MB/s), Checksums Disabled, versus nodes and tasks (1/4, 2/8, 4/16, 8/32, 8/64, 8/96); series: settings 32/128/0, 64/256/0, 128/512/0, and 256/1024/0. Key: max_rpcs_in_flight / max_dirty_mb / checksums.]
  12. 2.5.1 Client Performance Results
      [Charts: 2.5.1 Writes (0 to 7,000 MB/s) and 2.5.1 Reads (0 to 8,000 MB/s), Checksums Disabled, versus nodes and tasks (1/4, 2/8, 4/16, 8/32, 8/64, 8/96); series: settings 32/128/0, 64/256/0, 128/512/0, and 256/1024/0. Key: max_rpcs_in_flight / max_dirty_mb / checksums.]
  13. Average Negative Impact on Performance when Checksums Enabled

      Checksums Impact by max_rpcs_in_flight / max_dirty_mb setting; average results using 4-96 tasks across 1-8 clients:

                                              32/128            64/256            128/512           256/1024
                                              Read     Write    Read     Write    Read     Write    Read     Write
      1.8.9  Avg Impact Difference (MB/s)     203.40   398.09   3.12     434.41   71.03    523.17   31.19    502.31
             Avg Impact Percentage            2%       10%      -1%      10%      0%       13%      0%       12%
      2.1.6  Avg Impact Difference (MB/s)     541.64   269.93   446.17   196.06   445.79   189.33   453.34   180.07
             Avg Impact Percentage            22%      12%      19%      9%       18%      8%       9%       8%
      2.4.3  Avg Impact Difference (MB/s)     333.18   94.53    277.32   77.16    380.47   194.21   326.76   315.71
             Avg Impact Percentage            7%       3%       5%       2%       7%       4%       5%       6%
      2.5.1  Avg Impact Difference (MB/s)     486.11   161.09   245.24   154.50   366.05   256.85   346.78   423.12
             Avg Impact Percentage            12%      5%       5%       4%       6%       6%       6%       9%
  14. 1.8.9, 2.4.3, 2.5.1 Overall Write Comparison
      [Chart: Writes Comparison, Checksums Disabled. Y-axis: MB/s (0 to 7,000); X-axis: nodes and tasks (1/4, 2/8, 4/16, 8/32, 8/64, 8/96); series: 1.8.9, 2.4.3, and 2.5.1, each at settings 32/128/0, 64/256/0, 128/512/0, and 256/1024/0.]
  15. 1.8.9, 2.4.3, 2.5.1 Overall Read Comparison
      [Chart: Reads Comparison, Checksums Disabled. Y-axis: MB/s (0 to 8,000); X-axis: nodes and tasks (1/4, 2/8, 4/16, 8/32, 8/64, 8/96); series: 1.8.9, 2.4.3, and 2.5.1, each at settings 32/128/0, 64/256/0, 128/512/0, and 256/1024/0.]
  16. Maximizing Performance for Each Client
      [Chart: Max Performance Comparison, Checksums Disabled. Y-axis: MB/s (0 to 8,000). Best read and write per client, with the setting and thread count that achieved it: 1.8.9 at 32/128/0 (32 threads), 2.1.6 at 128/512/0 (96 threads), 2.4.3 at 256/1024/0 (32 threads), 2.5.1 at 256/1024/0 (32 threads). Key: max_rpcs_in_flight / max_dirty_mb / checksums.]
  17. Interesting Results Analyzing Raw Data
      • In general, the 1.8.9 client performed the same regardless of client settings
      • 2.4.3 and 2.5.1 followed the same performance curve
        – Client settings need to be increased to achieve maximum storage throughput
        – Both max_rpcs_in_flight and max_dirty_mb need to be increased to at least 256
          • Anything less than 256 will result in less than optimal storage performance
      • The rule of thumb max_dirty_mb = max_rpcs_in_flight * 4 does not hold true with client versions 2.4.3 and 2.5.1
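The 4x rule of thumb above is simple arithmetic; a small shell sketch that derives max_dirty_mb from max_rpcs_in_flight. The slides' point is that for 2.4.3 and 2.5.1 this ratio alone is not enough: both values should be raised to at least 256.

```shell
# Rule of thumb: max_dirty_mb = max_rpcs_in_flight * 4.
# For 2.4.3 / 2.5.1 clients, push max_rpcs_in_flight itself to >= 256 first.
rpcs=256
dirty=$((rpcs * 4))
echo "max_rpcs_in_flight=${rpcs} max_dirty_mb=${dirty}"
```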
  18. Summary
      • 1.8.9 single-thread performance results are the highest, but 2.4.3 and 2.5.1 improved over previous 2.x client versions
      • With the right client settings, the 2.4.3 and 2.5.1 clients can maximize storage throughput, just as 1.8.9 clients can
      • The 2.1.6 client underperformed, regardless of client tuning
      • Checksums impact varied with the number of threads but, on average, was not a "big" performance impact
        – The biggest impact with checksums enabled was on the single-thread tests
      • 2.4.3 and 2.5.1 performance results were similar across all threads and client tuning parameters
  19. Thank You
      John_Fragalla@xyratex.com