SlideShare a Scribd company logo
Lustre 1.8.9 - 2.x Client Performance
Comparison and Tuning!
John Fragalla!
HPC Principal Architect!
Xyratex, A Seagate Company!
!
Lustre User Group Conference!
April 8-10, 2014!
• Benchmark Setup!
• IOR Parameters and Settings!
• Lustre Clients and Methodology!
• Single Thread Performance!
• Throughput Performance Results and Data!
• Summary!
Agenda!
!
• Storage Architecture!
– A Cluster 6000 SSU with GridRaid OSTs!
•  Rated Storage Performance ≥ 6 GB/s Read or Write with IOR!
– InfiniBand FDR Interconnect!
– Xyratex Lustre 2.1.0.x4-74!
• Client Hardware!
– 8 Clients, each configured with QDR IB, 48GB Memory and
12 Cores!
– Scientific Linux 6.5 with Stock OFED!
Benchmark Setup!
• Used mpirun to execute IOR with --byslot distribution!
• IOR Parameters that were constant:!
–  -F : File Per Process!
–  -B : Direct IO Operation!
–  -t 64m: 64m Transfer Size per Task!
–  -b: 1024g for Single Thread and 512g for multiple tasks!
–  -D: stonewall option, write for 4 minutes, and read for 2
minutes!
• Lustre Settings!
– Stripe Count of 1!
– Stripe Size of 1m!
IOR Parameters and Settings!
• Compared the following clients:!
– 1.8.9, 2.1.6, 2.4.3, and 2.5.1!
• Collected raw performance data using the following client
settings with and without checksums enabled!
– max_rpcs_in flight / max_dirty_mb!
•  32 / 128!
•  64 / 256!
•  128 / 512!
•  256 / 1024!
Lustre Clients!
Single Thread Write Performance!
0
200
400
600
800
1,000
1,200
32/128 64/256 128/512 256/1024
MB/s
Max PRCs / Max Dirty MB
Single Thread Client Write Performance - Checksums
Disabled
1.8.9 CS=0 Write (MB/s)
2.1.6 CS=0 Write (MB/s)
2.4.3 CS=0 Write (MB/s)
2.5.1 CS=0 Write (MB/s)
Single Thread Read Performance!
-
200.00
400.00
600.00
800.00
1,000.00
1,200.00
1,400.00
32/128 64/256 128/512 256/1024
MB/s
Max PRCs / Max Dirty MB
Single Thread Client Read Performance
Checksums Disabled
1.8.9 CS=0 Read (MB/s)
2.1.6 CS=0 Read (MB/s)
2.4.3 CS=0 Read (MB/s)
2.5.1 CS=0 Read (MB/s)
Impact on Single Thread Performance when
Checksums were Enabled!
 ! Checksums Impact - Single Thread!
 Version!  ! Reads! Writes!
1.8.9!
Performance Impact Difference (MB/s)! 217.80 ! 336.84 !
Percentage Impact Reduced! 19%! 35%!
2.1.6!
Performance Impact Difference (MB/s)! 37.70 ! 27.81 !
Percentage Impact Reduced! 3%! 6%!
2.4.3!
Performance Impact Difference (MB/s)! 45.51 ! 9.14 !
Percentage Impact Reduced! 6%! 1%!
2.5.1!
Performance Impact Difference (MB/s)! 44.91 ! 6.50 !
Percentage Impact Reduced! 6%! 1%!
1.8.9 Client Performance Results!
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
1.8.9 Writes - Checksums Disabled
Write 32/128/0
Write 64/256/0
Write 128/512/0
Write 256/1024/0
-
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
1.8.9 Reads - Checksums Disabled
Read 32/128/0
Read 64/256/0
Read 128/512/0
Read 256/1024/0
Key:
max_rpcs_in_flight / max_dirty_mb / checksums
2.1.6 Client Performance Results!
0
500
1,000
1,500
2,000
2,500
3,000
3,500
4,000
4,500
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
2.1.6 Writes - Checksums Disabled
Write 32/128/0
Write 64/256/0
Write 128/512/0
Write 256/1024/0
0
1,000
2,000
3,000
4,000
5,000
6,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
2.1.6 Reads - Checksums Disabled
Read 32/128/0
Read 64/256/0
Read 128/512/0
Read 256/1024/0
Key:
max_rpcs_in_flight / max_dirty_mb / checksums
2.4.3 Client Performance Results!
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
2.4.3 Writes - Checksums Disabled
Write 32/128/0
Write 64/256/0
Write 128/512/0
Write 256/1024/0
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
2.4.3 Reads - Checksums Disabled
Read 32/128/0
Read 64/256/0
Read 128/512/0
Read 256/1024/0
Key:
max_rpcs_in_flight / max_dirty_mb / checksums
2.5.1 Client Performance Results!
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
2.5.1 Writes - Checksums Disabled
Write 32/128/0
Write 64/256/0
Write 128/512/0
Write 256/1024/0
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
2.5.1 Reads - Checksums Disabled
Read 32/128/0
Read 64/256/0
Read 128/512/0
Read 256/1024/0
Key:
max_rpcs_in_flight / max_dirty_mb / checksums
Average Negative Impact on Performance
when Checksums Enabled!
Checksums Impact (max_rpcs_in_flight/max_dirty_mb)
Reads
32/128
Write
32/128
Read
64/256
Write
65/256
Read
128/512
Write
128/512
Read
256/1024
Write
256/1024
1.8.9
Average Impact
Difference (MB/s)
203.40 398.09 3.12 434.41 71.03 523.17 31.19 502.31
Average Impact
Percentage
2% 10% -1% 10% 0% 13% 0% 12%
2.1.6
Average Impact
Difference (MB/s)
541.64 269.93 446.17 196.06 445.79 189.33 453.34 180.07
Average Impact
Percentage
22% 12% 19% 9% 18% 8% 9% 8%
2.4.3
Average Impact
Difference (MB/s)
333.18 94.53 277.32 77.16 380.47 194.21 326.76 315.71
Average Impact
Percentage
7% 3% 5% 2% 7% 4% 5% 6%
2.5.1
Average Impact
Difference (MB/s)
486.11 161.09 245.24 154.50 366.05 256.85 346.78 423.12
Average Impact
Percentage
12% 5% 5% 4% 6% 6% 6% 9%
Average results using 4-96 Threads across 1-8 Clients
1.8.9, 2.4.3, 2.5.1 Overall Write Comparison!
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
1.8.9, 2.4.3, 2.5.1 Writes Comparison - Checksums Disabled
1.8.9 Write 32/128/0
1.8.9 Write 64/256/0
1.8.9 Write 128/512/0
1.8.9 Write 256/1024/0
2.4.3 Write 32/128/0
2.4.3 Write 64/256/0
2.4.3 Write 128/512/0
2.4.3 Write 256/1024/0
2.5.1 Write 32/128/0
2.5.1 Write 64/256/0
2.5.1 Write 128/512/0
2.5.1 Write 256/1024/0
1.8.9, 2.4.3, 2.5.1 Overall Read Comparison!
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
1 2 4 8 8 8
4 8 16 32 64 96
MB/s
Nodes and Tasks
1.8.9, 2.4.3, 2.5.1 Reads Comparison - Checksums Disabled
1.8.9 Read 32/128/0
1.8.9 Read 64/256/0
1.8.9 Read 128/512/0
1.8.9 Read 256/1024/0
2.4.3 Read 32/128/0
2.4.3 Read 64/256/0
2.4.3 Read 128/512/0
2.4.3 Read 256/1024/0
2.5.1 Read 32/128/0
2.5.1 Read 64/256/0
2.5.1 Read 256/1024/0
2.5.1 Read 256/1024/0
Maximizing Performance for Each Client!
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
32 32 96 96 32 32 32 32
Read 32/128/0 Write 32/128/0 Read 128/512/0 Write 128/512/0 Read 256/1024/0Write 256/1024/0Read 256/1024/0Write 256/1024/0
1.8.9 2.1.6 2.4.3 2.5.1
MB/s
Client
Reads and Writes
RPCs/Dirty_MB/Checksums
Max Performance Comparison - Checksums Disabled
• In general, 1.8.9 Client performed the same regardless of
client settings!
• 2.4.3 and 2.5.1 followed the same performance curve!
– Clients settings need to be increased to achieve maximum
storage throughput!
– Both max_rpcs_in_flight and max_dirty_mb need to be
increased to at lest 256!
•  Anything less than 256 will result in less than optimal Storage
performance!
• The rule of thumb: max_dirty_mb = max_rpcs_in_flight * 4 

is not holding true with Client versions 2.4.3 and 2.5.1!
Interesting Results Analyzing Raw Data!
• 1.8.9 single thread performance results are the highest, but
2.4.3 and 2.5.1 improved over previous 2.x client versions!
• With the right client settings, 2.4.3 and 2.5.1 client versions
can maximize storage throughput, along with 1.8.9 clients!
• 2.1.6 Client underperformed, regardless of client tuning!
• Checksums impact varied with the number of threads, but,
on average, not a “big” performance impact!
– Biggest impact on performance with Checksums enabled
was on the Single Thread tests!
• 2.4.3 and 2.5.1 performance results were similar across all
threads and client tuning parameters!
Summary!
Thank You
John_Fragalla@xyratex.com

More Related Content

Similar to Lustre client performance comparison and tuning (1.8.x to 2.x)

Plextor mSATA M5M sales kits
Plextor mSATA M5M sales kitsPlextor mSATA M5M sales kits
Plextor mSATA M5M sales kits
Maarten Souren
 
Micron vPAK 2pp Final no Bleed_TD03
Micron vPAK 2pp Final no Bleed_TD03Micron vPAK 2pp Final no Bleed_TD03
Micron vPAK 2pp Final no Bleed_TD03
Peyman Blumstengel
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
Gaurav "GP" Pal
 

Similar to Lustre client performance comparison and tuning (1.8.x to 2.x) (20)

Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Plextor mSATA M5M sales kits
Plextor mSATA M5M sales kitsPlextor mSATA M5M sales kits
Plextor mSATA M5M sales kits
 
Micron vPAK 2pp Final no Bleed_TD03
Micron vPAK 2pp Final no Bleed_TD03Micron vPAK 2pp Final no Bleed_TD03
Micron vPAK 2pp Final no Bleed_TD03
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
LUG 2014
LUG 2014LUG 2014
LUG 2014
 
Day 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConfDay 2 General Session Presentations RedisConf
Day 2 General Session Presentations RedisConf
 
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware R...
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014
 
Alibaba cloud benchmarking report ecs rds limton xavier
Alibaba cloud benchmarking report ecs  rds limton xavierAlibaba cloud benchmarking report ecs  rds limton xavier
Alibaba cloud benchmarking report ecs rds limton xavier
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
5 Things You Need to Know About Enterprise Fl
 5 Things You Need to Know About Enterprise Fl 5 Things You Need to Know About Enterprise Fl
5 Things You Need to Know About Enterprise Fl
 
Accelerating SSD Performance with HLNAND
Accelerating SSD Performance with HLNANDAccelerating SSD Performance with HLNAND
Accelerating SSD Performance with HLNAND
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
VMware VSAN Technical Deep Dive - March 2014
VMware VSAN Technical Deep Dive - March 2014VMware VSAN Technical Deep Dive - March 2014
VMware VSAN Technical Deep Dive - March 2014
 
Understanding and Measuring I/O Performance
Understanding and Measuring I/O PerformanceUnderstanding and Measuring I/O Performance
Understanding and Measuring I/O Performance
 
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 

Lustre client performance comparison and tuning (1.8.x to 2.x)

  • 1. Lustre 1.8.9 - 2.x Client Performance Comparison and Tuning! John Fragalla! HPC Principal Architect! Xyratex, A Seagate Company! ! Lustre User Group Conference! April 8-10, 2014!
  • 2. • Benchmark Setup! • IOR Parameters and Settings! • Lustre Clients and Methodology! • Single Thread Performance! • Throughput Performance Results and Data! • Summary! Agenda! !
  • 3. • Storage Architecture! – A Cluster 6000 SSU with GridRaid OSTs! •  Rated Storage Performance ≥ 6 GB/s Read or Write with IOR! – InfiniBand FDR Interconnect! – Xyratex Lustre 2.1.0.x4-74! • Client Hardware! – 8 Clients, each configured with QDR IB, 48GB Memory and 12 Cores! – Scientific Linux 6.5 with Stock OFED! Benchmark Setup!
  • 4. • Used mpirun to execute IOR with --byslot distribution! • IOR Parameters that were constant:! –  -F : File Per Process! –  -B : Direct IO Operation! –  -t 64m: 64m Transfer Size per Task! –  -b: 1024g for Single Thread and 512g for multiple tasks! –  -D: stonewall option, write for 4 minutes, and read for 2 minutes! • Lustre Settings! – Stripe Count of 1! – Stripe Size of 1m! IOR Parameters and Settings!
  • 5. • Compared the following clients:! – 1.8.9, 2.1.6, 2.4.3, and 2.5.1! • Collected raw performance data using the following client settings with and without checksums enabled! – max_rpcs_in flight / max_dirty_mb! •  32 / 128! •  64 / 256! •  128 / 512! •  256 / 1024! Lustre Clients!
  • 6. Single Thread Write Performance! 0 200 400 600 800 1,000 1,200 32/128 64/256 128/512 256/1024 MB/s Max PRCs / Max Dirty MB Single Thread Client Write Performance - Checksums Disabled 1.8.9 CS=0 Write (MB/s) 2.1.6 CS=0 Write (MB/s) 2.4.3 CS=0 Write (MB/s) 2.5.1 CS=0 Write (MB/s)
  • 7. Single Thread Read Performance! - 200.00 400.00 600.00 800.00 1,000.00 1,200.00 1,400.00 32/128 64/256 128/512 256/1024 MB/s Max PRCs / Max Dirty MB Single Thread Client Read Performance Checksums Disabled 1.8.9 CS=0 Read (MB/s) 2.1.6 CS=0 Read (MB/s) 2.4.3 CS=0 Read (MB/s) 2.5.1 CS=0 Read (MB/s)
  • 8. Impact on Single Thread Performance when Checksums were Enabled!  ! Checksums Impact - Single Thread!  Version!  ! Reads! Writes! 1.8.9! Performance Impact Difference (MB/s)! 217.80 ! 336.84 ! Percentage Impact Reduced! 19%! 35%! 2.1.6! Performance Impact Difference (MB/s)! 37.70 ! 27.81 ! Percentage Impact Reduced! 3%! 6%! 2.4.3! Performance Impact Difference (MB/s)! 45.51 ! 9.14 ! Percentage Impact Reduced! 6%! 1%! 2.5.1! Performance Impact Difference (MB/s)! 44.91 ! 6.50 ! Percentage Impact Reduced! 6%! 1%!
  • 9. 1.8.9 Client Performance Results! 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 1.8.9 Writes - Checksums Disabled Write 32/128/0 Write 64/256/0 Write 128/512/0 Write 256/1024/0 - 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 1.8.9 Reads - Checksums Disabled Read 32/128/0 Read 64/256/0 Read 128/512/0 Read 256/1024/0 Key: max_rpcs_in_flight / max_dirty_mb / checksums
  • 10. 2.1.6 Client Performance Results! 0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 2.1.6 Writes - Checksums Disabled Write 32/128/0 Write 64/256/0 Write 128/512/0 Write 256/1024/0 0 1,000 2,000 3,000 4,000 5,000 6,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 2.1.6 Reads - Checksums Disabled Read 32/128/0 Read 64/256/0 Read 128/512/0 Read 256/1024/0 Key: max_rpcs_in_flight / max_dirty_mb / checksums
  • 11. 2.4.3 Client Performance Results! 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 2.4.3 Writes - Checksums Disabled Write 32/128/0 Write 64/256/0 Write 128/512/0 Write 256/1024/0 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 2.4.3 Reads - Checksums Disabled Read 32/128/0 Read 64/256/0 Read 128/512/0 Read 256/1024/0 Key: max_rpcs_in_flight / max_dirty_mb / checksums
  • 12. 2.5.1 Client Performance Results! 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 2.5.1 Writes - Checksums Disabled Write 32/128/0 Write 64/256/0 Write 128/512/0 Write 256/1024/0 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 2.5.1 Reads - Checksums Disabled Read 32/128/0 Read 64/256/0 Read 128/512/0 Read 256/1024/0 Key: max_rpcs_in_flight / max_dirty_mb / checksums
  • 13. Average Negative Impact on Performance when Checksums Enabled! Checksums Impact (max_rpcs_in_flight/max_dirty_mb) Reads 32/128 Write 32/128 Read 64/256 Write 65/256 Read 128/512 Write 128/512 Read 256/1024 Write 256/1024 1.8.9 Average Impact Difference (MB/s) 203.40 398.09 3.12 434.41 71.03 523.17 31.19 502.31 Average Impact Percentage 2% 10% -1% 10% 0% 13% 0% 12% 2.1.6 Average Impact Difference (MB/s) 541.64 269.93 446.17 196.06 445.79 189.33 453.34 180.07 Average Impact Percentage 22% 12% 19% 9% 18% 8% 9% 8% 2.4.3 Average Impact Difference (MB/s) 333.18 94.53 277.32 77.16 380.47 194.21 326.76 315.71 Average Impact Percentage 7% 3% 5% 2% 7% 4% 5% 6% 2.5.1 Average Impact Difference (MB/s) 486.11 161.09 245.24 154.50 366.05 256.85 346.78 423.12 Average Impact Percentage 12% 5% 5% 4% 6% 6% 6% 9% Average results using 4-96 Threads across 1-8 Clients
  • 14. 1.8.9, 2.4.3, 2.5.1 Overall Write Comparison! 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 1.8.9, 2.4.3, 2.5.1 Writes Comparison - Checksums Disabled 1.8.9 Write 32/128/0 1.8.9 Write 64/256/0 1.8.9 Write 128/512/0 1.8.9 Write 256/1024/0 2.4.3 Write 32/128/0 2.4.3 Write 64/256/0 2.4.3 Write 128/512/0 2.4.3 Write 256/1024/0 2.5.1 Write 32/128/0 2.5.1 Write 64/256/0 2.5.1 Write 128/512/0 2.5.1 Write 256/1024/0
  • 15. 1.8.9, 2.4.3, 2.5.1 Overall Read Comparison! 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 1 2 4 8 8 8 4 8 16 32 64 96 MB/s Nodes and Tasks 1.8.9, 2.4.3, 2.5.1 Reads Comparison - Checksums Disabled 1.8.9 Read 32/128/0 1.8.9 Read 64/256/0 1.8.9 Read 128/512/0 1.8.9 Read 256/1024/0 2.4.3 Read 32/128/0 2.4.3 Read 64/256/0 2.4.3 Read 128/512/0 2.4.3 Read 256/1024/0 2.5.1 Read 32/128/0 2.5.1 Read 64/256/0 2.5.1 Read 256/1024/0 2.5.1 Read 256/1024/0
  • 16. Maximizing Performance for Each Client! 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 32 32 96 96 32 32 32 32 Read 32/128/0 Write 32/128/0 Read 128/512/0 Write 128/512/0 Read 256/1024/0Write 256/1024/0Read 256/1024/0Write 256/1024/0 1.8.9 2.1.6 2.4.3 2.5.1 MB/s Client Reads and Writes RPCs/Dirty_MB/Checksums Max Performance Comparison - Checksums Disabled
  • 17. • In general, 1.8.9 Client performed the same regardless of client settings! • 2.4.3 and 2.5.1 followed the same performance curve! – Clients settings need to be increased to achieve maximum storage throughput! – Both max_rpcs_in_flight and max_dirty_mb need to be increased to at lest 256! •  Anything less than 256 will result in less than optimal Storage performance! • The rule of thumb: max_dirty_mb = max_rpcs_in_flight * 4 
 is not holding true with Client versions 2.4.3 and 2.5.1! Interesting Results Analyzing Raw Data!
  • 18. • 1.8.9 single thread performance results are the highest, but 2.4.3 and 2.5.1 improved over previous 2.x client versions! • With the right client settings, 2.4.3 and 2.5.1 client versions can maximize storage throughput, along with 1.8.9 clients! • 2.1.6 Client underperformed, regardless of client tuning! • Checksums impact varied with the number of threads, but, on average, not a “big” performance impact! – Biggest impact on performance with Checksums enabled was on the Single Thread tests! • 2.4.3 and 2.5.1 performance results were similar across all threads and client tuning parameters! Summary!