Xtw01t7v021711 cluster


  1. Introduction to Intelligent Clusters (XTW01, Topic 7). IBM Systems & Technology Group Education & Sales Enablement, © 2011 IBM Corporation. This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers.
  2. Course Overview. The objectives of this course of study are:
     > Describe a high-performance computing cluster
     > List the business goals that Intelligent Clusters addresses
     > Identify three core Intelligent Clusters components
     > List the high-speed networking options available in Intelligent Clusters
     > List three software tools used in clusters
     > Describe cluster benchmarking
  3. Topic Agenda:
     > *Commodity Clusters*
     > Overview of Intelligent Clusters
     > Cluster Hardware
     > Cluster Networking
     > Cluster Management, Software Stack, and Benchmarking
  4. What is a Commodity Cluster? A multi-server system of interconnected computers and associated networking and storage devices, unified via systems management and networking software to accomplish a specific purpose.
     > Clusters are built from standard, commodity components that could be used separately in other types of computing configurations:
       • Compute servers (a.k.a. nodes)
       • High-speed networking adapters and switches
       • Local and/or external storage
       • A commodity operating system such as Linux
       • Systems management software
       • Middleware libraries and application software
     > Clusters enable "commodity-based supercomputing"
  5. Conceptual View of a Cluster. [Diagram: a compute node rack, a management node, and user/login nodes connected through an Ethernet switch carrying the management, SOL, cluster, and public VLANs; a separate high-speed network switch carries the message-passing network; storage nodes in a storage rack attach through a Fibre Channel SAN switch on the storage VLAN; users reach the cluster over the LAN on the public VLAN, with separate user access to the management network.]
  6. Application of Clusters in Industry. Industries: Energy, Finance, Manufacturing, Life Sciences, Media, Public/Government. Representative workloads: seismic analysis, reservoir analysis, derivative analysis, actuarial analysis, asset liability management, portfolio risk analysis, statistical analysis, mechanical/electrical design, process simulation, finite element analysis, failure analysis, drug discovery, protein folding, medical imaging, digital rendering, collaborative research, numerical weather forecasting, high energy physics, bandwidth consumption, gaming.
  7. Technology Innovation in HPC: multi-core, virtualization, manageability.
     > Multi-core systems create new opportunities to advance applications and solutions:
       • Dual- and quad-core processors along with higher-density memory designs
       • An "8-way" x86 system capable of 128 GB that starts at less than $10k
     > Virtualization is a hot topic for architectures:
       • Possible workload consolidation for cost savings
       • Power consumption reduced by optimizing system-level utilization
     > Manageability is key to addressing complexity:
       • Effective power/thermal management through software tools
       • Virtualization management tools must be integrated into the overall management scheme
  8. Topic Agenda:
     > Commodity Clusters
     > *Overview of Intelligent Clusters*
     > Cluster Hardware
     > Cluster Networking
     > Cluster Management, Software Stack, and Benchmarking
  9. Approaches to Clustering: IBM delivers across the spectrum, from piece parts (the client bears all risk for sizing, design, integration, deployment, and warranty issues) to an integrated solution (a single vendor is responsible for sizing, design, integration, deployment, and all warranty issues).
     > Roll Your Own: the client orders individual components from a variety of vendors, including IBM; tests and integrates the components or contracts with an integrator; and must address warranty issues with each vendor.
     > BP Integrated: the Business Partner orders servers and storage from IBM and networking from third-party vendors, builds and integrates the components, and delivers to the customer; the client must address warranty issues with each vendor.
     > IBM Racked and Stacked: the client orders servers and storage in standard rack configurations from IBM, then integrates the IBM racks with third-party components or contracts with IGS or another integrator; the client must address warranty issues with each vendor.
     > Intelligent Clusters: the client orders an integrated cluster solution from IBM, including servers, storage, and networking components; IBM delivers a factory-built and tested cluster ready to "plug in"; the client has a single point of contact for all warranty issues.
  10. What is an IBM System Intelligent Cluster? An IBM portfolio of components that have been cluster configured and tested, and that work with a defined supporting software stack.
     > Factory assembled, with onsite installation and one phone number for support
     > A selection of options to customize your configuration, including the Linux operating system (RHEL or SUSE), xCAT, and GPFS
     > The degree to which a multi-server system exhibits these characteristics determines if it is a cluster:
       • Dedicated private VLAN
       • All nodes running the same suite of apps
       • Single point of control for software/application distribution and hardware management
       • Inter-node communication
       • Node interdependence
       • Linux operating system
     [Diagram of core technologies: compute nodes (rack-mount x3550 M3 and x3650 M3; HS21-XM, HS22, and HX5 blades; iDataPlex dx360 M3 scale-out servers), storage and management nodes, Intel® processors, networks (1 GbE and 10 GbE Ethernet; InfiniBand 4X SDR/DDR/QDR), storage networking (Fibre Channel, SAS, iSCSI, FCoE), ServeRAID and IBM TotalStorage® storage, and out-of-band management via terminal servers.]
  11. IBM HPC Cluster Solution (Intelligent Clusters). System x servers (rack-mount, blades, or iDataPlex) + switches and storage + cluster software (GPFS, xCAT, Linux or Windows) = HPC Cluster Solution. IBM or a Business Partner then adds the technical application (or "workload").
  12. Topic Agenda:
     > Commodity Clusters
     > Overview of Intelligent Clusters
     > *Cluster Hardware*
     > Cluster Networking
     > Cluster Management, Software Stack, and Benchmarking
  13. Intelligent Clusters Overview - Servers.
     > IBM System x™ 3550 M3 (1U): high-performance compute node; dual-socket Intel; integrated system management.
     > IBM System x™ 3650 M3 (2U): mission-critical performance; dual-socket Intel; integrated system management.
     > IBM BladeCenter® with HS21-XM, HS22, and HX5 Intel processor-based blades: HS21-XM (extended memory), HS22 (general-purpose enterprise), HX5 (scalable enterprise). Chassis options: BladeCenter E (best energy efficiency, best density), BladeCenter H (high performance), BladeCenter S (distributed, small office, easy to configure). Industry-leading performance, reliability, and control.
     > Active Energy Manager™: power management at your control.
  14. IBM System x iDataPlex components: 2U and 3U chassis, HPC and Web servers, storage trays and I/O trays, storage drives and options, PDUs, switches, and the iDataPlex Rear Door Heat Exchanger.
  15. Current iDataPlex Server Offerings.
     > iDataPlex dx360 M2 (high-performance dual-socket): quad-core Intel Xeon 5500; QuickPath Interconnect up to 6.4 GT/s; 16 DIMM DDR3, 128 GB max; memory speed up to 1333 MHz; PCIe x16 electrical / x16 mechanical; Tylersburg-36D chipset. Last order date: December 31, 2010.
     > iDataPlex 3U storage-rich (file-intense dual-socket): 12 3.5" HDDs, up to 24 TB per node / 672 TB per rack; 6- or 4-core Intel Xeon 5600; 16 DIMM, 128 GB max; Westmere chipset.
     > iDataPlex dx360 M3 (high-performance dual-socket): 6- and 4-core Intel Xeon 5600; QuickPath Interconnect up to 6.4 GT/s; 16 DIMM DDR3, 128 GB max; memory speed up to 1333 MHz; PCIe x16 electrical / x16 mechanical; Westmere chipset, 12 MB cache. Ship support March 26, 2010.
     > iDataPlex dx360 M3 Refresh (exa-scale hybrid CPU + GPU): 6- and 4-core Intel Xeon 5600; two NVIDIA M1060 or M2050 GPUs; QuickPath Interconnect up to 6.4 GT/s; 16 DIMM DDR3, 128 GB max; memory speed up to 1333 MHz; PCIe x16 electrical / x16 mechanical; Westmere chipset, 12 MB cache. Ship support August 12, 2010.
  16. System x iDataPlex dx360 M3: iDataPlex flexibility with better performance, efficiency, and more options, tailored for your business needs.
     > Compute intensive (maximum processing): 2U, 2 compute nodes.
     > Compute + storage (balanced storage and processing): 2U, 1 node slot and drive tray; up to 5 3.5" HDDs.
     > Compute + I/O (maximum component flexibility): 2U, 1 node slot; up to 2 PCIe slots; up to 8 2.5" HDDs.
     > Acceleration: 1U compute node plus 1U dual-GPU I/O tray.
     > Maximize storage density: 3U storage chassis, 1 node slot and triple drive tray; 12 3.5" drives, up to 24 TB; PCIe for networking plus PCIe for RAID.
     > Power supply options: 550W, 900W, and 750W N+N redundant.
  17. iDataPlex dx360 M3 Refresh.
     > Increased server efficiency and Westmere enablement:
       • Intel Westmere-EP 4- and 6-core processor support (up to 95 watts)
       • 2 DIMMs per channel at 1333 MHz with Westmere 95-watt CPUs
       • Lower-power (1.35V) DIMMs (2 GB, 4 GB, 8 GB)
     > Expanded I/O performance capabilities:
       • New I/O tray and 3-slot "butterfly" PCIe riser to support 2 GPUs + network adapter
       • Support for NVIDIA Tesla M1060 or "Fermi" M2050 in a 2U chassis + 4 HDDs
     > Expanded power supply offerings:
       • Optional redundant 2U power supply for line feed (AC) and chassis (DC) protection
       • High-efficiency power supplies fitted to workload power demands
     > Storage performance, capacity, and flexibility:
       • Simple-swap SAS, SATA, and SSD, 2.5" and 3.5", in any 2U configuration
       • Increased capacities of 2.5" and 3.5" SAS, SATA, and SSD
       • Increased capacities in the 3U storage-dense chassis, to 24 TB (with 2 TB 3.5" SATA/SAS drives)
       • 6 Gbps backplane for performance
       • Rear PCIe slot enablement in the 2U chassis for RAID controller flexibility
       • Higher-capacity, higher-performance solid state drive controller
     > Next-generation converged networking: FCoE via 10Gb converged network adapters; dual-port 10Gb Ethernet
  18. dx360 M3 Refresh - Power Supply Offerings.
     > Maximum efficiency for lower power requirements:
       • New high-efficiency 550W power supply for optimum efficiency in low-power configurations
       • More efficiency by running higher on the power curve
     > Flexibility to match the power supply to the workload:
       • 550W (non-redundant) for lower power demands
       • 900W (non-redundant) for higher power demands
       • 750W N+N for node and line feed redundancy
     > Redundant power supply option for the iDataPlex chassis:
       • Node-level power protection for smaller clusters, head nodes, 3U storage-rich, VM, and enterprise configurations
       • Rack-level line feed redundancy with discrete feeds
       • Tailor rack-level solutions that require redundant power in some or all nodes
       • Maintains maximum floor space density with the iDataPlex rack
       • Graceful shutdown on power supply failure for virtualized environments
     > Flexibility per chassis to optimize rack power:
       • Power supply is per 2U or 3U chassis
       • Mix across the rack to maximize flexibility and minimize stranded power
     [Diagram: redundant supply block; two AC feeds into PS 1 and PS 2 (750W max each); 750W total in redundant mode; 200-240V only.]
  19. dx360 M3 Refresh - Rack GPU Configuration.
     > Rack-level value:
       • Greater density, easier to cool
       • Flexibility of network topology without compromising density
       • More density reduces the number of racks and power feeds in the data center
       • The Rear Door Heat Exchanger provides the ultimate value in cooling and density
     > 42 high-performance GPU servers per rack
     > iDataPlex efficiency drives more density on the floor
     > In-rack networking will not reduce rack density, regardless of the topology required by the customer
     > The Rear Door Heat Exchanger provides further TCO value
  20. dx360 M3 Refresh - Server GPU Configuration. [Diagram: 4 2.5" simple-swap SAS drives, 300 or 600 GB / 10K, 6 Gbps (or SATA, or 3.5", or SSD…); InfiniBand DDR (or QDR, or 10GbE…); NVIDIA M2050 "Fermi" #1 and #2 (or M1060, FX3800, or Fusion IO…).]
     Server-level value:
     > Each server is individually serviceable
     > Balanced performance for demanding GPU workloads
     > 6 Gbps SAS drives and controller for maximum performance
     > Service and support for server and GPU from IBM
  21. Intelligent Clusters Storage Portfolio Summary.
     > The Intelligent Clusters BOM consists of the following storage components:
       • Entry-level DS3000 series disk storage systems
       • Mid-range DS4000 series disk storage systems
       • High-end DS5000 series disk storage systems
       • All standard hard disk drives (SAS/SATA/FC)
       • Entry-level SAN fabric switches
     > The majority of HPC solutions use DS3000/DS4000 series disk storage with IBM GPFS parallel file system software
     > A small percentage of HPC clusters use entry-level storage (DS3200/DS3300/DS3400/DS3500)
     > Integrated business solutions (SAP-BWA, Smart Analytics, SoFS) mostly use DS3500 storage
     > Smaller custom solutions use DS3000 entry-level storage
     > A small percentage of special HPC bids use DDN DCS9550 storage
  22. Intelligent Clusters Storage Portfolio (Dec 2008): DS5000 (FC-SAN), DS5020 (FC-SAN), DS3400 (FC-SAN, SAS/SATA), DS3500 (SAS), DS3300 (iSCSI/SAS), and EXP3000 storage expansion (JBOD).
  23. Topic Agenda:
     > Commodity Clusters
     > Overview of Intelligent Clusters
     > Cluster Hardware
     > *Cluster Networking*
     > Cluster Management, Software Stack, and Benchmarking
  24. Cluster Networking.
     > Networking is an integral part of any cluster system, carrying communication among devices, including servers and storage, as well as cluster management traffic
     > All servers in the cluster, including login, management, compute, and storage nodes, communicate over one or more network fabrics connecting them
     > Typically a cluster has one or more of the following networks:
       • A cluster-wide management network
       • A user/campus network through which users log in to the cluster and launch jobs
       • A low-latency, high-bandwidth network, such as InfiniBand, used for inter-process communication
       • An optional storage network used for communication with the storage nodes, with Fibre Channel or Ethernet (in the case of iSCSI traffic) as the storage network fabric
  25. InfiniBand Portfolio - Intelligent Cluster.
     > QDR InfiniBand HCAs: ConnectX-2 dual port; ConnectX-2 single port; QLE 7340 single port.
     > QDR InfiniBand edge switches (1U, 36 ports): 4036; 12200-36; 12300-36 (managed); InfiniScale IV.
     > Director-class switches: InfiniScale IV 6U 108 ports, 10U 216 ports, 17U 324 ports, and 29U 648 ports; 12800-180 14U 432 ports; 12800-360 29U 864 ports; Grid Director 4200 11U 110-160 ports; Grid Director 4700 18U 324 ports.
     > Some items are new for the 10B release.
  26. Intelligent Cluster Ethernet Portfolio: entry / leaf / top-of-rack switches.
     > 1Gb 24-port switches: SMC 8126L2 (1U, 26 1Gb ports; industry low cost).
     > 1Gb 48-port switches: Cisco 2960G-48 (1U, 48 1Gb ports; premium low cost); Cisco 3750G-48 (1U, 48 1Gb ports, with stacking; premium brand, stackable); SMC 8150L2 (1U, 50 1Gb ports; industry low cost).
     > 1Gb 48-port with 10Gb uplinks: SMC 8848M (1U, 48 1Gb ports, 2 10Gb uplinks; industry low cost); Cisco 4948 (1U, 48 1Gb ports, 2 optional 10Gb uplinks; premium brand); BLADE G8000-48 (1U, 48 1Gb ports, 4 10Gb uplinks; low cost); IBM FCX-48 ("Foxhound", 48 1Gb ports, 10Gb uplinks; iDataPlex low cost); Force10 S60 (1U, 48 1Gb ports, up to 4 optional 10Gb uplinks; premium brand alternative).
     > 10Gb switches: Cisco 4900 (2U, 24 10Gb ports); BLADE G8124 (1U, 24 10Gb SFP+ ports).
     > Some items were added in the Oct 10 BOM.
  27. Ethernet Switch Portfolio - iDataPlex: entry / leaf / top-of-rack switches.
     > 1Gb 24/48-port switches: IBM B50C (NetIron 48; 1U, 48 1Gb ports with 2 optional 10GbE uplinks; low cost).
     > 1Gb 48-port with 10Gb uplinks: SMC 8848M (1U, 48 1Gb ports, 2 10Gb uplinks; industry low cost); Cisco 4948E (1U, 48 1Gb ports, 4 optional 10Gb uplinks; premium brand, added in the Oct 10 BOM); BLADE G8000-48 (1U, 48 1Gb ports, 4 10Gb uplinks; low cost); IBM FCX-48 ("Foxhound", 48 1Gb ports, 10Gb uplinks; iDataPlex); IBM J48 / Juniper EX4200-48 (48 1Gb ports, 10Gb uplinks, 2 VC ports; premium brand alternative); Force10 S60 (1U, 48 1Gb ports, 4 10Gb uplinks; premium brand, stackable).
     > 10Gb switches: BLADE G8124 (1U, 24 10Gb SFP+ ports); IBM B24X (TurboIron; 24 10Gb ports; iDataPlex); IBM DCN (24-port 10Gb); IBM DCN (48-port 10Gb).
  28. Ethernet Switch Portfolio - Intelligent Cluster: core and aggregate switches and adapters. All core switches and 10GbE adapters are tested for compatibility with iDataPlex.
     > Core and aggregate switches: Cisco 6509-E (15U, 9 slots, 384 1Gb ports, 32 10Gb ports); IBM B08R (BigIron; 8 slots, 384 1Gb ports, 32 10Gb ports); IBM B16R (BigIron; 16 slots, 768 1Gb ports, 256 10Gb ports); Voltaire 8500 (15U, 12 slots, 288 10Gb ports); Force10 E600i (16U, 7 slots, 633 1Gb ports, 112 10Gb ports); Force10 E1200i (21U, 14 slots, 1260 1Gb ports, 224 10Gb ports); Juniper 8208 (14U, 8 slots, 384 1Gb ports, 64 10Gb ports); Juniper 8216 (21U, 16 slots, 768 1Gb ports, 128 10Gb ports).
     > 10GbE HPC adapters: Chelsio T3 dual-port SFP+ 10GbE PCI-E x8 line-rate adapter; Chelsio T3 dual-port CX4 10GbE PCI-E x8 line-rate adapter; Chelsio T3 dual-port 10GbE CFFh high-performance daughter card for blades; Mellanox ConnectX-2 EN 10GbE PCI-E x8 line-rate adapter.
     > Some items were added in the Oct 10 BOM.
  29. High-speed Networking.
     > Many HPC applications are sensitive to network bandwidth and latency for performance
     > The primary choices for high-speed cluster networking:
       • InfiniBand
       • 10 Gigabit Ethernet (emerging)
     > InfiniBand: an industry-standard, low-latency, high-bandwidth server interconnect, ideal for carrying multiple traffic types (clustering, communications, storage, management) over a single connection
     > 10 Gigabit Ethernet: 10GbE or 10GigE is the IEEE 802.3ae Ethernet standard, which defines Ethernet technology with a data rate of 10 Gbit/s; the follow-on to 1 Gigabit Ethernet technology
  30. InfiniBand.
     > An industry-standard, low-latency, high-bandwidth server interconnect
     > Ideal for carrying multiple traffic types (clustering, communications, storage, management) over a single physical connection
     > A serial I/O interconnect architecture operating at a lane speed of 2.5 Gb/s in each direction (SDR), 5 Gb/s with DDR, and 10 Gb/s with QDR
     > Provides the highest node-to-node bandwidth available today: 40 Gb/s in each direction (80 Gb/s bidirectional) with 4x Quadruple Data Rate (QDR) links
     > Lowest end-to-end messaging latency, on the order of microseconds (1.2-1.5 µs)
     > Wide industry adoption and multiple vendors (Mellanox, Voltaire, QLogic, etc.)
     > Open-source drivers and libraries are available to users (OFED)
     InfiniBand peak bidirectional bandwidth (transmit + receive, Gb/s):
       Lanes   SDR (2.5 Gb/s)   DDR (5 Gb/s)   QDR (10 Gb/s)   EDR (20 Gb/s)
       1x      2.5 + 2.5        5 + 5          10 + 10         20 + 20
       4x      10 + 10          20 + 20        40 + 40         80 + 80
       8x      20 + 20          40 + 40        80 + 80         160 + 160
       12x     30 + 30          60 + 60        120 + 120       240 + 240
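As a worked check of the 4x QDR row in the table above (the 8b/10b line-coding figure is general InfiniBand background, not stated on the slide):

```latex
% A 4x QDR link aggregates 4 lanes at the QDR lane rate of 10 Gb/s:
\[
4 \times 10\,\mathrm{Gb/s} = 40\,\mathrm{Gb/s} \text{ per direction},
\qquad 40 + 40 = 80\,\mathrm{Gb/s} \text{ bidirectional.}
\]
% SDR, DDR, and QDR use 8b/10b encoding, so usable payload bandwidth
% is 8/10 of the signaling rate:
\[
40\,\mathrm{Gb/s} \times \tfrac{8}{10} = 32\,\mathrm{Gb/s} \text{ per direction.}
\]
```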
  31. InfiniBand Portfolio - Intelligent Cluster (repeats the portfolio chart from slide 25).
  32. 10 Gigabit Ethernet.
     > 10GbE or 10GigE is the IEEE 802.3ae Ethernet standard, which defines Ethernet technology with a data rate of 10 Gbit/s
     > Enables applications to take advantage of 10 Gbps Ethernet with no changes to the application code
     > The high-speed interconnect of choice for "loosely coupled" HPC applications
     > Wide industry support for 10GbE technology
     > Growing user adoption of Data Center Ethernet (DCE) and Fibre Channel over Ethernet (FCoE) technologies
     > Intelligent Clusters supports 10GbE at both the node level and the switch level, providing multiple vendor choices for adapters and switches (BNT, SMC, Force10, Brocade, Cisco, Chelsio, etc.)
  33. Topic Agenda:
     > Commodity Clusters
     > Overview of Intelligent Clusters
     > Cluster Hardware
     > Cluster Networking
     > *Cluster Management, Software Stack, and Benchmarking*
  34. Cluster Management - xCAT.
     > xCAT - Extreme Cluster (Cloud) Administration Toolkit:
       • Open-source Linux/AIX/Windows scale-out cluster management solution
       • Leverages best practices for deploying and managing clusters at scale
       • Scripts only (no compiled code)
       • Driven by community requirements
     > xCAT capabilities:
       • Remote hardware control: power, reset, vitals, inventory, event logs, SNMP alert processing
       • Remote console management: serial console, SOL, logging / video console (no logging)
       • Remote OS boot target control: local/SAN boot, network boot, iSCSI boot
       • Remote automated unattended network installation
     > For more information on xCAT, go to http://xcat.sf.net
  35. Cluster Software Stack: IBM GPFS - General Parallel File System, a high-performance, scalable file management solution.
     > Provides fast and reliable access to a common set of file data from a single computer to hundreds of systems
     > Brings together multiple systems to create a truly scalable cloud storage infrastructure
     > GPFS-managed storage improves disk utilization and reduces footprint, energy consumption, and management effort
     > Removes client-server and SAN file system access bottlenecks
     > All applications and users share all disks, with dynamic re-provisioning capability
     > OS support: Linux (on POWER and x86), AIX, Windows
     > Interconnect support (with TCP/IP): 1 GbE and 10 GbE; InfiniBand (RDMA in addition to IPoIB); Myrinet; IBM HPS
  36. What is GPFS?
     > IBM's shared-disk, parallel cluster file system
     > Available on pSeries/xSeries clusters with AIX/Linux
     > Used on many of the largest supercomputers in the world
     > Cluster: 2400+ nodes, fast reliable communication (system or storage area network switching fabric), common administrative domain
     > Shared disk: all data and metadata reside on disks (SAN-attached or network block device) accessible from any node through the disk I/O interface
     > Parallel: data and metadata flow between all of the nodes and all of the disks in parallel
     > For more information on IBM GPFS, go to http://www-03.ibm.com/systems/clusters/software/gpfs/index.html
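Because GPFS presents a standard POSIX/parallel file interface on every node, all ranks of a parallel job can write disjoint regions of a single shared file, for example through MPI-IO. A minimal sketch follows (illustrative only; the /gpfs/scratch path is hypothetical and assumed to be a GPFS mount):

```c
/* Minimal MPI-IO sketch: each rank writes its own disjoint block of one
 * shared file. On a GPFS mount, the data flows from all nodes to the
 * shared disks in parallel. The file path below is hypothetical. */
#include <mpi.h>
#include <string.h>

#define BLOCK 1024

int main(int argc, char **argv)
{
    int rank;
    char buf[BLOCK];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'A' + (rank % 26), BLOCK);   /* rank-specific fill byte */

    MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Writing at rank-specific offsets keeps the regions non-overlapping. */
    MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK, MPI_BYTE,
                      MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```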
  37. Resource Managers/Schedulers.
     > Resource managers and schedulers queue, validate, manage, load-balance, and launch user programs/jobs.
     > Torque - Portable Batch System (free); works with the Maui Scheduler (free)
     > LSF - Load Sharing Facility (commercial)
     > Sun Grid Engine (free)
     > Condor (free)
     > MOAB Cluster Suite (commercial)
     > LoadLeveler (commercial scheduler from IBM)
     [Diagram: users 1..N submit jobs to a job queue; the job scheduler and resource manager dispatch them across nodes 1..N.]
  38. Message Passing Libraries.
     > Enable inter-process communication among the processes of an application running across multiple nodes in the cluster (or on a symmetric multi-processing system)
     > "Mask" the underlying interconnect from the user application, allowing the programmer to target a "virtual" communication environment when writing cluster applications
     > The two main interfaces are the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM)
     > MPI implementations (see the sketch after this list):
       • MPICH2 (free): IP (Ethernet), MX (Myrinet), InfiniBand
       • LAM-MPI (free): IP (Ethernet)
       • OpenMPI (free): IP (Ethernet), InfiniBand
       • Scali (commercial): IP (Ethernet), MX (Myrinet), InfiniBand
       • Linda (commercial): IP (Ethernet)
     > PVM is included with most Linux distributions (open source): IP (Ethernet), GM (Myrinet)
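All of these implementations expose the same standard MPI calls, so a program such as the following minimal sketch (illustrative, not taken from the deck) compiles unchanged against MPICH2, LAM-MPI, OpenMPI, or a commercial MPI:

```c
/* Minimal MPI point-to-point example: rank 0 sends an integer to rank 1.
 * Build and run with the usual wrappers: mpicc mpi_hello.c && mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

The same source runs over Ethernet, Myrinet, or InfiniBand; the library, not the application, selects and drives the interconnect.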
  39. Compilers and Other Tools.
     > Compilers are critical to creating optimized binary code that exploits specific processor architectural features, so that an application can use the full power of the system and run most efficiently
     > The respective processor vendors typically have the best compilers for their processors, e.g. Intel, AMD, IBM, SGI, Sun, etc.
     > Compilers matter for HPC applications because individual node performance is a critical factor in overall cluster performance
     > Both open-source and commercial compilers are available, such as the free GNU GCC compiler suite (C/C++, Fortran 77/90) and the PathScale (owned by QLogic) compilers
     > Support libraries and debugging tools are packaged with the compilers, such as math libraries (e.g. Intel Math Kernel Library, AMD Core Math Library) and debuggers such as gdb (the GNU debugger) and the TotalView debugger used for debugging parallel applications on clusters
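To make the math-library point concrete: rather than hand-coding performance-critical kernels, HPC applications typically call tuned vendor routines through standard interfaces such as CBLAS. A minimal sketch (assuming a cblas.h header; vendor products such as the Intel Math Kernel Library and the AMD Core Math Library expose equivalent entry points):

```c
/* Illustrative DGEMM call through the standard CBLAS interface:
 * computes C = alpha*A*B + beta*C for 2x2 row-major matrices. */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,            /* m, n, k */
                1.0, A, 2,          /* alpha, A, lda */
                B, 2,               /* B, ldb */
                0.0, C, 2);         /* beta, C, ldc */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```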
  40. HPC Software Stack. Intelligent Clusters supports a broad range of HPC software from industry-leading suppliers. Software is available directly from IBM or from the respective solution providers.
     > Cluster systems management: xCAT2 and IBM Director (IBM; CSM functionality now merged into xCAT2)
     > File systems: General Parallel File System (GPFS) for Linux and GPFS for Linux on POWER (IBM); PolyServe Matrix Server File System (HP); NFS (open source); Lustre (open source)
     > Workload management: OpenPBS (open source); PBS Pro (Altair); LoadLeveler (IBM); LSF (Platform Computing); MOAB (Cluster Resources; commercial version of the Maui scheduler); GridServer (DataSynapse); Maui Scheduler (open source; interfaces to many schedulers)
     > Message passing interface solutions: Scali MPI Connect™ (Scali)
     > Compilers: PGI Fortran 77/90 and C/C++ (STM Portland Group; 32/64-bit support); Intel Fortran/C/C++ (Intel); NAG Fortran/C/C++ (NAG; 32/64-bit); Absoft® compilers (Absoft); PathScale™ compilers (PathScale; AMD Opteron); GCC (open source)
  41. HPC Software Stack (continued).
     > Debuggers/tracers: TotalView (Etnus); CodeAnalyst (AMD; timer/event profiling, pipeline simulations); Fx2 Debugger™ (Absoft); Distributed Debugging Tool (DDT) (Allinea)
     > Math libraries: ACML - AMD Core Math Libraries (AMD/NAG; BLAS, FFT, LAPACK); Intel Integrated Performance Primitives (Intel); Intel Math Kernel Library (Intel); Intel Cluster Math Kernel Library (Intel); IMSL™ and PV-WAVE® (Visual Numerics)
     > Message passing libraries: MPICH (open source; TCP/IP networks); MPICH-GM (Myricom; Myrinet networks); TCP Linda™ (SCA); WMPI II™ (Critical Software)
     > Parallelization tools: TCP Linda® (SCA)
     > Interconnect management: Scali MPI Connect (Scali)
     > Performance tuning: Intel VTune™ Performance Analyzer (Intel); Optimization and Profiling Tool (OPT) (Allinea); High Performance Computing Toolkit (IBM; http://www.research.ibm.com/actc)
     > Threading tools: Intel Thread Checker (Intel)
     > Trace tools: Intel Trace Analyzer and Collector (Intel)
  42. Cluster Benchmarking. Benchmarking is the technique of running well-known reference applications on a cluster to exercise its various system components and measure the cluster's performance characteristics (e.g. network bandwidth, latency, FLOPS, etc.).
     > STREAM (memory access latency and bandwidth): http://www.cs.virginia.edu/stream/ref.html
     > Linpack, the TOP500 benchmark: solves a dense system of linear equations; you are allowed to tune the problem size and benchmark parameters to optimize for your system; http://www.netlib.org/benchmark/hpl/index.html
     > HPC Challenge: a set of HPC benchmarks that tests various subsystems of a cluster system; http://icl.cs.utk.edu/hpcc/
     > SPEC: a set of commercial benchmarks that measures the performance of various server subsystems; http://www.spec.org/
     > NAS 2.3 Parallel Benchmarks: http://www.nas.nasa.gov/Resources/Software/npb.html
     > Intel MPI Benchmarks (previously the Pallas benchmarks): http://software.intel.com/en-us/articles/intel-mpi-benchmarks/
     > Ping-pong (a common MPI benchmark that measures point-to-point latency and bandwidth; see the sketch below)
     > The customer's own code: provides a good representation of system performance specific to the application
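For reference, the ping-pong measurement mentioned above is just a timed send/receive loop between two ranks. A minimal sketch (illustrative, not from the deck; the 1 MiB message size and repetition count are arbitrary choices):

```c
/* Minimal MPI ping-pong sketch: ranks 0 and 1 bounce a buffer back and
 * forth, and rank 0 reports average round-trip time and bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NBYTES (1 << 20)   /* 1 MiB message (illustrative) */
#define REPS   100

int main(int argc, char **argv)
{
    int rank, i;
    char *buf = malloc(NBYTES);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double rtt = (MPI_Wtime() - t0) / REPS;   /* seconds per round trip */

    if (rank == 0)
        printf("avg round trip: %.1f us, bandwidth: %.3f GB/s\n",
               rtt * 1e6, 2.0 * NBYTES / rtt / 1e9);  /* both directions */

    free(buf);
    MPI_Finalize();
    return 0;
}
```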
  43. Summary.
     > A cluster system is created out of commodity server hardware, high-speed networking, storage, and software technologies
     > High-performance computing (HPC) takes advantage of cluster systems to solve complex problems in various industries
     > IBM Intelligent Clusters provides a one-stop shop for creating and deploying HPC solutions using IBM servers and third-party networking, storage, and software
     > InfiniBand, Myrinet (MX and Myri-10G), and 10 Gigabit Ethernet are the technologies most commonly used as the high-speed interconnect for clusters
     > IBM GPFS provides a highly scalable, robust parallel file system and storage virtualization solution for clusters and other general-purpose computing systems
     > xCAT is an open-source, scalable cluster deployment and cloud hardware management solution
     > Cluster benchmarking enables performance analysis, debugging, and tuning, extracting optimal performance from clusters by isolating and fixing critical bottlenecks
     > Message-passing middleware enables the development of HPC applications for clusters
     > Several commercial software tools are available for cluster computing
  44. Glossary of Terms: Commodity Cluster; InfiniBand; Message Passing Interface (MPI); Extreme Cluster (Cloud) Administration Toolkit (xCAT); Network-attached storage (NAS); Cluster VLAN; Message-passing libraries; Management node; High-performance computing (HPC); Roll Your Own (RYO); BP Integrated; Distributed network topology; Intelligent Clusters; General Parallel File System (GPFS); Direct-attached storage (DAS); iDataPlex; Inter-node communication; Compute network; Centralized network topology; IBM Racked and Stacked; Leaf switch; Core/aggregate switch; Quadruple Data Rate (QDR); Storage area network (SAN); Parallel Virtual Machine (PVM); Benchmarking.
  45. Additional Resources.
     > IBM STG SMART Zone for more education. Internal: http://lt.be.ibm.com ; BP: http://lt2.portsmouth.uk.ibm.com/
     > IBM System x: http://www-03.ibm.com/systems/x/
     > IBM ServerProven: http://www-03.ibm.com/servers/eserver/serverproven/compat/us/
     > IBM System x Support: http://www-947.ibm.com/support/entry/portal/
     > IBM System Intelligent Clusters: http://www-03.ibm.com/systems/x/hardware/cluster/index.html
  46. Trademarks.
     • The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
     > Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark, nor does it mean that the product is not actively marketed or is not significant within its relevant market.
     > Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States. For a complete list of IBM trademarks, see www.ibm.com/legal/copytrade.shtml.
     • The following are trademarks or registered trademarks of other companies.
     > Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries.
     > Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both, and is used under license therefrom.
     > Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
     > Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
     > Intel, the Intel logo, Intel Inside, the Intel Inside logo, Intel Centrino, the Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
     > UNIX is a registered trademark of The Open Group in the United States and other countries.
     > Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
     > ITIL is a registered trademark and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
     > IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
     • All other products may be trademarks or registered trademarks of their respective companies.
     > Notes:
     > Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
     > IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
     > All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
     > This publication was produced in the United States. IBM may not offer the products, services, or features discussed in this document in other countries, and the information may be subject to change without notice.
     Consult your local IBM business contact for information on the products or services available in your area.
     > All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
     > Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
