HP - HPC-29mai2012

  • HPC servers follow the Intel and AMD x86 processor roadmaps. HPC servers will be thin nodes with 2P, or fat nodes with 4P and a large memory for SMP jobs. For HP AMD servers we currently have Magny-Cours, and Lisbon on one SL model (the SL335). The current HP Intel servers are based on Westmere-EP for 2P. There is also a 2P server based on Nehalem-EX, the BL620c G7, which offers up to 512 GB of memory (2 x 16 x 16 GB DIMMs). The next major processor updates are Intel Sandy Bridge and AMD Interlagos: Interlagos is planned for November 2011 and Sandy Bridge for February 2012.
  • These functions will be common across ALL ProLiant Gen8 platforms. There are FOUR key new functional areas (in addition to remote console, remote power and virtual media) that our new iLO Management Engine covers. PROVISIONING: our new Intelligent Provisioning takes all of the strengths of products like SmartStart, PSPs and HP SUM, enhances them with the latest ease-of-use features and places them where you can use them immediately: on the system board. MONITORING: a long-time desire of many of our customers is now offered as an industry first by HP: agentless management. Base hardware health monitoring and alerting functions now run straight on the iLO hardware, independent of the operating system and the x86 processor. DIAGNOSTICS: while HP server failures are few and far between, some failures can be extremely hard to reproduce, may escape regular diagnostic runs and can therefore take too long to fix. With our new Active Health System we offer an always-running diagnostics system that records every configuration change and every alert, to facilitate faster root-cause analysis and problem resolution. SUPPORT: with the iLO Management Engine we also offer a revolutionary built-in remote support system that provides phone-home capabilities; it can either interface directly with the HP backend (ideal for smaller customers, or for remote sites without a permanent connection to the main site), or use an HP Insight Remote Support host server as an aggregator.
  • Instead of embedding the adapter on the server system board, as we do on current servers where customers have to accept the default adapter, FlexLOM provides choice. Just like a box of chocolates, you can choose your flavor. It also offers the ability to upgrade when needed, a buy-as-you-grow model: for example, start with 1Gb and replace it with 10Gb at a later stage. Because of that flexibility, you may not need to add an Ethernet mezzanine card, which saves money; an additional mezzanine card may cost an additional $300 to $800 ILP.
  • With over 5 million SAS Smart Array controllers shipped, HP Smart Array controllers have delivered several innovations for protecting customer data, including the first RAID, the first RAID 6/60, the first cache and the first flash-backed write cache. HP Gen8 Smart Arrays bring significant enhancements, starting with a 2X increase in performance (final results awaited). The Gen8 servers support up to 2 GB of cache, providing faster access to data and improving rebuild times. With twice the number of drives supported compared to previous generations, you can attach up to 227 drives (internal and external combined) to a ProLiant Gen8 server, versus 108 drives on ProLiant G7. With the explosive data growth that most customers are seeing in their environments, this feature helps tremendously. Flash-backed write cache (FBWC) is now standard on most Smart Array controllers. FBWC uses a capacitor and flash memory to hold write-cached data instead of a battery and a DIMM. The main advantage is that when power is lost, the write cache is moved to flash and held essentially indefinitely; when the server is powered back on and the data in flash has been written to the drives, the data is erased from the flash module. Improvements in parity initialization of up to 95% mean that customers can set up and be productive quickly. ACU management tools are now embedded and much easier to access. Seamless data migration from one generation to the next lets customers get up and running more quickly on newer servers. Firmware and driver updates for drives and controllers are on the same cadence. The SSD Wear Gauge lets customers track the number of writes and erases: how tired is your SSD, and is it still under warranty? Security: a secure-erase utility completely erases your data before drives are removed. Drives can be managed across the entire datacenter with scripting, for faster time to resolution on drive issues. Space constraints / real-estate economics (density): larger-capacity drives and greater drive density on servers help manage customers' growing storage needs, with support for twice the number of drives of previous generations. Investment protection: as technologies evolve, 6G Smart Array controllers remain backwards compatible with 3G SAS drives and disk enclosures, providing headroom for growth without a forklift upgrade.
  • HP SmartMemory is a unique technology introduced for ProLiant Gen8 servers that unlocks certain features only available on HP qualified memory. A unique signature written to the memory SPD (serial presence detect) verifies to our new servers that HP SmartMemory, which has passed HP's rigorous qualification and test process, is installed. It is ideal for customers who want assurance that they are buying genuine HP qualified memory, performance-tuned for HP ProLiant and BladeSystem servers, with enhanced support through HP Active Health. As a result of this engineering work, and the ability to verify genuine HP SmartMemory, several unique specifications are available on HP ProLiant Gen8 servers. First, HP can operate low-voltage (1.35V) HP SmartMemory at 3 DIMMs per channel (3DPC) at 1333 MHz, where 3rd-party memory has to run at 1.5V to achieve the same. Second, HP can operate UDIMM memory at a 25% faster bus rate than 3rd-party memory at the same voltage. Substantiation: 15 - 20% less power than 3rd-party memory at 3DPC (HP internal lab testing/calculations); 25% greater throughput at either 1DPC or 2DPC versus 3rd party (HP internal lab testing/calculations).
  • Beginning March 6th we'll begin to deliver our new line of ProLiant Gen8 servers, including three product lines that are of particular use in HPC. The ProLiant DL family of racked servers provides maximum configuration flexibility, with balanced efficiency, performance and management: from 1U "pizza box" servers that have powered many major HPC clusters, to memory- or storage-rich servers that serve specific HPC workloads. HP BladeSystem, the ProLiant BL family, has been the most successful system in HPC since shortly after the launch of c-Class in 2006, with more HP BladeSystem clusters on the TOP500 list of the largest supercomputers in the world (www.top500.org) than any other system architecture. The reason? HP BladeSystem is the essence of HPC convergence, with everything you need for an efficient HPC cluster within the enclosure: servers, storage, networking, management, etc. But as our HPC solutions grew into hyperscale, we found the demands of our largest sites required a new way of thinking about servers for HPC. From that clean-sheet design came the HP ProLiant SL family, designed specifically for the most challenging HPC environments, driving new levels of performance, scalability, efficiency and agility.
  • Improvements in Gen8: Processor - improved CPU performance, power, and capabilities. Memory - increased memory capacity, 1600 MHz performance, and lower power. I/O - enhanced I/O mezzanine bandwidth with x16 PCIe Gen3 and reduced latency. Storage - improved RAID performance with 512 MB flash cache and a new hard drive carrier. Management - next-generation HP iLO Management Engine and cloud-enabled Insight Management.
  • Incremental value over the BL280c G6: hot-plug drive support; increased core support (8 vs. 6); enhanced manageability suite (iLO 4, BlackBox, OA 3.50); improved warranty (3/3/3 vs. 3/0/0); flexible storage controller options via daughter card; flexible networking options via daughter card; improved memory footprint (32 GB support vs. 16 GB).
  • Accelerators are used to increase the performance of HPC systems by accelerating computational or storage hot spots in HPC applications, enabling SL200s systems to achieve new levels of performance/$/watt/ft2. SL200s systems support integrated NVIDIA Tesla GPUs, including the latest M2090 GPUs that deliver up to 30% more performance than previous generations, and the M2070Q GPU that combines computational performance with remote graphics support. This allows the SL250s to act as a graphical server, enabling workstation applications to be moved into the datacenter, where they are closer to the data that needs to be visualized. SL200s systems support the PCIe IO Accelerator, a solid-state storage device that provides a high-speed cache for storage operations, increasing I/O performance for data-centric HPC applications. The SL200s will also support the Intel Many Integrated Core (MIC) processor in the future. MIC will combine many x86 cores on a single piece of silicon, enabling highly parallel HPC applications to be accelerated, or run entirely, on MIC to significantly increase their computational performance. And being an x86 core, it will run the standard IA instruction set, simplifying the porting of applications to take advantage of MIC.
  • Disks can be SAS, SATA or SSD. The embedded controller is the Dynamic Smart Array B320i RAID controller. SL230s: two LFF non-hot-plug SAS/SATA bays or four SFF non-hot-plug SAS/SATA/SSD bays, plus two hot-plug SFF drives (option); maximum internal storage 8 TB. SL250s: four SFF SAS/SATA HDDs, hot-plug drives only; maximum internal storage 6 TB; optional risers, used in place of the GPU risers, support an additional four non-hot-plug SFF SAS/SATA drives, for a total maximum internal storage of 10 TB. SL270s: eight SFF SAS/SATA HDDs, hot-plug drives only; maximum internal storage 10 TB.
  • ConnectX-3 is the FDR adapter chip. When running the InfiniBand protocol it supports Remote Direct Memory Access (RDMA), Fibre Channel over InfiniBand (FCoIB) and Ethernet over InfiniBand (EoIB). RDMA is the key feature that lowers latencies on server-to-server links, because it allows a server to bypass the entire network stack and reach right into the main memory of an adjacent server over InfiniBand links to grab data. The ConnectX-3 chip is small enough to be implemented as a single-chip LAN-on-motherboard (LOM) module; it will also be used in PCI adapter cards and in mezzanine cards that slide into special slots on blade servers. HP FDR InfiniBand blade switch module: Monadnock consists of two new InfiniBand switches for the c-Class blade system; they will be FDR InfiniBand (56 Gb/s) capable devices. HP IB FDR/EN 10/40Gb 2P 544M Adapter: the Wachusett parts implement the next generation of Mellanox chip technology, ConnectX-3, offering higher performance and lower power and latency than the previous products.
    1. 1. HPC SOLUTIONS AT HP - Andrei Balcu, Technical Consultant, HP Romania
    2. 2. HPC is built on a converged infrastructure: purpose-built HPC servers, purpose-built HPC storage, HPC fabrics, HPC software, and datacenter power & cooling infrastructure.
    3. 3. SERVERS IN HPC
    4. 4. SERVERS IN HPC - CPUs: what's new?
    5. 5. HPC platforms follow the chipset and processor roadmap (2010 - 2012)
        Intel 2P: Westmere-EP (6 cores, 12M L3, 3 memory channels, 1333 MHz DDR3, 130/95/80/60/40W) in G6/G7, then Sandy Bridge (8 cores, 20M L3, 4 memory channels, 1600+ DDR3, Socket R, B2) in Gen8
        Intel 4P: Nehalem-EX (8 cores, 24 MB L3, up to 130W), then Westmere-EX (10 cores, 30 MB L3, up to 130W) in G7
        AMD 2P: Lisbon (6 cores, 6M L3, 2 memory channels), then Valencia (Bulldozer core, 8 cores, 2 memory channels)
        AMD 4P: Magny-Cours (12 cores, 12M L3, 4 memory channels) in G6/G7, then Interlagos (Bulldozer core, 16 cores, 4 memory channels) in G7 and Gen8
    6. 6. SERVERS IN HPC - Next generation servers
    7. 7. Intel® Westmere-EP vs. Intel® Sandy Bridge-EP/EN
        Feature | Westmere-EP | E5-2600 (EP) | E5-2400 (EN)
        Cores | up to 6 cores / 12 threads | up to 8 cores / 16 threads | up to 8 cores / 16 threads
        Cache size | 12 MB | up to 20 MB | up to 20 MB
        Max memory channels per socket | 3 | 4 | 3
        Max memory speed | 1333 MHz | 1600 MHz | 1600 MHz
        New instructions | AES-NI | adds AVX | adds AVX
        QPI frequency | 6.4 GT/s | up to 8.0 GT/s | up to 8.0 GT/s
        Inter-socket QPI links | 1 | 2 | 1
        PCI Express | 36 lanes PCIe2 on chipset | 40 lanes/socket, integrated PCIe3 | 24 lanes/socket, integrated PCIe3
        Power TDP (server/workstation) | 130W, 95W, 80W, LV (low power) | 150 (workstation only), 130, 115, 95, 80, 70, 60 (low power) | 95, 80, 70, 60, 50 (low power)
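The "adds AVX" row is where most of the per-core floating-point gain comes from: AVX widens the vector registers from 128 to 256 bits, doubling peak double-precision FLOPS per core per cycle. A minimal sketch of the resulting per-socket peaks; the clock frequencies below are assumptions picked only for illustration, not figures from this deck.

```python
# Theoretical peak double-precision FLOPS per socket, showing the effect of
# the "adds AVX" row: SSE sustains 4 DP FLOPS/cycle/core (128-bit add + mul),
# AVX sustains 8 (256-bit add + mul). Clock speeds are illustrative assumptions.
def peak_dp_gflops(cores, ghz, flops_per_cycle):
    return cores * ghz * flops_per_cycle

westmere_ep = peak_dp_gflops(6, 2.9, 4)   # ~70 GFLOPS/socket with SSE
sandybridge = peak_dp_gflops(8, 2.6, 8)   # ~166 GFLOPS/socket with AVX
print("peak DP: Westmere-EP %.0f vs Sandy Bridge-EP %.0f GFLOPS"
      % (westmere_ep, sandybridge))
```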
    8. 8. HP ProLiant Gen8 marquee features - innovation beyond industry standards: ProLiant System Architecture, iLO Management Engine, HP Smart Storage, Insight Online, HP FlexNet Adapters, Sea of Sensors 3D.
    9. 9. iLO Management Engine - core lifecycle management functions built in for instant availability
        Intelligent Provisioning - ready to deploy and update without the need for HP discs or downloads
        Agentless Management - base hardware health monitoring and alerting without OS agents
        Active Health System - continuously running diagnostics to minimize downtime
        Remote Support - built-in phone-home function to ease setup and configuration
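Agentless management means the management processor itself answers health queries, so a monitoring station can poll it with no software running in the host OS. As a rough sketch (not HP's own tooling), the snippet below polls sensor data over standard IPMI 2.0, which iLO exposes alongside its web and scripting interfaces; the hostname and credentials are placeholders, and it assumes ipmitool is installed and IPMI-over-LAN is enabled on the iLO.

```python
# Minimal sketch: poll base hardware health from the management processor over
# IPMI 2.0, with no agent in the host OS. Assumes ipmitool is installed and
# IPMI-over-LAN is enabled; host/user/password are placeholders.
import subprocess

def read_ilo_sensors(host, user, password):
    """Return the raw sensor lines reported by the management processor."""
    out = subprocess.check_output(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, "sensor"]
    )
    return out.decode().splitlines()

if __name__ == "__main__":
    for line in read_ilo_sensors("ilo-node01.example.com", "admin", "secret"):
        # A typical line: "Temp 1 | 21.000 | degrees C | ok | ..."
        if "| ok" not in line:
            print("check:", line)
```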
    10. 10. HP FlexLOM - grow your environment without complexity. Change-ready for future proofing and an adaptable infrastructure:
        Provides choice - upgrade options of 1Gb and 10Gb
        Choose your fabric - Ethernet, FlexFabric, Flex-10, InfiniBand
        Universal - available on all BL, SL and select DL servers
        Flexible - supports a shared iLO port like the traditional LOM (1)
        (1) LOM is short for LAN on motherboard. The term refers to a chip or chipset capable of network connections that has been embedded directly on the motherboard of a server.
    11. 11. Gen8 Smart Array innovations - increased performance, data availability and storage capacity
        Faster access to data - up to 2X performance improvement*; 2X cache (up to 2 GB)
        Address explosive data growth - 2X drives supported (up to 227)
        Minimize data loss - long-term data retention with flash-backed write cache standard
        Reduce initial setup time - 95% reduction in parity initialization, from several days to 5 hours**
        Over 5 million SAS Smart Array controllers sold! Continuing the legacy of innovation with Gen8.
        * 256 KiB sequential write, RAID 5 with 15K SAS drives; performance will vary based on configuration. ** HP R&D; validation information TBD.
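The speaker notes for this slide mention managing drives across the entire datacenter with scripting; that is typically done with the Array Configuration Utility command line. A minimal sketch, assuming the hpacucli tool of this server generation and ssh access to each node; the hostnames are placeholders and the output parsing is deliberately loose, since the exact report text varies by controller and firmware.

```python
# Rough sketch: audit Smart Array configuration across many hosts by running
# the Array Configuration Utility CLI (hpacucli) over ssh. Hostnames are
# placeholders; parsing is intentionally loose because output formats vary.
import subprocess

HOSTS = ["node01", "node02"]  # hypothetical node names

def controller_report(host):
    cmd = ["ssh", host, "hpacucli", "ctrl", "all", "show", "config"]
    return subprocess.check_output(cmd).decode()

if __name__ == "__main__":
    for host in HOSTS:
        report = controller_report(host)
        failed = [l.strip() for l in report.splitlines() if "Failed" in l]
        print(host, "-", "OK" if not failed else failed)
```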
    12. 12. HP SmartMemory - lower power, faster and more reliable
        • 15 - 20% less power than 3rd-party memory at 3DPC for DDR3-1333 1.35V RDIMM and DDR3-1333 LRDIMM
        • 25% greater throughput at either 1DPC or 2DPC versus 3rd-party memory for DDR3-1333 UDIMM
        • Genuine HP qualified memory reliability assured by a unique electronic signature
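The 25% UDIMM throughput figure is consistent with simply running the memory bus at a higher transfer rate. A quick worked example of the arithmetic: DDR3 moves 8 bytes per transfer per channel, so peak bandwidth is the transfer rate times 8 bytes. The 1066 MT/s comparison point below is an assumption, chosen only to show where a roughly 25% ratio comes from.

```python
# Worked example of the throughput arithmetic behind the UDIMM claim.
# DDR3 peak bandwidth per channel = transfer rate (MT/s) * 8 bytes.
# The 1066 MT/s comparison point is an assumption used for illustration.
def ddr3_peak_gbs(mt_per_s, channels=1):
    return mt_per_s * 8 * channels / 1000.0  # GB/s

hp_udimm = ddr3_peak_gbs(1333)     # ~10.7 GB/s per channel
third_party = ddr3_peak_gbs(1066)  # ~8.5 GB/s per channel
print("per-channel peak: %.1f vs %.1f GB/s (+%.0f%%)"
      % (hp_udimm, third_party, 100 * (hp_udimm / third_party - 1)))
```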
    13. 13. Industry's most complete portfolio for HPC - workload optimized, engineered for any demand
        ProLiant DL family - versatile, rack-optimized servers with a balance of efficiency, performance and management
        ProLiant BL family - cloud-ready converged infrastructure engineered to maximize every hour, watt and dollar
        ProLiant SL family - purpose built for the world's most extreme data centers
    14. 14. Industry's most complete portfolio for HPC - workload optimized, engineered for any demand
        ProLiant DL family - versatile, rack-optimized servers with a balance of efficiency, performance and management
        ProLiant BL family - cloud-ready converged infrastructure engineered to maximize every hour, watt and dollar
        ProLiant SL family - purpose built for the world's most extreme data centers
    15. 15. HP ProLiant BL400c series positioning
        ProLiant BL460c Gen8 - the world's leading server blade
        ProLiant BL465c Gen8 - the first server blade to deliver over 2,000 cores per rack
        ProLiant BL420c Gen8 - breakthrough server blade economics for essential enterprise workloads
    16. 16. HP ProLiant BL460c Gen8 overview
        • As the world's leading server blade, the ProLiant BL460c Gen8 offers the ideal balance of performance, scalability, and expandability.
        • This makes it ideal for: heterogeneous datacenters and a wide variety of mainstream businesses; HPC scale-out applications for small, medium, and enterprise data centers.
        • Key workloads include: virtualization/consolidation; IT infrastructure (file & print, networking, security, systems management, etc.); web infrastructure (web serving, streaming media, etc.); collaborative (e-mail, workgroup, etc.).
    17. 17. HP ProLiant BL420c Gen8 overview
        • The BL420c Gen8 delivers breakthrough server blade economics for essential enterprise workloads. It provides the perfect balance of price, performance, and high availability in the enterprise space.
        • This makes it ideal for: mid-market and cost-sensitive enterprise customers; service providers who prefer the manageability of blades; scale-out.
        • Key workloads include: web hosting/services in the enterprise space; a single application on a single server; IT infrastructure (file & print, networking, security & systems management).
    18. 18. BL420c Gen8 vs. BL460c Gen8
        Processor: Intel® Xeon® E5-2400 Series | Intel® Xeon® E5-2600 Series
        Chipset: Intel® C600 (both)
        Memory: (12) DDR3 RDIMM/UDIMM, up to 1333 MHz | (16) DDR3 RDIMM/UDIMM/LRDIMM/LVDIMM
        Max memory: 384 GB (12 DIMMs x 32 GB) | 512 GB (16 DIMMs x 32 GB)
        Internal storage: 2 SFF HP HDD SAS/SATA/SSD, Dynamic Smart Array B320i RAID controller | 2 SFF HP HDD SAS/SATA/SSD, Smart Array P220i controller
        Max internal storage: 2 TB SAS; 2 TB SATA; 1.6 TB SSD
        Networking: (1) dual-port networking daughter card: 1GbE, 10GbE, Flex-10, or FlexFabric
        I/O slots: (2) PCIe Gen3: (1) x8 Type A mezzanine; (1) x16 Type B mezzanine
        Integrated management: HP iLO Management Engine, SIM, IRS; optional: HP Insight Control, iLO Advanced
        Form factor: half-height c-Class server blade; 16 blades per c7000 (10U) enclosure; 8 blades per c3000 (6U) enclosure
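Both blades expose x16 PCIe Gen3 mezzanine slots, which is where much of the Gen8 I/O headroom comes from. The back-of-the-envelope sketch below shows the per-direction bandwidth jump implied by the signalling rates and line coding (Gen2: 5 GT/s with 8b/10b; Gen3: 8 GT/s with 128b/130b); these are generic PCIe figures, not HP-specific measurements.

```python
# Back-of-the-envelope PCIe bandwidth, showing why the jump to Gen3 x16
# matters for I/O-heavy nodes. Gen2 signals at 5 GT/s with 8b/10b encoding;
# Gen3 signals at 8 GT/s with 128b/130b encoding.
def pcie_gbs(gt_per_s, coding_efficiency, lanes):
    return gt_per_s * coding_efficiency * lanes / 8.0  # GB/s per direction

gen2_x16 = pcie_gbs(5.0, 8 / 10.0, 16)     # ~8.0 GB/s
gen3_x16 = pcie_gbs(8.0, 128 / 130.0, 16)  # ~15.8 GB/s
print("x16 per direction: Gen2 %.1f GB/s -> Gen3 %.1f GB/s" % (gen2_x16, gen3_x16))
```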
    19. 19. Industry's most complete portfolio for HPC - workload optimized, engineered for any demand
        ProLiant DL family - versatile, rack-optimized servers with a balance of efficiency, performance and management
        ProLiant BL family - cloud-ready converged infrastructure engineered to maximize every hour, watt and dollar
        ProLiant SL family - purpose built for the world's most extreme data centers
    20. 20. Integrated accelerator solutions for the SL200s family - driving new levels of performance/$/watt/ft2
        • Next-generation NVIDIA Tesla performance - up to 30% higher performance with the M2090; combined computation and visualization with the M2070Q
        • Optional HP PCIe IO Accelerator - integrated solid-state storage device to accelerate I/O-bound applications
        • Future: Intel® Many Integrated Core (MIC) - accelerate highly parallel applications using the standard IA instruction set
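The "up to 30% higher performance with the M2090" claim is roughly what the commonly quoted double-precision peak figures imply; the sketch below just does that division. The 665 and 515 GFLOPS values are assumptions taken from NVIDIA's public peak specifications for the Tesla M2090 and M2070, and real application gains will differ.

```python
# Sanity check of the "up to 30% more performance" figure using the commonly
# quoted double-precision peaks for Tesla M2090 vs M2070 (assumed here: 665
# and 515 GFLOPS). Actual application speedups depend on the workload.
m2090_dp_gflops = 665.0
m2070_dp_gflops = 515.0
print("peak DP uplift: +%.0f%%" % (100 * (m2090_dp_gflops / m2070_dp_gflops - 1)))
```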
    21. 21. SL6500 chassis (SL230 / SL250 / SL270 / SL140)*
        • Shared power & fans for reduced component quantity and increased power efficiency
        • Ability to mix and match SL half-width nodes
        • Front cabling for increased rear air-flow and ease of serviceability
        • Individually serviceable nodes
        SL230 - Socket-R, ultra-dense server for virtualization and HPC applications (1U)
        SL250 - Socket-R, hybrid compute node for GPU computing and database applications in HPC (2U)
        SL270 - Socket-R, high-performance GPU solution, optimized for extreme GPU density (4U)
        SL140 - Socket-B, cost-effective, power-efficient and ultra-dense solution (1U)
        * Needs 1200 mm deep racks
    22. 22. SL140s Gen8 / SL230s Gen8 / SL250s Gen8 / SL270s Gen8
        Processor: E5-2400, 4/6/8 cores (SL140s) | E5-2600, 4/6/8 cores (SL230s/SL250s/SL270s)
        Chipset: Intel® C600
        Memory: 12 x DDR3 RDIMM/UDIMM, up to 1333 MHz, ECC (SL140s) | 16 x DDR3 RDIMM/UDIMM, up to 1600 MHz, ECC (SL230s/SL250s/SL270s)
        Max memory: 256 GB (SL140s) | 512 GB (SL230s/SL250s/SL270s)
        Internal storage: 2 LFF NHP or 4 SFF NHP (SL140s) | 2 LFF NHP or 4 SFF NHP, opt. 2 SFF HP (SL230s) | 4 SFF HP, opt. 2 SFF HP (SL250s) | 8 SFF HP (SL270s)
        Max internal storage (depending on node type): 4 TB 3.5" SAS; 6 TB 3.5" SATA; 1.2-2 TB 2.5" SAS; 2 TB 2.5" SATA; 480-960 GB 2.5" SSD
        Networking: 1 x integrated NC366i dual-port GbE adapter; 1 x dual-port networking daughter card: QDR IB, 10GbE
        I/O slots: 1 x PCIe Gen3 x16 HL/LP (SL140s) | 1 x PCIe Gen3 x16 HL/LP (SL230s) | 4 x PCIe Gen3: 1 x8 HL/LP, 3 x16 HL/LP (SL250s) | 9 x PCIe Gen3: 1 x8 HL/LP, 8 x16 HL/LP (SL270s)
        Integrated management: HP iLO Management Engine, SIM, IRS; opt: HP Insight Control, iLO Advanced
        Form factor: 1U, 8 trays per s6500 (4U) (SL140s) | 1U, 8 trays per s6500 (4U) (SL230s) | 2U, 4 trays per s6500 (4U) (SL250s) | 4U, 2 trays per s6500 (4U) (SL270s)
    23. 23. HP ProLiant SL250s Gen8 - 2U half-width tray: 4 nodes per 4U chassis, 8 CPUs per 4U chassis, 12 GPUs per 4U chassis. Callouts: rear GPU or non-hot-plug HDDs; 2-GPU tray; 2 Socket-R CPUs (below GPU tray); 16 DIMM slots (below GPU tray); PCIe expansion slot; 4 hot-plug SFF drives; management port (iLO 4); FlexFabric slot; 2 x 1GbE ports.
    24. 24. INTERCONNECTS IN HPC
    25. 25. HPC Interconnects
        • Bandwidth (large data exchanges)
        • Latency (microseconds)
        • Scalability: stay efficient even for a high number of links
        • Can also accommodate I/O traffic
        • Two HPC interconnects: Ethernet (1 GigE, 10 GigE, 40 GigE) and InfiniBand
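Bandwidth and latency, the first two bullets, are exactly what a two-rank ping-pong microbenchmark measures. A minimal sketch using mpi4py (an assumption; any MPI benchmark such as the OSU suite does the same job), launched with one rank on each of two nodes; the hostnames in the example launch line are placeholders.

```python
# Minimal MPI ping-pong for the two metrics on this slide: small-message
# latency and large-message bandwidth between two nodes. Assumes mpi4py and
# NumPy; run with two ranks, one per node, e.g.:
#   mpirun -np 2 -host node01,node02 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
reps = 1000

for size in (8, 1 << 20):            # 8 B probes latency, 1 MiB probes bandwidth
    buf = np.zeros(size, dtype='b')
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        else:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    elapsed = MPI.Wtime() - t0
    if rank == 0:
        one_way = elapsed / (2 * reps)
        print("%8d bytes: %.2f us one-way, %.2f MB/s"
              % (size, one_way * 1e6, size / one_way / 1e6))
```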
    26. 26. IBTA specification
    27. 27. HP InfiniBand strategy
        • Focus on partnership: work with technology providers.
        • Focus on qualification, integration and an efficient supply chain: rigorous quality testing and control; efficient supply chain management.
        • IB products have one basic element, the ASIC, with 2 providers: Mellanox or QLogic.
        • HP integrates IB switches from 2 providers, Mellanox and QLogic (it used to be 3 providers, with Voltaire).
        • We run benchmarks and tests for all HCAs and components.
        • We qualify HCAs, switches and cables on our platforms.
        • We verify the interoperability of Mellanox and QLogic.
    28. 28. HP 56 Gb/s FDR InfiniBand portfolio: ConnectX-3 HCAs in servers; FDR 36-port edge switch; FDR chassis aggregation switches (in 2012); Unified Fabric Manager (UFM); acceleration software; QSFP FDR cables; installed-base QDR switches (e.g. 4036E); HP systems integration.
    29. 29. STORAGE IN HPC
    30. 30. Mix HP Storage in an HPC cluster
        • HP X9000 Network Storage System: small files; high metadata operation rates; wide access; /home typically...
        • Lustre file system with optimized, HPC-focused hardware: extreme sequential bandwidth; "true" parallel I/O (several writers to the same file) or high single-stream throughput; /scratch, /work typically…
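The "true parallel I/O, several writers to the same file" point is what MPI-IO provides on a Lustre /scratch file system. A minimal sketch, assuming mpi4py and NumPy are available; the file path is a placeholder, and in practice the file's Lustre stripe count would also be tuned (for example with lfs setstripe) so the blocks spread across storage servers.

```python
# Sketch of "several writers to the same file": each MPI rank writes its own
# block of a single shared file via MPI-IO, the access pattern a striped
# Lustre /scratch is built for. Assumes mpi4py; the path is a placeholder.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

block = np.empty(1 << 20, dtype='b')   # 1 MiB of data per rank
block.fill(rank)                       # fill with the rank id (fits int8 for small jobs)

fh = MPI.File.Open(comm, "/scratch/shared.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(rank * block.nbytes, block)   # collective write, non-overlapping offsets
fh.Close()
```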
    31. 31. HP Storage versus Lustre - key differences
        HP Storage (X9000, P4000, MDS600): many applications, or instances of the same application; each one of many servers runs a single application instance, up to one instance per core or VM; each has its own file/data set; many metadata operations (IOPS); metadata is distributed across multiple servers; datasets are distributed across multiple servers to balance performance; typical applications: HLS (next-generation sequencing NGS, biosciences/genomics), media (animation render farms), public sector (content depots), financial services.
        Lustre / DDN SFA10K: "one" parallelized application; parallelized applications are spread across multiple servers and may use MPI to communicate; reading and writing to a single file; few metadata operations (IOPS); a single server for metadata is enough; the dataset is striped across multiple storage servers for maximum read/write bandwidth; typical applications: computer-aided engineering, molecular modeling, high-energy physics.
    32. 32. MANAGEMENT SW IN HPC - CMU: Cluster Management Utility
    33. 33. Insight CMU v7.0 (February 2012)
    34. 34. HP Insight CMU - hyperscale cluster lifecycle management software
        Proven: 10+ years in deployment, including Top500 sites with 1000's of nodes
        Built for Linux, with support for multiple Linux distributions, including hybrid support with Windows
        Provision: simplified discovery, firmware audits; fast and scalable cloning
        Monitor: "at a glance" view of the entire system, zoom to component; customizable; lightweight
        Control: GUI and CLI options; easy, friction-less control of remote servers
    35. 35. Worldwide CMU deployments - HP ships 2 CMU clusters per week worldwide: universities, engineering, government and research labs, energy. (April 2009)
    36. 36. CMU main functionalities
        Deployment: imaging (cloning); autoinstall (kickstart|autoyast|preseed); diskless
        Scalable live monitoring: scalable non-intrusive monitoring engine (+collectl); monitoring GUI / monitoring API
        Day-to-day administration: interactive CLI (+ cmu_* linux commands); cmudiff; command broadcast; multiple-window broadcast (one window per host); single window; PDSH, one command on all the hosts; GUI (Java-based, for the desktop)
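The "one command on all the hosts" items above boil down to fanning a shell command out across the node list. Below is a small illustrative wrapper around pdsh, which the slide itself mentions; the node range is a placeholder, and pdsh's -w flag takes a host list such as node[01-16].

```python
# Small wrapper illustrating "one command on all the hosts", using pdsh as
# the slide mentions. The node range is a placeholder.
import subprocess

def broadcast(hostlist, command):
    """Run `command` on every host in `hostlist` and return pdsh's merged output."""
    out = subprocess.check_output(["pdsh", "-w", hostlist] + command.split())
    return out.decode()

if __name__ == "__main__":
    print(broadcast("node[01-16]", "uptime"))
```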
    37. 37. Time View
    38. 38. CMU backup / cloning feature
        Needs: setting up a cluster is painful; system management of HPC clusters is difficult due to the large number of nodes.
        Cloning goals: avoid 'one by one' system installation on compute nodes; fast cluster installation with an optimised cloning mechanism.
        How: install one compute node; back up that compute node (the golden image); duplicate that golden image to all compute nodes.
    39. 39. Diskless installation - large-scale diskless support
        • When diskless nodes are installed, the FS of the compute nodes runs completely via NFS, while the OS is loaded in RAM.
        • Existing NFS-root based diskless support expanded to allow for multiple NFS servers.
        • Up to 4k diskless compute nodes.
    40. 40. CMU GPGPU support
        • CMU provides a new binary for extracting GPU metric data from the GPU driver: /opt/cmu/tools/cmu_get_nvidia_gpu
        • A new command, cmu_config_nvidia, configures GPU monitoring: it configures load, mem_util, mem_alloc, power_state, and ECC_double_bit alerts by default; power_usage, various clock speeds, fan speeds, and temperature are also configured but commented out by default.
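For a sense of the kind of per-GPU data these hooks collect, the sketch below gathers a few metrics by parsing nvidia-smi -q output. It is an illustration rather than the cmu_get_nvidia_gpu binary itself (whose output format is not documented on this slide), and the exact field names depend on the driver version.

```python
# Illustration of the per-GPU metrics the CMU hooks above collect, gathered
# here by parsing `nvidia-smi -q` output rather than by calling
# /opt/cmu/tools/cmu_get_nvidia_gpu. Field names vary by driver version.
import re
import subprocess

def gpu_metrics():
    text = subprocess.check_output(
        ["nvidia-smi", "-q", "-d", "TEMPERATURE,POWER,UTILIZATION"]
    ).decode()
    metrics = {}
    for key, pattern in (("temp_c", r"GPU Current Temp\s*:\s*(\d+) C"),
                         ("power_w", r"Power Draw\s*:\s*([\d.]+) W"),
                         ("gpu_util", r"Gpu\s*:\s*(\d+) %")):
        m = re.search(pattern, text)
        metrics[key] = float(m.group(1)) if m else None
    return metrics

if __name__ == "__main__":
    print(gpu_metrics())
```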
    41. 41. THANK YOU
