High Performance Computing: State of the Industry


Published on

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

High Performance Computing: State of the Industry

  1. 1. High Performance ComputingState of the IndustryAnil VasudevaPrincipal Analyst & Presidentanil@imexresearch.com IMEX RESEARCH.COM408-268-0800 ©2000-2005 IMEX Research IMEX
  2. 2. • HPC - Markets Drivers & Industry Dynamics • State of HPC Technologies Clustering, Interconnects, Storage, Blade Servers • Industry Issues/Technology Hurdles Standards, Management, Fabrics, Thermals, IP Acceleration • Futures Trends - Computing & Grids • Holy Grails Volume Driven Economics, IP Evolution, TF/Desktop HPC in our Daily Lives (Academia > Hollywood > Wall Street)©2000-2005 IMEX Research IMEX
  3. 3. Server Architectures: Competitive Technologies Price/Performance Performance: SMP vs. Clustering Serial vs.Parallel Computing Multinode Clusters Performance - tpc SMP w/8 way MPP w/4 way SerialPrice 16 32 MF 8 N-way SMP 6 4 2 Clustering i Performance Number of Processors Scalable Perf. vs. Availability Scalable Performance MPP Clustering SMP FT UniP ©2000-2005 IMEX Research Availability IMEX
  4. 4. Market Segments by Applications OLTP Transaction 10 K Processing eCommerce Data DSS (RAID - 0, 3) (RAID - 1, 5, 6) 1K Warehousing (Latency) Visual DBIOPs 100 Scientific Computing HPC HPC Imaging TPC 10 Audio Streaming Streaming HPC Video 1 1 5 10 50 100 500 MB/sec *IOs per sesond for a required response time ( ms) ©2000-2005 IMEX Research IMEX
  5. 5. HPC – 2 decades of rapid progress 1991 1998 2005 2010System Cray Y-MP C916 Sun HPC10000 Shuttle@newegg Blade 24x 333 MHz, Ultra 4x 2.6 GHz, x64Architecture 16x Vector, 4 GB, Sparc II, 24 GB,S- Bit, 4 4x 10GHz, x64 bit, Bus Bus GB, GbE 16GB, 10GbE Win Server 2003OS UNICOS Solaris 2.5.1 SP2 LongHorn/LinuxGflops ~10 ~10 ~10 ~100Price $40 million $1 Million <$4000 ~$1000Price Reduction 1 1/40 1/10,000 1/400,000 Government Small>LargeTarget Audience National Labs Large Enterprises Businesses Every Professional Bioinformatics,Applications Classified, Climate, Manufacturing, Materials Architects,Wall Physics Energy,Finance Sciences Street, Hollywood ©2000-2005 IMEX Research IMEX
  6. 6. The rapid rise of Clusters in HPC Rise of Cluster Computing 350 # Clustered Computers in Top 500.org 300 250 200 150 100 50 0 1999e 2000 2001 2002 2003 2004e©2000-2005 IMEX Research IMEX
  7. 7. HPC Market Drivers• Availability of industry standard HW making HPC mainstream • HSV Servers @$200, Storage @25c/GB, Add-On GbE NICS at $10, GbE Switches at $2-5/port • Migration to Embedded ASICs on Blades• Rapid procurement, installation & System Integration simplifying acquisition• Cluster ready apps accelerating market growth • Engineering (LSTC, Ansys..) • Bioinformatics (Blast, Gaussian..) • Finance (Matlab, Excel…) • Oil & Gas (Eclipse, Promagic..) • Government (Uexplore, NOAA Hysplit..)©2000-2005 IMEX Research IMEX
  8. 8. HPC Servers Rev - 15% of Total Server Market WW HPC Rev ($M) $9,000 $8,000 $7,000 $6,000 Departm ental Rev $M $5,000 $4,000 $3,000 Divisional $2,000 Enterprise $1,000 Capability $- 2003 2004 2005 2006 2007 2008©2000-2005 IMEX Research IMEX
  9. 9. HPC Market Leaders 2004 Market Leaders©2000-2005 IMEX Research IMEX
  10. 10. Rise of Industry Standard Architecture in HPC X86 Architecture (Intel & AMD) using Linux and Microsoft based clusters dominate in High Perrformance Computing Architecture.TeraFlop HPC UsingOff-the-Shelf ModulesIntel servers with PCI Express, InfinibandHBAs & Switches, SATA/SAS HDD orAMD Opteron Servers, GbE HBA/Switches,SATA ©2000-2005 IMEX Research IMEX
  11. 11. Factors Affecting Performance • Applications Configurations affect Linpack significantly (3 metrics have the most affect) • Problem Size • Size of Blocks • Topology • Tightly Coupled MPI Applications • Very sensitive to network performance characteristics - internodal communications delay - OS Network Stack is a significant factor • Very sensitive to mismatched node performance - Random OS activities can add msec delays to usec type communication line delays©2000-2005 IMEX Research IMEX
  12. 12. HPC Architecture & ChallengesArchitecture Goals & CriteriaGoals• Maximize compute time HPC Communications Stack vs. messaging time• High constant cross- Entire Stack sectional throughput Needs Attention• Reduce or hide latency Data ApplicationMetrics Upper Layers• Bandwidth TCP/IP (or Offload I/F)• Latency OS• System Interface CPU• SW Stack Memory Chipset Challenge is to move data from I/O interconnect Network Interface application to the network and back (w or w/o TOE) with minimum latency Data ©2000-2005 IMEX Research IMEX
  13. 13. Application Application Application Application Application Memory Memory Control Control Host Host Host Socket RDMA CPU RDMA Adapter TCP TCP CPU CPU IP IP Data Data Kernel KernelTraditional Memory Memory Adapter Ethernet Ethernet RDMA Adapter RDMA Adapter Network ©2000-2005 IMEX Research IMEX
  14. 14. Interconnect Price/Performance Hierarchy High MB/s, Low Latency, Price/Port Quadrics Myrinet InfiniBand 10 GbE w/ RDMA, TOE 10 GbE w/ TOE 1 GbE Market Size Microsoft Chimney Software 2004-07©2000-2005 IMEX Research IMEX
  15. 15. State of Interconnect Fabrics EHERNET - LAN/SAN/MAN/WAN FE GbE ATM IBM/SPLatency/hop MPP SMP/NUMA Servernet Dolphin T3E Myrinet SGI Mellanox IB GbE w/TOE Quadrics FJ/Synfinity GbE w/RDMA 10GbE w/RDMA Bandwidth/link ©2000-2005 IMEX Research IMEX
  16. 16. Interconnects: Industry Progress©2000-2005 IMEX Research IMEX
  17. 17. InfiniBand Clustering• InfiniBand and Industry Standard Servers – Displacing Mainframe Computers in the Data Center with Three Standard Hardware Components – Powerful Low Cost Combo Out Performs SMP 12-Nodes (24-CPUs) Up to 72 GFlops of CPU Power; 24+ GB Memory Industry Standard Server InfiniBand Switch => InfiniBand HCA Enterprise Availability:©2000-2005 IMEX Research With Fabric Failover IMEX
  18. 18. InfiniBand rears up its head• Low Cost IB 10G Silicon & Adapters – Very Low Cost 10G Host Adapters • Host Channel Adapter ~ $69 in volume • Switch Si at less than $30 a port in volume• Compelling price/performance• InfiniBand vs. 10GE Advantages in the Short Term – 15X – 20X price advantage vs 10GE – 3X – 10X latency advantage – 10X – 20X CPU utilization advantage – 3X-5X throughput advantage• Adoption by most majors• Products – InfiniBand to IP* – InfiniBand to FC – Advanced Management ©2000-2005 IMEX Research IMEX
  19. 19. Ethernet - TCP + iSCSI Offload Engines Typical 1GbE NIC File System File System Host File System SW Volume Manager Volume Manager Volume Manager SCSI Device Driver SCSI Device Driver SCSI Device Driver iSCSI iSCSI iSCSI Adapter TCP TCP TCP IP IP IP Ethernet Ethernet Ethernet Host CPU HBA Firmware HBA Hardware©2000-2005 IMEX Research IMEX
  20. 20. Interconnects: ApplicabilityGrids 100 Mbit 1 Gig 10 Gig 40 GigData Center Ethernet Ethernet Ethernet EthernetLANSAN Fibre ChannelStorage RDMA NICs with iSCSINASStorage RDMA NICsClusters Myrinet InfiniBandBlade QuadricsChassisNetworkSMP PCI Express & Advanced SwitchingBackplanes SunFire Link, IBM Federation <1 Gigabit 1 Gigabit 10 Gigabit >10 Gigabit ©2000-2005 IMEX Research IMEX
  21. 21. InfiniBand Adopters• Major providers of InfiniBand Solutions• Enterprise Class Products – InfiniBand to IP – InfiniBand to FC – Advanced Mgmt• Platform support includes – Intel (IA32 + 64EMT, Itanium), AMD Opteron, Power PC, Sparc• Operating Systems – Linux, Windows, AIX, HPUX, Solaris, MAC OS X, VxWorks ©2000-2005 IMEX Research IMEX
  22. 22. HPC Interconnect – LeadersMyrinet for the highlylatency sensitiveapplicationsand GbE for themajority applicationsDominate theInterconnect for HPC.Infiniband is rearingto dominate at theMidrange latencyapplications ©2000-2005 IMEX Research IMEX
  23. 23. HPC Tools Availability©2000-2005 IMEX Research IMEX
  24. 24. Data Center Environmentals Prep Temp F Speed fpm Sources: Fluent Airpak, Ansys CFD,©2000-2005 IMEX Research IMEX
  25. 25. HPC embraces a broad spectrum of markets .. High Performance Computing World©2000-2005 IMEX Research IMEX
  26. 26. 3 Tier Computing Infrastructure Application End-Users  Uniform and ubiquitous physical Internet Connectivity – “Wire Once”  Any user to any server via Internet Tier –1  Any server to any server via the LAN Network Services:  Any sever to any storage via the SAN - Web Servers,  Dynamic Logical Binding of… - Firewalls, - Load balancers  server network identification LAN  server OS version  server application assignment Tier-2  application data volumesApplication Services: - Application Servers  Resulting in  Application Mobility across servers  Data mobility across storage Tier-3 SANData Base Services - Database Servers - Storage Storage ©2000-2005 IMEX Research IMEX
  27. 27. E2E Internet & 3 Tier Computing HA, Secure Data Center FC SAN IP SAN NAS NAS IP Storage Network Web App Internet Web DB Edge Core Web App Optical Web DB Web App Web Web DB App Web Edge Applications Data Center Directory Security Policy Management Software OS Platform©2000-2005 IMEX Research IMEX
  28. 28. Fabric based Clusters Architecture Ethernet, Network Fabric Wi-Fi. System Fabric Quadrics, Management Myrinet, SNMP, SCI Fabric IETF/ InfiniBand, CIM, Ethernet / IP, SMI-S, Ethernet IP SMASH. w/TOE, Ethernet IP w/TOE and RDMA. SCSI, Fiber Channel, ISCSI.©2000-2005 IMEX Research IMEX
  29. 29. Fabric based Grid ApplicationsStorage Systems©2000-2005 IMEX Research IMEX
  30. 30. Scale Up (Consolidated SMP Servers) Server Platforms - Migrations Large SMP Clusters Parallel Sysplex High Availability/ - Scalable Nodes, Hi Perform.Clusters - Dynamic LPAR - Linux/AIX/Solaris Clusters - SW Provisioning/VMWare - HPC Clusters IBM: i890/p690/z990 DELL: PowerEdge 6650 SUN: SunFire 4800, 12K ORACLE: 9iRAC/PE6650 Blade Servers High Density Rack Mount - Highly Integrated Dense Form Factor - Rapid Deployment, Flexible Architect. RLX: ServerBlade 300i, 800i, HP: 40b, p FUJITSU: BX300, DELL:1655 IBM: HS20, Sun: Scale Out (Clustered Blade Servers) ©2000-2005 IMEX Research IMEX
  31. 31. Blades Servers - Infrastructure Management Modules Remote Mgmt+KVM over IP Midplane Networking: Gbit Ethernet Switches W/Connectors Storage: IP NAS or FC SAN To Blades & Back Switch Cooling N+1Fans/Cooling Modules Processor Blades Modules (6-24 typically) Power: N+1Power Supplies Blade Control Panel USB Ports, CD/Floppy I/F ~ GbE Switch supports Trunking/Port Aggregation Flow Control Gbit QoS Packet Prioritization Memory Ethernet SNMP/RMON Systems I/F Micro- DDR w IGMP/BOOTP/TFTP Monitor Processors ECC …… Module © 2003 Source: IMEX Research©2000-2005 IMEX Research IMEX
  32. 32. Management – Blade Servers Ultra-dense HW • Density 20 % • Power Consumption 20% • Cable Management Management • Platforms integration:  Hardware  Software 80%  Network  Management • Scalability 80 % • Security • Manageability • Lowest TCO©2000-2005 IMEX Research IMEX
  33. 33. Blade Servers Market Opportunity $12 WW Blade Servers $10 Market Opportunity (cum 2004-2008) Factory Rev $B $8 Revenues ~ $24 Billion Units ~ 9 Million $6 $4 $2 $- 2003 2004 2005 2006 2007 2008 Data: IMEX Research 2005©2000-2005 IMEX Research IMEX
  34. 34. Blade Servers: Vendor Positioning Index (As of 31 Mar 2004 - See IMEX Blade Servers Industry Report 2005 for new data) Strategy (Potential) IBM Fujitsu Siemens Hitachi HP Linux Intel © 2003-05 IMEX Research NEC Networx Dell Nexcom Tatung Sun Appro Egenera Penguin Rackable Sys HPC Syst Verari©2000-2005 IMEX Research Delivery (Execution) IMEX
  35. 35. iSCSI on roadmap of storage vendors SAN iSCSI NAS (FibreCOST (IP) (IP) Channel) •iSCSI delivers low-cost Networked Storage –Higher availability,Increased usage High End –Centralized management,Storage functionality FC & iSCSI Mid-Tier FC & iSCSI Mid-Tier NAS & iSCSI Entry FC & iSCSI SERVICE LEVEL©2000-2005 IMEX Research IMEX
  36. 36. System Cost vs. High Availability z/390 $10 M Sysplex Avg. S/390 S/390 Syst z/390 MVS MVS IBM IBM Price Propl Propl $1 M Prop. Sun-Solaris Sun-Solaris HP-UX HP-UX IBM IBM IBM-AIX IBM-AIX $100 K Clustered NCR-SVR4 NCR-SVR4 Bull Bull Dell Dell UNIX AS400 Fujitsu Fujitsu HP HP IBM IBM NEC NEC Siemens Siemens Stratus Stratus UNIX Cluster Win Win $10K Linux Clustered Linux 0.1 1 10 100 Downtime Hrs./Syst/Yr 99.999% 99.99% 99.9% 99.0% System Availability©2000-2005 IMEX Research IMEX
  37. 37. The iSCSI SAN ClientWorkstations Client requests data from App Server LAN Ethernet Header IP TCP DATA CRC Ethernet SwitchesApplication & DB Servers (iSCSI Initiator) iSCSI-equipped App Server (“initiator”) requests data from an iSCSI storage (“target”) SAN Ethernet CRC IP TCP iSCSI SCSI DATA GigE Header SwitchesStorage Arrays (iSCSI Target) To Other LANs, Servers or IP SANs ©2000-2005 IMEX Research IMEX
  38. 38. Tiered Storage by Price/Performance Desktop PC Servers Large Enterprise Entry NAS Midline Storage SAN & NAS Workstations Mainstream NAS/SAN PC SAS SAS HBA/RAID FC RAID Chipset HBA DriveInterface SATA SAS SASDriveTypes 5400 RPM 10K RPM 7200 RPM 15K RPM SATA Drive SAS Drive FC Drive Source:Maxtor ©2000-2005 IMEX Research IMEX
  39. 39. Cost Driven Data Storage Markets 100% 90% SATA 80% 70% 60% SAS 50% 40% 30% FC 20% 10% P_SCSI 0% 2003 2004 2005 2006 2007 2008©2000-2005 IMEX Research IMEX
  40. 40. Grid Computing • Grid is a Catch-all Marketing Term • Means different things to different constituencies • Desktop Cycle-Stealing • Manahed HPC Clusters • Virtualization of Data Center Resources • Outsourcing to “Utility Data Centers” • Multiple, opposing requirements • Compute-intensive applications want to ship data tp idle processor resources • Data-Intensive applications want to ship computations to appropriate large data repositories • An evolving entity with many pulls & pushes • Economics, Feasibilities, management of hetero vs homogeneous HW©2000-2005 IMEX Research IMEX
  41. 41. Grid Challenges: Hype vs. Reality• Cost of Local Grids (LAGs) – ala Blades cheaper than Wide Area Grids (WAGs) • Computing Cost $1000 > 1 cpu day (10 Terops) =$1 • 10 TB network transfer Cost = $1 • Internet BW cost $100/mbps/mo > 1GB network xfr cost ~$1 • Result: Local HPC Cluster is 10,000 cheaper than WAN Communication• MPI Style Apps work well in LAN Clusters but uneconomical in WAN • Data analysis best done by moving programs to data not data to programs • Small data, high compute applications woek well across internet • Internet is not the CPU back plane©2000-2005 IMEX Research IMEX
  42. 42. Grids – Homogeneous vs Heterogeneous©2000-2005 IMEX Research IMEX
  43. 43. Grid Services Architecture & Standards©2000-2005 IMEX Research IMEX
  44. 44. Grid Computing – Across 3 TiersGrid Control Storage Databases Application Servers©2000-2005 IMEX Research IMEX
  45. 45. Utility Computing/Web Services Autonomics • Self Healing • Self Optimizing • Universal Identity • Single System Image • …. Internet Internet Network Switching Pool • HP - Utility Data Center Utility Controller Virtualization Load Balancer Pool • Sun - N1/SunONE/Orion Firewall Pool Server Virtualization • Microsoft - .Net Server Pool NAS Pool • IBM - OnDemand/Autonomics Storage Switching Pool Virtualization Storage Pool©2000-2005 IMEX Research IMEX
  46. 46. Future Computing: Processes Driven Initiator Initiator Initiator Application Application Application Old model New modelHardware is the initiator Processes are the initiators HBA ID Initiator Allows applications to move storage wherever they are - storage no longer cares where they are Target ©2000-2005 IMEX Research IMEX
  47. 47. Application-Driven Storage Virtualization Application Application ApplicationMoving from storage ...to storage associated associated with with application servers... processes Old Model New ModelServer Hardware is the Allows applications to initiator. Storage move storage from the Drives are the Target virtualized pool, wherever they may be ©2000-2005 IMEX Research IMEX
  48. 48. Future: Embedded10Gb,TOE, iSCSI CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU TOE MC MC MC MC MC MC MC MC DMA DMAPCI Exp PCI Exp PCI Exp PCI Exp PCI Exp I/O Uses I/O Uses I/O 1GbE Bridge Bridge Syst Bridge Syst 10 GbE Mmry Mmry 1GbE TOE GbE TOE 1GbE GbE 1GbE PCI-X Ext TOE Mmry PCI PCI-X TOE PCI PCI-X with 10GbE 2GbE Ext.Mmry2x1Gb TOE NIC TOE chipset 10 Gb TOE NIC Embedded 2003 2004 2005 2006 onwards ©2000-2005 IMEX Research IMEX
  49. 49. Future: IP Storage Management on a chip Host Services Integration Storage ProvisioningFile system monitoring Storage provisioning Win, LINUX, Solaris Layer IP SAN Management Management Management of MultiPath IO Supp Security IP SAN Console iSCSI HBAs and Failover (iSNS, CHAP, SRP) Management Layer Virtualization Mirroring Snapshot Fail-Over iSCSI Target Management Appliance LVM, Error Handling, SCSI Daemon, API Interoperability Service Layer HW Acceleration: TOE, iSCSI Offload, IPsec©2000-2005 IMEX Research IMEX
  50. 50. Future Intelligent IP Infrastructure Present Data Center Infrastructue Future Infrastructue Clients LAN Ntwk Mgmt WSApplication & Intelligent DB Servers Uniform IP-based Infrastructure Mgmt SAN Storage WS Mgmt WS Storage Sub Systems ©2000-2005 IMEX Research IMEX
  51. 51. Future - IP Based Infrastructure Servers Storage IP Networks Infrastructure Telecom Virtualization Provisioning Automation©2000-2005 IMEX Research IMEX
  52. 52. HPC – From Academia to Wall St to Hollywood High Performance Entertainment High Performance Commercial Bioinformatics Decision-Support Entertainment Commercial BioInformatics Decision Audio/Video OnDemand Computing Computing Visualization Visualization Support Syst Audio/Video On Demand SystemsData: IMEX Research & Panasas 100+ Teraflops Rendering (Texture & Polygons) Data rate & capacity Throughput = 100 GB/s Throughput = 1.2 GB/s Throughput : DSL/Cable ©2000-2005 IMEX Research IMEX ©2004--05 IMEX Research.com
  53. 53. HPC: From Homeland Security to Bioinformatics©2000-2005 IMEX Research IMEX
  54. 54. HPC From Academia to Hollywood©2000-2005 IMEX Research IMEX
  55. 55. High Performance ComputingState of the IndustryAnil VasudevaPrincipal Analyst & Presidentanil@imexresearch.com IMEX RESEARCH.COM408-268-0800 ©2000-2005 IMEX Research IMEX
  56. 56. State of Interconnect Fabrics Latency/hop 1 ms EHERNET - LAN/SAN/MAN/WAN FE GbE IB - CLUSTER ATM IBM/SP MPP 100 us Servernet Dolphin SMP/NUMA T3E Myrinet SGI GbE Quadrics Mellanox w/TOE 10 us FJ/Synfinity Synfinity-1 GbE 10GbE w/RDMA Scali w/RDMA 1us 100 Mb/sec 1Gb/sec 10 Gb/sec Bandwidth/link©2000-2005 IMEX Research IMEX