ScaleMP - Introduction


  1. THE HIGH-END VIRTUALIZATION COMPANY. SERVER AGGREGATION – CREATING THE POWER OF ONE. Short Introduction. Aggregate. Scale. Simplify. Save.
  2. COMPANY AND PRODUCTS
  3. What We Do? Virtualization software for aggregating multiple off-the-shelf systems into a single virtual machine, providing improved usability and higher performance. [Diagram: N servers running N operating systems are aggregated into 1 VM running 1 OS.]
  4. Proven Track Record: over 150 customers worldwide and key industry partners, spanning commercial, higher education/research, and federal government sectors.
  5. Virtualization Defined: ...abstracting the physical characteristics of a computing resource... Two forms, each with software- and hardware-based examples:
     • Partitioning ("utilization"): providing a virtual resource that is a subset of the physical resource. Disk: volume management (software), array-based partitioning (hardware). Link: stack-based VLANs (software), switch-based VLANs (hardware). Server: hypervisor/VMM server virtualization (software).
     • Aggregation ("management, capability"): providing a virtual resource that is a concatenation of several physical resources. Disk: volume management (software), array-based concatenation (hardware). Link: OS-based aggregation (software), switch-based aggregation (hardware). Server: server aggregation (software); mainframe SMP/MPP (hardware, single system only).
  6. Server Virtualization:
     • Partitioning: a subset of the physical resource. Multiple virtual machines (each with its own App and OS) run on a single hypervisor or VMM.
     • Aggregation: a concatenation of physical resources. A single virtual machine (one App, one OS) runs across multiple hypervisors/VMMs, one per physical server.
  7. vSMP Foundation Aggregation Platform: combining cluster, SMP, and cloud qualities: manageability, cost savings and performance, and flexibility.
     • Manageability: single operating system (OS) for up to 128 nodes; OS-driven job scheduling and resource management; InfiniBand performance with zero management and know-how.
     • Storage: built-in cluster file system.
     • Installation: unboxing to production in less than 3 hours.
     • Cost savings: up to 5X cost savings.
     • Performance: leverages the latest Intel processors; best x86 solution by SPEC CPU2006; 7th best shared-memory result on STREAM (memory bandwidth).
     • Reliability: fault detection and component isolation; redundant backplane support.
     • Capabilities: up to 16,384 CPUs and 64TB RAM.
     • Flexibility: on-the-fly aggregated VM provisioning and tear-down; scaling of memory or CPU.
     • Utilization: reduces resource fragmentation; supports any programming model (serial, throughput, multi-threaded, large-memory) without machine boundaries.
     • Integration: network installation provides seamless integration with any grid management system.
  8. TECHNOLOGY
  9. How Does it Work? From multiple computers with multiple operating systems to multiple computers with a single operating system.
  10. How Does it Work? A bare-metal, distributed virtual machine monitor: multiple computers with a single operating system; up to 16 servers today, 128 servers in 3Q2010. A conceptual sketch of the boot flow follows these bullets.
     • Loaded at boot time. Supported boot devices: USB, IDE, CompactFlash, or network image (PXE).
     • Probes the fabric and sets up the VM.
     • Loads the OS and maintains I/O and memory coherency:
       – A software interception engine creates a uniform execution environment.
       – Creates the relevant BIOS environment to present the OS (and the software stack above it) with a single system.
       – Exposes all available CPU, memory, and I/O resources to the OS.
       – I/O resources are unified into a single PCI hierarchy.
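The boot flow above can be condensed into a few lines of Python. This is a conceptual illustration only; Node, probe_fabric, and build_single_system_view are invented names for the stages described on the slide, not ScaleMP interfaces.

```python
# Conceptual sketch of the boot-time aggregation flow (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class Node:
    cpus: int
    ram_gb: int
    pci_devices: list = field(default_factory=list)

def probe_fabric(nodes):
    """Fabric probing: discover every board reachable over the backplane."""
    return [n for n in nodes if n.cpus > 0]

def build_single_system_view(nodes):
    """Create one BIOS-level view so the OS boots against a single system."""
    return {
        "cpus": sum(n.cpus for n in nodes),
        "ram_gb": sum(n.ram_gb for n in nodes),
        # all I/O devices folded into a single PCI hierarchy
        "pci": [dev for n in nodes for dev in n.pci_devices],
    }

if __name__ == "__main__":
    boards = [Node(8, 48, ["nic0"]), Node(8, 48, ["nic1"])]
    print(build_single_system_view(probe_fabric(boards)))
    # -> {'cpus': 16, 'ram_gb': 96, 'pci': ['nic0', 'nic1']}
```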
  11. How Does it Work? Aggregated system: up to 4TB of aggregated RAM today, 64TB in 3Q2010. A sketch of the resource-normalization idea follows these bullets.
     • System configurations can differ: systems with different boards, I/O configurations, processor speeds, and memory configurations can be aggregated. [Diagram: boards contributing 1x, 4x, and 14x units of RAM to one coherent memory.]
     • Only one type of CPU is presented to the OS.
     • More than 10 different coherency mechanisms.
     • The aggregated-hardware I/O compatibility list includes devices from Intel, Broadcom, LSI, ATI, Emulex, Adaptec, and others.
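A minimal sketch of the "only one type of CPU" idea, assuming the aggregate advertises the slowest common frequency and the intersection of CPU feature flags. The deck does not describe the actual selection logic, so all names and the selection rule here are hypothetical:

```python
# Hypothetical normalization step: heterogeneous boards contribute all of
# their memory, but the OS is shown a single lowest-common CPU type.

def normalized_cpu_view(boards):
    freq = min(b["freq_ghz"] for b in boards)                    # slowest board wins
    flags = set.intersection(*(set(b["flags"]) for b in boards)) # common features only
    return {"freq_ghz": freq, "flags": sorted(flags)}

boards = [
    {"freq_ghz": 2.93, "flags": ["sse4_2", "vmx"], "ram_gb": 48},
    {"freq_ghz": 2.66, "flags": ["sse4_2"], "ram_gb": 192},  # slower, more RAM
]
print(normalized_cpu_view(boards))        # {'freq_ghz': 2.66, 'flags': ['sse4_2']}
print(sum(b["ram_gb"] for b in boards), "GB aggregated")  # 240 GB aggregated
```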
  12. How Does it Work? Memory coherency: each board caches remote memory locally, and coherency is maintained in software (see the sketch below).
     • Decision criteria: access pattern and historical knowledge of the reference (node level); memory block association (system level); code scanning; virtual-address access pattern.
     • Transfer size: 4K native cache line; 128KB transfers for pre-fetch; instruction-size (byte) transfers for PINNED memory.
     • Ownership: no backstore or persistent memory location; board-local caching. Memory migration and replication with Shared / Exclusive / Invalid states. Pre-fetch with persistent ownership (motivated by code scanning or I/O access patterns).
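The Shared / Exclusive / Invalid ownership scheme can be illustrated with a toy state machine over 4KB blocks. This is a greatly simplified sketch of the general technique (software distributed shared memory), not ScaleMP's coherency engine:

```python
# Toy Shared/Exclusive/Invalid state machine over 4KB software "cache lines".
BLOCK_SIZE = 4096  # the 4K native cache line from the slide

class Block:
    """One coherency unit; tracks which boards cache it and in what state."""
    def __init__(self):
        self.state = "Invalid"
        self.sharers = set()      # boards holding a local copy

    def read(self, board):
        # Replication: the reader caches a local copy. A sole reader may hold
        # the block Exclusive; additional readers force it to Shared.
        self.sharers.add(board)
        self.state = "Exclusive" if len(self.sharers) == 1 else "Shared"

    def write(self, board):
        # Migration: the writer invalidates all other copies and takes
        # exclusive ownership of the block.
        self.sharers = {board}
        self.state = "Exclusive"

b = Block()
b.read("board0"); b.read("board1")   # two boards share read-only copies
print(b.state, sorted(b.sharers))    # Shared ['board0', 'board1']
b.write("board1")                    # board1 migrates the block
print(b.state, sorted(b.sharers))    # Exclusive ['board1']
```

A real implementation would also weigh the decision criteria listed above (access history, block association, code scanning) when choosing between migrating and replicating a block; the sketch keeps only the state transitions.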
  13. How Does it Work? Resiliency:
     • Hardware resiliency (by the hardware vendor): redundant power; redundant cooling; memory mirroring.
     • vSMP Foundation architecture resiliency: active-active backplane; fault isolation with immediate restart, excluding the failed components; partitioning/containers support.
     • Remote management (by hardware vendor + ScaleMP): hardware chassis management; vSMP Foundation native Serial-over-LAN (SOL) support.
  14. How Does it Work? Optional complete software stack. [Timeline from an actual case: project start, then milestones at +3 days, +7 days, and +10 days.]
     • vSMP Foundation Productivity Pack: automatic full software-stack installer and tuning tool.
     • Real-time statistics counters and system profiler, designed to show the user how efficiently the system is running (a hypothetical polling sketch follows this slide):
       – Efficiency information reveals the major application scalability issues that reduce system performance.
       – Requires no changes to application code and does not affect timing.
       – Used to pinpoint scalability issues as well as for software tuning; not available on other architectures.
     • OS-level tuning: enabled by a single kernel compile option.
     • Tuned libraries: MPI (MPICH), OpenMP (Intel), MKL (Intel).
     • Application execution guidelines.
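As an illustration of the profiler idea, the sketch below polls a counters file and prints per-interval deltas. The file path and the "name value" format are invented for the example; the deck does not document how the vSMP statistics counters are actually exposed:

```python
# Purely hypothetical monitoring loop in the spirit of the profiler above.
import time

def read_counters(path):
    """Parse a counters file assumed to contain 'name value' lines."""
    counters = {}
    with open(path) as f:
        for line in f:
            name, value = line.split()
            counters[name] = int(value)
    return counters

def watch(path="/tmp/vsmp_stats", interval_s=1.0, samples=3):
    """Print deltas between samples; spikes would flag coherency hot spots."""
    prev = read_counters(path)
    for _ in range(samples):
        time.sleep(interval_s)
        cur = read_counters(path)
        print({name: cur[name] - prev[name] for name in cur})
        prev = cur
```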
  15. CUSTOMER USE CASES
  16. 3.1 DEPLOYMENT SCENARIOS
  17. SMP (1) [diagram]
  18. SMP (2): vSMP Foundation DC2 [diagram]
  19. SMP (3): Ethernet Switch [diagram]
  20. Cluster [diagram]
  21. Cloud: 10 Steps in 10 Minutes (a conceptual sketch of this flow follows the list).
     1. A user submits a job requiring 80 cores and 400GB RAM.
     2. The resource manager (RM) finds it needs 10 systems (48GB RAM each).
     3. The RM identifies 10 available nodes in the environment.
     4. The RM requests a 10-node image from vSMP Foundation for Cloud.
     5. The RM points the 10 nodes to boot the vSMP Foundation for Cloud image (using PXE).
     6. All 10 nodes PXE-boot the image and aggregate to form a single virtual system.
     7. The aggregated virtual machine boots a Linux OS from either a local drive or PXE boot (using vendor tag ScaleMP).
     8. The virtual system is now operational and ready to run jobs (up to 80 cores, ~400GB).
     9. The RM runs the job.
     10. The job finishes; the 10 nodes return to the pool and are available for other jobs.
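A conceptual sketch of that flow, assuming 8-core/48GB nodes as in the example. Every function and constant here is hypothetical; vSMP Foundation for Cloud does not expose this Python API:

```python
# Hypothetical resource-manager flow mirroring the 10 steps above.
import math

NODE_CORES, NODE_RAM_GB = 8, 48   # per-node resources assumed for the example

def nodes_required(cores, ram_gb):
    """Steps 1-2: size the aggregated VM from the job request."""
    return max(math.ceil(cores / NODE_CORES), math.ceil(ram_gb / NODE_RAM_GB))

def provision_and_run(pool, cores=80, ram_gb=400):
    n = nodes_required(cores, ram_gb)        # -> 10 nodes for 80 cores / 400GB
    nodes = [pool.pop() for _ in range(n)]   # step 3: claim idle nodes
    # Steps 4-6: the RM would point these nodes at the aggregation image and
    # PXE-boot them, so they come up as one virtual system (handled by vSMP).
    print(f"booting {n} nodes as one VM: "
          f"{n * NODE_CORES} cores, {n * NODE_RAM_GB}GB RAM")
    # Steps 7-9: the single Linux OS boots and the RM dispatches the job.
    print("job running ... job finished")
    pool.extend(nodes)                       # step 10: nodes back in the pool

pool = [f"node{i:02d}" for i in range(16)]
provision_and_run(pool)
print(f"{len(pool)} nodes available again")
```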
  22. Storage [diagram]
  23. 3.2 SIMPLICITY
  24. Customer Use Cases: ENGINEERING FACULTY (large-scale deployment without the complexity)
     • Customer: Auburn University, School of Engineering
     • Current platform: none; just getting into HPC
     • Problems: compute requirements were growing along with the number of users/students; no in-house skills to run an x86 InfiniBand cluster; limited operational budget to hire additional sysadmin resources
     • Applications: commercial applications (mostly Fluent and MATLAB)
     • Solution: 4 full blade chassis, each aggregated as a single system with 128 cores, 384GB RAM, and 5TB of internal storage; in total, 64 physical nodes, 512 cores, and 20TB of storage, running as a cluster of 4 'Fat Nodes'
     • Benefits: low OPEX (no additional IT staff required for day-to-day operations; only 4 'Fat Nodes' to manage; internal storage embedded in each 'Fat Node'); simplicity (InfiniBand performance without the complexity of managing such a solution)
  25. Customer Use Cases: ENGINEERING SERVICES COMPANY (innovation without complexity)
     • Customer: mid-size engineering services company
     • Current platform: multiple 2-socket workstations
     • Problems: existing models (Abaqus) grow fast and no longer fit on the engineers' workstations; interested in running applications in batch mode at night; no in-house skills to run an x86 InfiniBand cluster (although the application runs well on one); cannot afford RISC systems
     • Solution: 4 Intel dual-processor Xeon systems providing a 128GB RAM, 8-socket (16-core) single virtual system running Linux with vSMP Foundation
     • Benefits: performance (significantly faster than the existing workstations, and comparable to cluster performance per vendor benchmarks); low OPEX (no IT staff required for day-to-day operation); versatility (batch mode at night; daytime jobs execute on the system while the workstations serve for display only; multi-user environment with perfect scaling and sharing without performance degradation); investment protection (expected to expand by another 4 nodes, to a total of 256GB RAM and 32 cores)
  26. Customer Use Cases: FINANCIAL SERVICES (simplifying inter-process communication)
     • Customer: hedge fund
     • Current platform: multiple 4-socket servers
     • Problems: a single 4-socket server did not provide the performance required for the customer's business targets; multiple 4-socket servers required complex decomposition and made it hard to transfer data between processes in a short, deterministic time (low latency, small jitter); Ethernet-based solutions could not provide this, and an InfiniBand solution is too complex to manage and program for; co-location at exchanges is complicated for a solution comprising multiple systems
     • Applications: KX, WOMBAT, home-grown code
     • Solution: 16 Intel dual-processor Xeon systems providing a 0.5TB RAM, 32-socket (128-core) single virtual system running Linux with vSMP Foundation
     • Benefits: reduced latency and latency variance; simpler solution (deployment and management of a single system); better utilization (a single system reduces resource fragmentation); simpler programming model (no need for InfiniBand-specific programming)
  27. 3.3 FLEXIBILITY
  28. Customer Use Cases: HOSTED HPC RESOURCE PROVIDER (cost-effective, flexible solution with high utilization)
     • Customer: hosted HPC resource provider
     • Current platform: clusters and SMP machines
     • Problems: needs to run MPI as well as OpenMP (shared-memory) codes; large shared-memory jobs require dedicated proprietary hardware; low utilization on the shared-memory systems
     • Applications: a variety of commercial codes
     • Solution: originally 4 systems, for a total of 8 sockets (32 cores) and 128GB RAM; later extended to 16 nodes
     • Benefits: utilization (relies on standard commodity hardware); flexibility (the same system serves both shared-memory and cluster benchmarks, resulting in high utilization)
  29. Customer Use Cases: SUPERCOMPUTER CENTER (elastic VM solution aimed at data-intensive computing)
     • Customer: San Diego Supercomputer Center (SDSC)
     • Current platform: 8-socket AMD systems
     • Problems: requires an infrastructure for data-intensive computing; needs large-memory systems (TBs in size), sized per job; requires the ability to quickly access large amounts of storage
     • Applications: a variety of data-intensive codes (astronomy, genomics, data mining, etc.)
     • Solution: initial deployment of 4 'Super Nodes', each with 768GB RAM, 128 cores, and 10TB of internal storage; complete deployment (2011) of 1,024 servers with vSMP Foundation for Cloud, aggregatable into up to 32 'Super Nodes' of 32 servers each, yielding 2TB RAM and 8TB of SSDs per node; on-demand allocation via web request with fast (<10 minutes) provisioning
     • Benefits: flexibility (provision multiple 'Super Nodes' of various sizes according to need); performance (an extremely fast hierarchical memory solution: RAM, then aggregated RAM, then aggregated SSDs)
  30. 3.4 CAPABILITY
  31. Customer Use Cases: GLOBAL ENERGY COMPANY (a single infrastructure for horizontal and vertical application scaling, plug & play)
     • Customer: global energy company
     • Current platform: x86 grid
     • Problems: uses in-house single-threaded simulation tools in throughput mode, and each simulation's memory footprint has grown over the years, sometimes (10% of runs) exceeding 32GB; the application runs on x86 only; failed runs used to be rescheduled on large-memory systems
     • Solution: 6 Intel dual-processor Xeon systems providing a 192GB RAM, 12-socket (48-core) single virtual system running Linux with vSMP Foundation
     • Benefits: versatility (both large and small workloads run concurrently on the same system); utilization (higher than the grid's due to lower infrastructure fragmentation); investment protection (the solution has expanded by 100% since the initial installation)
  32. Customer Use Cases: FORMULA 1 TEAM (scale-up at scale-out pricing)
     • Customer: Formula 1 team
     • Current platform: large-memory Itanium-based system
     • Problems: needs to generate large meshes during pre-processing of whole-car simulations (FLUENT TGrid); mesh requirements are ~200GB in size and expected to grow significantly within 12 months of initial deployment; would like to standardize on the x86 architecture for its lower costs and open standards
     • Solution: 12 Intel dual-processor Xeon systems providing a 384GB RAM single virtual system running Linux with vSMP Foundation
     • Benefits: better performance (evaluated and found faster than alternative systems, x86 and non-x86); cost (significant savings compared to the alternative system); versatility (also used to run FLUENT (MPI) as part of a large cluster); investment protection (the solution can grow)
  33. Customer Use Cases: WEATHER FORECASTING SERVICE PROVIDER (a simple, flexible, cost-effective solution)
     • Customer: weather forecasting service provider
     • Current platform: SGI Altix with 32 cores
     • Problems: needs to run MPI as well as OpenMP codes; the system must be deployed remotely and hence must be simple to manage; the data-processing flow is complex and requires transferring large amounts of data between steps
     • Applications: MM5, WRF, MAWSIP, home-grown code for data transformation
     • Solution: 4 Intel Nehalem dual-socket blades, for a total of 8 sockets (32 cores) and 192GB RAM, with internal storage; later extended to 8 blades, for a total of 16 sockets (64 cores) and 384GB RAM
     • Benefits: performance (2.5X better performance on the same number of cores, 32); simpler solution (significantly reduced capital expense, allowing the customer to afford a higher core count); simplicity (simple enough for domain experts, the weather-forecast scientists, to manage); data flow remains within the system, leveraging the internal storage
  34. Customer Use Cases: SEMICONDUCTOR MANUFACTURER (a flexible system for throughput and large-memory jobs)
     • Customer: large European semiconductor manufacturer
     • Current platform: proprietary large shared-memory system plus a compute grid
     • Problems: the dedicated, proprietary systems were expensive; utilization and efficiency of the existing large-memory system were low for throughput jobs; the mix of throughput and large-memory jobs required maintaining two separate environments
     • Solution: 8 Intel dual-socket (quad-core) Xeon systems providing a 300GB RAM single virtual system running Linux with vSMP Foundation
     • Benefits: the ability to run large-memory jobs when required; flexibility (switch between large-memory and throughput jobs efficiently, on the fly); consistency (the underlying hardware matches the standard hardware used in the rest of the compute grid); better utilization (a single system reduces resource fragmentation); performance (leverages the most recent Intel CPUs for large-memory jobs)
  35. Customer Use Cases: MEDICAL RESEARCH INSTITUTE (large memory for multi-threaded programming)
     • Customer: medical research institute
     • Current platform: HP Superdome system
     • Problems: needs to perform high-performance image processing on very large MRI scans; scanned data for a single run currently exceeds 200GB, and memory requirements are expected to grow significantly with the introduction of full-body scans with more sensors; would like to use commercial tools for faster development; would like to standardize on the x86 architecture for its lower costs and open standards
     • Applications: Siemens CT processing, MATLAB, BLAS, home-grown code, and more
     • Solution: 16 Intel dual-processor Xeon systems providing a 1TB RAM, 32-socket (128-core) single virtual system running Linux with vSMP Foundation
     • Benefits: better performance (evaluated and found faster than any alternative system); cost (savings of an order of magnitude compared to the alternative system); versatility (also used for MPI jobs as part of a large cluster)
  36. Customer Use Cases: ACADEMIC RESEARCH (shared-memory multi-threaded programming)
     • Customer: RWTH Aachen, a polytechnic university
     • Current platform: 4-socket x86 server
     • Problems: needs to run auto-parallelized (OpenMP) code at high core counts; other solutions were found not to work (a proprietary SMP is unaffordable; Cluster OpenMP proved not to scale); must use OpenMP for faster development (machine-generated Fortran code)
     • Applications: home-grown codes (SHEMAT Suite, FIRE, and others)
     • Solution: 13 Intel dual-processor Xeon systems providing 26 sockets (104 cores) as a single virtual system running Linux with vSMP Foundation
     • Benefits: performance (scales at over 95% efficiency to 104 cores with OpenMP codes); cost (significant savings compared to alternative solutions); system scale (the largest x86 system available)
  37. THE HIGH-END VIRTUALIZATION COMPANY. SERVER AGGREGATION – CREATING THE POWER OF ONE.
     Shai Fultheim, Founder and President
     Shai@ScaleMP.com, +1 (408) 480 1612
     Aggregate. Scale. Simplify. Save.
