Deploying and Managing HPC Clusters with IBM Platform and Intel Xeon Phi Coprocessor


IBM and Xeon Phi look to overcome infrastructure limitations

Published in: Technology

Deploying and Managing HPC Clusters with IBM Platform and Intel Xeon Phi Coprocessor

  1. Best Practices in Deploying and Managing HPC Clusters with Intel® Xeon Phi™
     Louise Westoby, WW Marketing Manager, IBM Platform Computing
     June 18, 2013
  2. Business Innovation Stressing IT
     End Users / Business. Objective: Gain competitive advantage
       • Innovate with more complex applications / simulations / analytics
       • Long processing limits number of iterations in a given time period
       • Explosion of data improves results but adds complexity
       • Delays and high cost of adding new applications
       • Difficult to use systems
     IT Organizations. Objective: Reduce cost while maintaining service
       • Infrastructure silos to meet peak service level requirements
       • CapEx and OpEx budget growth constrained
       • Infrastructure issues: power/cooling, space, etc.
       • Rise of lower cost resources (x86) and virtualization
       • Evolving trend toward heterogeneous, multi-core programming models
  3. Businesses need to overcome infrastructure limitations to maximize the value of compute and data-intensive applications
     Application examples
       • Simulation
       • Analysis
       • Design
       • Big data
     Today: IT constrained
       • Long wait times
       • Low utilization
       • IT sprawl
       • Repeated for many applications and groups
     Future: IBM Platform Computing software provides a virtualized view of compute, network and storage resources
       • Make lots of computers look like "one"
       • Prioritized matching of supply with demand
       • Clusters
       • Grid
       • HPC Cloud
     Workloads: Big Data / Hadoop, Simulation and Modeling, Analytics, HPC Cloud / Cluster Mgmt
     Benefits
       • High utilization
       • Throughput
       • Performance
       • Prioritization
       • Reduced cost
     Results: Faster time to results, use fewer resources
  4. Complete range of technical computing management software to maximize high performance applications
     Applications: Big Data / Hadoop, Simulation / Modeling, Analytics
     Workload and Resource Management
       • Platform LSF Family: Batch, MPI workloads with process mgmt, monitoring, analytics, user portal, license mgmt
       • Platform HPC: Simplified, integrated HPC management software for batch, MPI workloads integrated with systems
       • Platform Symphony Family: High throughput, near 'real time' parallel compute and Big Data / MapReduce workloads
     Data Management
       • General Parallel File System (GPFS): High performance, distributed parallel file system
     Infrastructure Management
       • Platform Cluster Manager Family: Provision and manage Single Cluster (Standard) to Dynamic Clouds (Advanced)
     Heterogeneous Resources: Compute, Storage, Network (Virtual, Physical, Desktop, Server, Cloud)
  5. System x and Platform Computing: better together
     Reference Ecosystem: Leverage the tight integration between IBM System x, Platform Computing software and Intel technology
     Stack shown on the slide: App, App, App; IBM Platform Computing; RHEL; MS; System x; QLogic InfiniBand; Intel Xeon; Intel Xeon Phi; Intel Cluster Ready
  6. Leveraging Platform HPC to properly provision and configure the Xeon Phi environment
     Steps
       1. Provision nodes and install MPSS
       2. Install Intel® Xeon Phi™ compilers and run time software
       3. Configure Platform HPC ELIM
     Provisioning flow
       • Add Intel MPSS packages to the repository
       • Create provisioning template to include MPSS package
       • Provision all nodes with Xeon Phi cards
       • Generate MPSS configuration on nodes with Xeon Phi
       • Create network bridge & configure Xeon Phi network
       • Start mpss service automatically on system boot up
  7. Leveraging Platform LSF or Platform HPC to simplify scheduling of Intel® Xeon Phi™ jobs
       • Job can be submitted by specifying the following metrics:
           – Number of Xeon Phi cards required on each node
           – Any metrics the Xeon Phi ELIM collects
       • Job will be placed on nodes with available Xeon Phi cards that meet the resource requirements
           – Xeon Phi cards on a node are enumerated, allowing multiple jobs to run on the same node using designated cards
       • Agnostic to Xeon Phi execution mode (offload, native, etc.)
       • Job information
           – Indication of which Xeon Phi cards are used
     Collecting Xeon Phi Metrics
       • Total number of cards per node
       • Number of cores per accelerator
       • Core temperature (Celsius)
       • Frequency (GHz)
       • Total power (Watts)
       • Total free memory (MB)
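
The submission model on this slide can be sketched as a small Python helper that assembles a `bsub` command line. It only builds the argument list and does not run anything. The `-q` and `-R` options and the `select[]` / `rusage[]` sections are standard LSF syntax, but the resource names `num_mics` and `mic_freemem` are hypothetical: the slides do not give the names the Xeon Phi ELIM actually reports, so substitute whatever your cluster defines.

```python
import shlex


def build_phi_bsub(command, cards_per_node, queue=None, min_free_mem_mb=None):
    """Build a bsub argument list that requests Xeon Phi cards per node.

    ASSUMPTION: 'num_mics' and 'mic_freemem' are hypothetical resource
    names standing in for whatever the site's Xeon Phi ELIM reports.
    """
    if cards_per_node < 1:
        raise ValueError("cards_per_node must be at least 1")

    # select[] filters candidate hosts; rusage[] reserves the cards for the job.
    conditions = [f"num_mics>={cards_per_node}"]
    if min_free_mem_mb is not None:
        conditions.append(f"mic_freemem>={min_free_mem_mb}")
    res_req = (
        f"select[{' && '.join(conditions)}] "
        f"rusage[num_mics={cards_per_node}]"
    )

    argv = ["bsub"]
    if queue:
        argv += ["-q", queue]
    argv += ["-R", res_req]
    argv += shlex.split(command)
    return argv


if __name__ == "__main__":
    # Print the command a user would run; nothing is submitted here.
    print(shlex.join(build_phi_bsub("./phi_app", 2, queue="regular")))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.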
  8. Platform HPC monitoring system
       • Single agent for both resource monitoring and resource management
       • Based on 20 years of Platform technology
           – Lightweight and small footprint
           – Scalable
           – Robust
           – Extendable
           – Fully automated failover
       • Added monitoring metrics shown in Platform HPC web GUI automatically
       • Added monitoring metrics can be used to define alerts
     Diagram: the Cluster Node runs a LIM with the Xeon Phi ELIM, GPU ELIM and other ELIMs; the Management Node runs the Master LIM, the Master Scheduler, and PERF (Monitoring & Reporting)
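
To make the ELIM role in this architecture concrete, here is a minimal Python sketch of an external load information manager loop. It assumes the usual LSF ELIM convention of periodically writing one line to standard output in the form `<count> <name1> <value1> <name2> <value2> ...`. The metric names and the values returned by `read_phi_metrics` are placeholders, not real measurements and not the names used by the Platform HPC Xeon Phi ELIM; a real ELIM would query the coprocessor through the MPSS tools instead.

```python
import sys
import time


def read_phi_metrics():
    """Return Xeon Phi metrics for this node.

    PLACEHOLDER: names and values are illustrative only. A real ELIM
    would collect them from the card instead of returning constants.
    """
    return {
        "num_mics": 2,        # total number of cards on the node
        "mic_cores": 60,      # cores per accelerator
        "mic_temp": 54,       # core temperature (Celsius)
        "mic_freq": 1.05,     # frequency (GHz)
        "mic_power": 115,     # total power (Watts)
        "mic_freemem": 7400,  # total free memory (MB)
    }


def format_elim_report(metrics):
    """Format metrics as '<count> <name> <value> ...' on a single line."""
    parts = [str(len(metrics))]
    for name, value in metrics.items():
        parts += [name, str(value)]
    return " ".join(parts)


def run_elim(interval_seconds=30, iterations=None, out=sys.stdout, sleep=time.sleep):
    """Emit one report per interval; iterations=None means run forever."""
    count = 0
    while iterations is None or count < iterations:
        out.write(format_elim_report(read_phi_metrics()) + "\n")
        out.flush()  # the parent LIM reads the pipe, so do not buffer
        count += 1
        if iterations is None or count < iterations:
            sleep(interval_seconds)


if __name__ == "__main__":
    run_elim(iterations=1)
```

The `iterations`, `out` and `sleep` parameters exist only so the loop can be exercised without waiting or touching the real standard output.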
  9. Mudpot: Intel® Xeon Phi™ cluster is used for advanced computing at the NCAR Wyoming Supercomputing Center
  10. IBM Platform LSF leveraged at NCAR to manage a complex, heterogeneous compute environment
       • From the user's point of view there is one place to submit jobs, regardless of resource
       • Different queues depending on job type (e.g. regular, bigmem, gpgpu)
       • Allows multistage jobs to run on multiple resources
           – Large model run on Yellowstone
           – Dependent data-analysis run on Geyser
       • Sharing between projects managed transparently
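
The multistage pattern described on this slide maps onto LSF job dependencies. The sketch below builds two `bsub` argument lists, where the second job waits for the first through `-w "done(<job name>)"`. The `-J`, `-q` and `-w` options are standard `bsub` options. The queue names come from the slide's examples, but which queue routes to Yellowstone or Geyser is an assumption here, and the job name and commands are hypothetical.

```python
def build_two_stage_jobs(model_cmd, analysis_cmd,
                         model_queue="regular",
                         analysis_queue="bigmem",
                         job_name="model_run"):
    """Return (model_argv, analysis_argv) for a dependent two-stage workflow.

    ASSUMPTION: the queue-to-resource mapping is illustrative; the slides
    name the queues but do not say which one targets which system.
    """
    # Stage 1: the large model run, submitted under a known job name.
    model_argv = ["bsub", "-J", job_name, "-q", model_queue] + list(model_cmd)

    # Stage 2: starts only after stage 1 finishes successfully.
    analysis_argv = [
        "bsub",
        "-J", f"{job_name}_analysis",
        "-q", analysis_queue,
        "-w", f"done({job_name})",
    ] + list(analysis_cmd)

    return model_argv, analysis_argv


if __name__ == "__main__":
    model, analysis = build_two_stage_jobs(["./run_model"], ["./analyze"])
    print(model)
    print(analysis)
```

Both lists can be passed to `subprocess.run` in order; the scheduler, not the submitting script, enforces the dependency.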
  11. Thank you!