HIGH PERFORMANCE COMPUTING 2015/16
Technology Compass
Life Sciences | Risk Analysis | Simulation | Big Data Analytics | High Performance Computing
More Than 30 Years of Experience in Scientific Computing
1980 marked the beginning of a decade in which numerous startups were created, some of which later transformed into big players in the IT market. Technical innovations brought dramatic changes to the nascent computer market. In Tübingen, close to one of Germany's oldest and most renowned universities, transtec was founded. In the early days, transtec focused on reselling DEC computers and peripherals, delivering high-performance workstations to university institutes and research facilities. In 1987, SUN/Sparc and storage solutions broadened the portfolio, enhanced by IBM/RS6000 products in 1991. These were the typical workstations and server systems for high performance computing at the time, used by the majority of researchers worldwide. In the late 90s, transtec was one of the first companies to offer highly customized HPC cluster solutions based on standard Intel architecture servers, some of which entered the TOP500 list of the world's fastest computing systems.

Given this background and history, it is fair to say that transtec looks back on more than 30 years of experience in scientific computing; our track record shows more than 750 HPC installations. With this experience, we know exactly what customers demand and how to meet those demands. High performance and ease of management: this is what customers require today. HPC systems certainly have to deliver peak performance, as their name indicates, but that is not enough: they must also be easy to handle. Unwieldy design and operational complexity must be avoided or at least hidden from administrators and, in particular, from users of HPC systems.

This brochure focuses on where transtec HPC solutions excel. transtec HPC solutions use the latest and most innovative technology: Bright Cluster Manager as the technology leader for unified HPC cluster management, the leading-edge Moab HPC Suite for job and workload management, Intel Cluster Ready certification as an independent quality standard for our systems, and Panasas HPC storage systems for the high performance and real ease of management required of a reliable HPC storage system. With these components, usability, reliability and ease of management are addressed as central issues, even in a highly heterogeneous environment. transtec is able to provide customers with well-designed, extremely powerful solutions for Tesla GPU computing, as well as thoroughly engineered Intel Xeon Phi systems. Intel's InfiniBand Fabric Suite makes managing a large InfiniBand fabric easier than ever before, and Numascale provides excellent technology for AMD-based large-SMP systems. transtec combines these excellent, well-chosen components into a fine-tuned, customer-specific, and thoroughly designed HPC solution.

Your decision for a transtec HPC solution means you opt for the most intensive customer care and the best service in HPC. Our experts will be glad to bring in their expertise and support to assist you at any stage, from HPC design to daily cluster operations to HPC Cloud Services.

Last but not least, transtec HPC Cloud Services provide customers with the possibility of having their jobs run on dynamically provided nodes in a dedicated datacenter, professionally managed and individually customizable. Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud management environment, and ready to run.
Have fun reading the transtec HPC Compass 2015/16!
High Performance Computing for Everyone
Interview with Dr. Oliver Tennert, Director of HPC Solutions at transtec AG, discussing the wide range of uses of High Performance Computing (HPC), the benefits of turnkey solutions, and the increasing popularity of Big Data Analytics.
Dr. Tennert, in which industries have you seen an increase in demand for high performance computing?

From our perspective, it is difficult to differentiate between specific segments. In general, the high performance computing market is a globally expanding sector. The US American IT analyst IDC predicts an annual average growth of approximately 7% over the next 5 years across all sectors, even in traditional HPC-dominated industries such as CAE (Computer-Aided Engineering) and Life Sciences.

For which type of applications is HPC usually deployed? Could you give us a concrete example of its application?

High performance computing is basically divided into two areas of application: simulation on the one hand and data analysis on the other. Take the automotive industry for example. During the vehicle design and development stage, a great deal of effort is invested in crash simulations, while the number of material-intensive crash tests, which used to be performed regularly, has been minimized; they are now only conducted for verification purposes. The analysis of sequenced genome data for hereditary diseases is a prime example of how data analysis is used effectively, as is, from the field of molecular genetics, the measurement of evolutionary relationships between species or the effect of viruses on the germline.

High performance computing is often criticized with the argument that an HPC installation is far too expensive and complex for "ordinary" organisations. What would you say against these accusations?

[Grinning:] In the past we have sold HPC solutions to many customers who would describe themselves as "ordinary" and who have been extremely satisfied with the results. But you are in fact right: using a complex HPC system involves hiding this complexity from users and administrators. When you take a look under the hood of a car, you will see that it is also a complex machine. However, drivers do not have to concern themselves with features such as transmission technology and ABS electronics when driving the vehicle. It is therefore important to implement a straightforward and enterprise-ready management and service concept. transtec offers an intelligent portfolio of service and management concepts which follow the paradigms of "easy to manage" and "easy to use".

What are the challenges when launching, installing and operating an HPC system? What should administrators bear in mind to ensure that the HPC system remains manageable?

When looking for an HPC systems provider, customers should make sure that the provider has precisely understood their needs and requirements. This is what differentiates a solutions provider
such as transtec from a product-centric hardware supplier or manufacturer. The optimum sizing of the entire solution and the above-mentioned easy manageability have a direct impact on the customer's or user's productivity.

Once the system has been delivered, customers want to start productive operation as soon as possible. transtec provides customers with turnkey HPC solutions, which means that, depending on the customer's requirements, our specialists can manage the complete installation and integration into the customer's environment.

What is the role of HPC when deploying cloud computing and big data analyses in large organisations?

When deploying cloud computing, it is important to make a distinction: what we today call a "private cloud" is essentially, leaving technological details to one side, a tried-and-tested concept in which companies can consolidate their computing capacities and provide access to them from one central location. The use of "public cloud" providers such as Amazon has significantly dropped in light of the NSA reports. Security aspects have always been a serious obstacle for cloud computing and, in terms of HPC, the situation becomes all the more complicated for an industrial organisation, as high performance computing is an essential element in the organisation's value chain. The data the organisation works with is a vital asset and critical to the company.

Big Data Analytics, in other words the ability to process and analyse large data volumes, is essentially nothing new. Recently, however, the amount of data has increased at a significantly faster pace than the available computing capacity.
For example, with the use of biochemical methods in "next generation sequencing", complete genome sequences can today be generated and prepared for further analysis at a far faster rate than ever before. The need for HPC solutions has recently become increasingly evident.

While we are on the topic of big data: to what extent are high-performance, modern in-memory computing technologies competing with HPC systems?

This depends on whether you see these technologies in direct competition with each other! Shared in-memory databases and the associated computing technologies are innovative concepts for particular applications, for example real-time analytics or data mining. We believe that both are contemporary technologies, like many others, that enable the analysis of large data volumes in real time. From transtec's point of view, these are HPC technologies, and some of our customers are already using them on HPC solutions we have provided.

What is expected of today's supercomputers? What does the future hold for the HPC segment, and what follows on from Petaflop and Petabyte?

When we talk of supercomputers, we are referring to the Top500 list of the fastest computers in the world. "Flops" is used as a measure of performance, in other words the number of floating point operations per second. "Tianhe-2" from China currently holds the number one position with an amazing 34 Petaflops, in other words 34 quadrillion floating point operations per second. However, rank 500, scoring 133 Teraflops (133 trillion floating point operations per second), is still an impressive figure. That is roughly the equivalent of the consolidated performance of 2,000 well-configured PCs.

Exabyte follows Petabyte. Over the last few years it has been widely discussed in the HPC market what kind of system would be needed to manage these data volumes, particularly in terms of capacity, power supply and cooling. Significant technology advancements are expected to be required before this type of supercomputer can be produced. Sheer mass alone will not be able to manage these volumes.

Keyword security: what is the best way to protect HPC systems in an age of the NSA scandal and ever increasing cybercrime?

Basically, the same applies to HPC systems as to company notebooks and workstations where IT security is concerned: access by unauthorised parties must be prevented by deploying secure IT structures. As described above, enormous volumes of critical organisation data are stored on an HPC system, which is why efficient data management with effective access control is essential in HPC environments. The network connections between the data centre and user workstations must also be secured on a physical level. This especially applies when the above-mentioned "private cloud" scenarios are set up across various locations to allow them to work together regardless of country or time zone.

"Petabyte is followed by Exabyte, although before that, several technological jumps need to take place."
Technology Compass
Table of Contents and Introduction

High Performance Computing
- Performance Turns Into Productivity
- Flexible Deployment With xCAT
- Service and Customer Care From A to Z

Vector Supercomputing
- The NEC SX Architecture
- History of the SX Series

Advanced Cluster Management Made Easy
- Easy-to-use, Complete and Scalable
- Cloud Bursting With Bright

Intelligent HPC Workload Management
- Moab HPC Suite – Enterprise Edition
- Moab HPC Suite – Grid Option
- Optimizing Accelerators with Moab HPC Suite
- Moab HPC Suite – Application Portal Edition
- Moab HPC Suite – Remote Visualization Edition

Remote Visualization and Workflow Optimization
- NICE EnginFrame: A Technical Computing Portal
- Desktop Cloud Virtualization
- Cloud Computing
- NVIDIA GRID
- Virtual GPU Technology
Intel Cluster Ready
- A Quality Standard for HPC Clusters
- Intel Cluster Ready builds HPC Momentum
- The transtec Benchmarking Center

Parallel NFS
- The New Standard for HPC Storage
- What's New in NFS 4.1?
- Panasas HPC Storage

BeeGFS – A Parallel File System

NVIDIA GPU Computing
- What is GPU Computing?
- Kepler GK110/210 GPU Computing Architecture
- A Quick Refresher on CUDA

Intel Xeon Phi Coprocessor
- The Architecture
- An Outlook on Knights Landing and Omni Path

InfiniBand High-Speed Interconnect
- The Architecture
- Intel MPI Library 4.0 Performance
- Intel Fabric Suite 7

Numascale
- NumaConnect Background
- Technology
- Redefining Scalable OpenMP and MPI
- Big Data
- Cache Coherence
- NUMA and ccNUMA

IBM Platform HPC
- Enterprise-Ready Cluster & Workload Management
- What's New in IBM Platform LSF 8
- IBM Platform MPI 8.1

General Parallel File System (GPFS)
- What is GPFS?
- What's New in GPFS Version 3.5

Glossary
High Performance Computing – Performance Turns Into Productivity

High Performance Computing (HPC) has been with us from the very beginning of the computer era. High-performance computers were built to solve numerous problems which the "human computers" could not handle. The term HPC just had not been coined yet. More importantly, some of the early principles have changed fundamentally.
Performance Turns Into Productivity

HPC systems in the early days were much different from those we see today. First, we saw enormous mainframes from large computer manufacturers, including a proprietary operating system and job management system. Second, at universities and research institutes, workstations made inroads and scientists carried out calculations on their dedicated Unix or VMS workstations. In either case, if you needed more computing power, you scaled up, i.e. you bought a bigger machine. Today the term High Performance Computing has gained a fundamentally new meaning. HPC is now perceived as a way to tackle complex mathematical, scientific or engineering problems. The integration of industry-standard, "off-the-shelf" server hardware into HPC clusters facilitates the construction of computer networks of a power that no single system could ever achieve. The new paradigm for parallelization is scaling out.

Computer-supported simulation of realistic processes (so-called Computer-Aided Engineering, CAE) has established itself as a third key pillar in the field of science and research, alongside theory and experimentation. It is nowadays inconceivable that an aircraft manufacturer or a Formula One racing team would operate without using simulation software. And scientific calculations, such as in the fields of astrophysics, medicine, pharmaceuticals and bio-informatics, will to a large extent be dependent on supercomputers in the future. Software manufacturers long ago recognized the benefit of high-performance computers based on powerful standard servers and ported their programs to them accordingly.

The main advantage of scale-out supercomputers is just that: they are infinitely scalable, at least in principle. Since they are based on standard hardware components, such a supercomputer can be given more power whenever the computational capacity of the system is no longer sufficient, simply by adding additional nodes of the same kind. A cumbersome switch to a different technology can be avoided in most cases.

"transtec HPC solutions are meant to provide customers with unparalleled ease of management and ease of use. Apart from that, deciding for a transtec HPC solution means deciding for the most intensive customer care and the best service imaginable."
Dr. Oliver Tennert, Director Technology Management & HPC Solutions
Variations on the theme: MPP and SMP

Parallel computations exist in two major variants today. Applications running in parallel on multiple compute nodes are frequently so-called Massively Parallel Processing (MPP) applications. MPP indicates that the individual processes each utilize exclusive memory areas. This means that such jobs are predestined to be computed in parallel, distributed across the nodes in a cluster. The individual processes can thus utilize the separate resources of their respective node, especially the RAM, the CPU power and the disk I/O.

Communication between the individual processes is implemented in a standardized way through the MPI software interface (Message Passing Interface), which abstracts the underlying network connections between the nodes away from the processes. However, the MPI standard (current version 2.0) merely requires source code compatibility, not binary compatibility, so an off-the-shelf application usually needs specific versions of MPI libraries in order to run. Examples of MPI implementations are OpenMPI, MPICH2, MVAPICH2, Intel MPI or, for Windows clusters, MS-MPI.

If the individual processes engage in a large amount of communication, the response time of the network (latency) becomes important. Latency in a Gigabit Ethernet or a 10GE network is typically around 10 µs. High-speed interconnects such as InfiniBand reduce latency by a factor of 10, down to as low as 1 µs; therefore, high-speed interconnects can greatly speed up total processing.

The other frequently used variant is SMP applications. SMP, in this HPC context, stands for Shared Memory Processing. It involves the use of shared memory areas, the specific implementation of which depends on the choice of the underlying operating system. Consequently, SMP jobs generally only run on a single node, where they can in turn be multi-threaded and thus be parallelized across the number of CPUs per node. For many HPC applications, both the MPP and the SMP variant can be chosen.

Many applications are not inherently suitable for parallel execution. In such a case, there is no communication between the individual compute nodes, and therefore no need for a high-speed network between them; nevertheless, multiple computing jobs can be run simultaneously and sequentially on each individual node, depending on the number of CPUs. In order to ensure optimum computing performance for these applications, it must be examined how many CPUs and cores deliver the optimum performance. We find applications of this sequential type of work typically in the fields of data analysis or Monte Carlo simulations.
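To make the MPP model concrete, here is a minimal sketch in C (not taken from the brochure); it assumes an MPI implementation such as OpenMPI or MPICH2 with the usual mpicc and mpirun wrappers. Each process works on its own private share of the data, and the partial results are combined by explicit message passing:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

    /* MPP style: every rank owns its data and sums its private share */
    for (int i = rank; i < 1000000; i += size)
        local += 1.0 / (1.0 + i);

    /* explicit communication: combine the partial sums on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum computed by %d processes: %f\n", size, total);

    MPI_Finalize();
    return 0;
}

An SMP job, by contrast, would execute the same summation loop as multiple threads within one node, for example with an OpenMP parallel-for and a reduction clause, and would need no message passing at all.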
Flexible Deployment With xCAT

xCAT as a Powerful and Flexible Deployment Tool

xCAT (Extreme Cluster Administration Tool) is an open source toolkit for the deployment and low-level administration of HPC cluster environments, small as well as large ones.

xCAT provides simple commands for hardware control, node discovery, the collection of MAC addresses, and node deployment with (diskful) or without (diskless) local installation. The cluster configuration is stored in a relational database. Node groups for different operating system images can be defined. Also, user-specific scripts can be executed automatically at installation time.

xCAT provides the following low-level administrative features:
- Remote console support
- Parallel remote shell and remote copy commands
- Plugins for various monitoring tools like Ganglia or Nagios
- Hardware control commands for node discovery, collecting MAC addresses, remote power switching and resetting of nodes
- Automatic configuration of syslog, remote shell, DNS, DHCP, and ntp within the cluster
- Extensive documentation and man pages

For cluster monitoring, we install and configure the open source tool Ganglia or the even more powerful open source solution Nagios, according to the customer's preferences and requirements.
Local Installation or Diskless Installation

We offer a diskful or a diskless installation of the cluster nodes. A diskless installation means the operating system is hosted partially within the main memory; larger parts may or may not be included via NFS or other means. This approach allows large numbers of nodes to be deployed very efficiently, and the cluster is up and running within a very short timescale. Updating the cluster can also be done in a very efficient way: only the boot image has to be updated and the nodes rebooted, after which the nodes run either a new kernel or even a new operating system. Moreover, with this approach, partitioning the cluster can also be done very efficiently, either for testing purposes or for allocating different cluster partitions to different users or applications.

Development Tools, Middleware, and Applications

Depending on the application, the optimization strategy, or the underlying architecture, different compilers lead to code of very different performance. Moreover, different, mainly commercial, applications require different MPI implementations. And even when the code is self-developed, developers often prefer one MPI implementation over another.

According to the customer's wishes, we install various compilers and MPI middleware, as well as job management systems like ParaStation, Grid Engine, Torque/Maui, or the very powerful Moab HPC Suite for high-level cluster management.
[Figure: the transtec HPC service cycle: individual presales consulting; application-, customer-, and site-specific sizing of the HPC solution; burn-in tests of systems; benchmarking of different systems; continual improvement; software/OS installation; application installation; onsite hardware assembly; integration into the customer's environment; customer training; maintenance, support, and managed services]

Services and Customer Care from A to Z

transtec HPC as a Service: You will get a range of applications like LS-Dyna, ANSYS, Gromacs, NAMD etc. from all kinds of areas pre-installed, integrated into an enterprise-ready cloud and workload management system, and ready to run. Do you miss your application? Ask us: HPC@transtec.de

transtec Platform as a Service: You will be provided with dynamically provisioned compute nodes for running your individual code. The operating system will be pre-installed according to your requirements. Common Linux distributions like RedHat, CentOS, or SLES are the standard. Do you need another distribution? Ask us: HPC@transtec.de

transtec Hosting as a Service: You will be provided with hosting space inside a professionally managed and secured datacenter where you can have your machines hosted, managed, and maintained according to your requirements. Thus, you can build up your own private cloud. What range of hosting and maintenance services do you need? Tell us: HPC@transtec.de

HPC @ transtec: Services and Customer Care from A to Z

transtec AG has over 30 years of experience in scientific computing and is one of the earliest manufacturers of HPC clusters. For nearly a decade, transtec has delivered highly customized High Performance Computing clusters based on standard components to academic and industry customers across Europe, with all the high quality standards and the customer-centric approach that transtec is well known for.

Every transtec HPC solution is more than just a rack full of hardware: it is a comprehensive solution with everything the HPC user, owner, and operator need. In the early stages of any customer's HPC project, transtec experts provide extensive and detailed consulting; customers benefit from our expertise and experience. Consulting is followed by benchmarking of different systems with either specifically crafted customer code or generally accepted benchmarking routines; this aids customers in sizing and devising the optimal and detailed HPC configuration. Each and every piece of HPC hardware that leaves our factory undergoes a burn-in procedure of 24 hours or
more if necessary. We make sure that any hardware shipped meets our and our customers' quality requirements. transtec HPC solutions are turnkey solutions. By default, a transtec HPC cluster has everything installed and configured, from hardware and operating system to important middleware components like cluster management or developer tools, and the customer's production applications. Onsite delivery means onsite integration into the customer's production environment, be it establishing network connectivity to the corporate network or setting up software and configuration parts. transtec HPC clusters are ready-to-run systems: we deliver, you turn the key, the system delivers high performance.

Every HPC project entails transfer to production: IT operation processes and policies apply to the new HPC system. Effectively, IT personnel are trained hands-on, introduced to hardware components and software, and familiarized with all operational aspects of configuration management.

transtec services do not stop when the implementation project ends. Beyond transfer to production, transtec takes care. transtec offers a variety of support and service options, tailored to the customer's needs. When you are in need of a new installation, a major reconfiguration or an update of your solution, transtec is able to support your staff and, if you lack the resources for maintaining the cluster yourself, maintain the HPC solution for you. From Professional Services to Managed Services for daily operations and required service levels, transtec will be your complete HPC service and solution provider. transtec's high standards of performance, reliability and dependability assure your productivity and complete satisfaction. transtec's HPC Managed Services offer customers the possibility of having the complete management and administration of the HPC cluster handled by transtec service specialists in an ITIL-compliant way. Moreover, transtec's HPC on Demand services provide access to HPC resources whenever customers need them, for example because they do not have the possibility of owning and running an HPC cluster themselves, due to lacking infrastructure, know-how, or admin staff.

transtec HPC Cloud Services

Last but not least, transtec's services portfolio evolves as customers' demands change. Starting this year, transtec is able to provide HPC Cloud Services. transtec uses a dedicated datacenter to provide computing power to customers who are in need of more capacity than they own, which is why this workflow model is sometimes called computing-on-demand. With these dynamically provided resources, customers have the possibility of having their jobs run on HPC nodes in a dedicated datacenter, professionally managed and secured, and individually customizable. Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud and workload management environment, and ready to run.

Alternatively, whenever customers are in need of space for hosting their own HPC equipment because they do not have the space capacity or the cooling and power infrastructure themselves, transtec is also able to provide Hosting Services to those customers who would like to have their equipment professionally hosted, maintained, and managed. Customers can thus build up their own private cloud!

Are you interested in any of transtec's broad range of HPC related services? Write us an email to HPC@transtec.de. We will be happy to hear from you!
Vector Supercomputing – The NEC SX Architecture

Initially, in the 80s and 90s, the notion of supercomputing was almost a synonym for vector computing. And this concept is back now in different flavors: today almost every architecture uses SIMD, the acronym for "single instruction, multiple data".
History of HPC, and what we learn

During the last years, systems based on the x86 architecture dominated the HPC market, with GPUs and many-core systems recently making their inroads. But this period is coming to an end, and it is again time for a differentiated, complementary HPC-targeted system. NEC will develop such a system.

The Early Days

When did HPC really start? Probably one would think of the time when Seymour Cray developed high-end machines at CDC and then started his own company, Cray Research, with the first product being the Cray-1 in 1976. At that time supercomputing was very special: systems were expensive, and access was extremely limited for researchers and students. But the business case was obvious in many fields, scientifically as well as commercially.

These Cray machines were vector computers, initially featuring only one CPU, as parallelization was far from being mainstream, hardly even a matter of research. And code development was easier than today: a code of 20,000 lines was already considered "complex" at that time, and there were not many standard codes and algorithms available anyway. For this reason code development could adapt to any kind of architecture to achieve optimal performance. In 1981 NEC started marketing its first vector supercomputer, the SX-2, a single-CPU vector architecture like the Cray-1. The SX-2 was a milestone: it was the first CPU to exceed one GFlops of peak performance.

The Attack of the "Killer-Micros"

In the 1990s the situation changed drastically, and what we see today is a direct consequence. The "killer-micros" took over: new parallel, later "massively parallel" systems (MPP) were built based on microprocessor architectures which were predominantly
targeted at commercial sectors or, in the case of Intel, AMD or Motorola, at PCs, at products which were successfully invading daily life, partially more than even experts had anticipated. Clearly one could save a lot of cost by using, or at worst slightly adapting, these CPUs for high-end systems, and by focusing investments on other ingredients, like interconnect technology. At the same time Linux was increasingly adopted and offered a cheap, generally available OS platform that allowed everybody to use HPC for a reasonable price.

These systems were cheaper for two reasons: the development of standard CPUs was sponsored by different businesses, and another expensive part, the memory architecture with many CPUs addressing the same flat memory space, was replaced by systems comprised of many distinct so-called nodes, complete systems by themselves, plus some kind of interconnect technology, which could partially be taken from standard equipment as well. The impact on the user was the necessity to code for these "distributed memory systems". Standards for coding evolved, PVM and later MPI, plus some proprietary paradigms. MPI is still dominating today.

At the high end the special systems were still dominating, because they had architectural advantages and the codes ran with very good efficiency on such systems. Most codes were not parallel at all, or at least not well parallelized, let alone for systems with distributed memory. The skills and experience to create parallel versions of codes were lacking both in the companies selling high-end machines and on the customers' side. So it took some time for the MPPs to take over. But it was inevitable, because this situation, sometimes called the "democratization of HPC", was positive for both commercial and scientific developments. From that time on, researchers and students in academia and industry got sufficient access to resources they could not have dreamt of before.

The x86 Dominance

One could dispute whether the x86 architecture is the most advanced or elegant; undoubtedly it became the dominant one. It was developed for mass markets, this was the strategy, and HPC was a side effect, sponsored by the mass market. This worked successfully, and the proportion of x86 Linux clusters in the Top500 list increased year by year. Because of the mass markets it made sense economically to invest a lot into the development of LSI technology (large-scale integration). The related progress, which led to more and more transistors on the chips and increasing frequencies, helped HPC applications, which got more, faster and cheaper systems with each generation. Software did not have to adapt much between generations, and software development focused initially on parallelization for distributed memory, later on new functionality and features.

This worked for quite some years. Systems of sufficient size to produce relevant results for typical applications are cheap, and most codes do not really scale to a level of parallelism that uses thousands of processes. Some lighthouse projects do, after a lot of development effort which is not easily affordable, but the average researcher or student is often completely satisfied with something between a few and perhaps a few hundred nodes of a standard Linux cluster. In the meantime, a typical Intel-based dual-socket node of a standard cluster already has 24 cores, soon a bit more. So parallelization is increasingly important.
Then there is "Moore's law", formulated by Gordon E. Moore, a co-founder of the Intel Corporation, in 1965. It states that the number of transistors in an integrated circuit doubles roughly every 18 months to two years, an exponential increase. This statement is often misinterpreted to mean that the performance of a CPU doubles every 18 months, which is misleading. We observe ever increasing core counts; the transistors are just there, so why not? That applications might not be able to
use these many cores is another question; in principle the potential performance is there.

The "standard benchmark" in the HPC scene is the so-called HPL, the High Performance LINPACK, and the famous Top500 list is built upon it. It is often forgotten that this HPL benchmark is only one of three parts of the original LINPACK benchmark. But the other parts did not show such good progress in performance, and their scalability is limited, so they could not serve to spawn competition, even between nations, which brings a lot of prestige. Moreover, there are at most a few real codes in the world with characteristics similar to the HPL. In other words: it does not tell us much. This is a highly political situation, and probably quite some money is wasted which would better be invested in brain-ware.

Anyway, it is increasingly doubtful that LSI technology will advance at the same pace in the future. The lattice constant of silicon is 5.43 Å. LSI technology is now using 14 nm processes, the Intel Broadwell generation, which is about 25 times the lattice constant, with 7 nm LSIs on the long-term horizon. Some scientists assume there is still some room for further improvement, but once the features on the LSI have a thickness of only a few atoms, one cannot gain another factor of ten. Sometimes the impact of some future "disruptive technology" is claimed …

Another obstacle is the cost of building a chip fab for a new generation of LSI technology. Certainly there are ways to optimize both production and cost, and there will be bright ideas, but with current generations a fab costs at least many billion dollars. There have been ideas of slowing down the innovation cycle on that side merely to be able to recover the investment.

Memory Bandwidth

For many HPC applications it is not the compute power of the
CPU which counts, it is the speed with which the CPU can exchange data with the memory, the so-called memory bandwidth. If it is neglected, the CPUs will wait for data and performance will decrease. To describe the balance between CPU performance and memory bandwidth, one often uses the number of bytes that can be transferred per floating-point operation, the "byte-per-flop ratio". Traditionally, vector machines were always great in this regard, and this will continue as far as current plans are known. This is the reason why a vector machine can realize a higher fraction of its theoretical performance on a real application than a scalar machine.

Power Consumption

Another, but really overwhelming, problem is the increasing power consumption of systems. Looking at different generations of Linux clusters, one can assume that a dual-socket node, depending on how it is equipped with additional features, will consume between 350 and 400 Watts. This varies only slightly between generations, and will perhaps slightly increase. Even in Europe, sites are talking about electricity budgets in excess of 1 MW, even beyond 10 MW. One also has to cool the system, which adds something like 10-30% to the electricity cost! The cost to run such a system over 5 years will be of the same order of magnitude as the purchase price!

Power consumption grows with the frequency of the CPU; this is the dominant reason why frequencies are stagnating, if not decreasing, between product generations. There are more cores, but the individual cores of an architecture are getting slower, not faster! This has to be compensated on the application side by increased scalability, which is not always easy to achieve. In the past, users easily got higher performance for almost every application with each new generation; this is no longer the case.

Architecture Development

So the basic facts clearly indicate limitations of LSI technology, and they have been known for quite some time. Consequently, this leads to attempts to get additional performance by other means. And the developments in the market clearly indicate that high-end users are willing to consider adapting their codes.

To utilize the available transistors, CPUs with more complex functionality are designed, utilizing SIMD instructions or providing more operations per cycle than just an add and a multiply. SIMD stands for "Single Instruction, Multiple Data". Initially, Intel's SIMD instructions provided two results in 64-bit precision per cycle rather than one; in the meantime, with AVX, it is four, and on Xeon Phi already eight. This makes a lot of sense: the electricity consumed by the "administration" of the operations, code fetch and decode, configuration of paths on the CPU, etc., is a significant portion of the power consumption. If that energy is used not only for one operation but for two or even four, the energy per operation is obviously reduced.

In essence, and in order to overcome the performance bottleneck, the architectures developed in a direction which was already realized, to a larger extent, in the vector supercomputers decades ago! There is almost no architecture left without such features. But SIMD operations require application code to be vectorized. Previously, code development and programming paradigms did not care about fine-grain data parallelism, which is the basis for the applicability of SIMD.
Thus vector machines could not show their strength; on scalar code they cannot beat a scalar CPU. But strictly speaking, there is no scalar CPU left: standard x86 systems do not achieve their optimal performance on purely scalar code either!
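To make the notion of fine-grain data parallelism concrete, here is a small, illustrative C sketch (the function names are hypothetical, not from the brochure). The first loop is data parallel: every iteration is independent, so a compiler can map it onto SIMD or vector instructions. The second loop carries a dependence from one iteration to the next and cannot be vectorized as written. The first loop also illustrates the byte-per-flop discussion above: it performs one floating-point addition per iteration but moves roughly 24 bytes (two loads and one store), so its sustained speed is set by memory bandwidth rather than by peak compute.

#include <stddef.h>

/* Data-parallel loop: iterations are independent, so a vector unit or
   SIMD instructions can process many elements at once. Roughly one add
   per 24 bytes of memory traffic, so memory bandwidth limits its speed. */
void add_arrays(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: iteration i+1 needs the value written in
   iteration i, so the iterations cannot run in parallel as written. */
void running_sum(double *a, const double *c, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i + 1] = a[i] + c[i];
}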
Alternative Architectures?

During recent years, GPUs and many-core architectures have increasingly been used in the HPC market. Although these systems are somewhat diverse in detail, in all cases the degree of parallelism needed is much higher than with standard CPUs. This points in the same direction: a low-level parallelism is used to address the bottlenecks. In the case of Nvidia GPUs, the individual so-called CUDA cores, relatively simple entities, are steered by a more complex central processor, realizing a high level of SIMD parallelism. In the case of Xeon Phi, the individual cores are also less powerful, but they are used in parallel via coding paradigms which operate on the level of loop nests, an approach which was already used on vector supercomputers during the 1980s!

So these alternative architectures not only go in the same direction, the utilization of fine-grain parallelism; they also indicate that coding styles which suit real vector machines will dominate future software development. Note that at least some of these systems have economic backing by a mass market: gaming!
History of the SX Series

As already stated, NEC started marketing the first commercially available generation of this series, the SX-2, in 1981. By now the latest generation is the SX-ACE.

In 1998 Prof. Tadashi Watanabe, the developer of the SX series and the major driver of NEC's HPC business, received the Eckert-Mauchly Award, a renowned US award, for these achievements, and in 2006 also the Seymour Cray Award. Prof. Watanabe was a very pragmatic developer, a knowledgeable scientist and a visionary manager, all at the same time, and a great person to talk to.

From the beginning, the SX architecture was different from the competing Cray vector machines: it already integrated some of the features that RISC CPUs of that time were using, but combined them with vector registers and pipelines. For example, the SX could reorder instructions in hardware on the fly, which Cray machines could not do. In effect this made the system much easier to use. The single-processor SX-2 was the fastest CPU of its time, the first one to break the 1 GFlops barrier. Naturally, the compiler technology for automatic vectorization and optimization was important.

With the SX-3, NEC developed a high-end shared-memory system. It had up to four very strong CPUs. The resolution of memory access conflicts by CPUs working in parallel, and the consistency of data throughout the whole shared-memory system, are non-trivial, and the development of such a memory architecture was a major breakthrough.

With the SX-4, NEC implemented a high-end system based on CMOS. It was the direct competitor to the Cray T90, with a huge advantage with regard to price and power consumption. The T90 used ECL technology and achieved 1.8 GFlops peak performance. The SX-4 had a lower frequency because of CMOS, but because of the higher density of the LSIs it featured a higher level of parallelism on the CPU, leading to 2 GFlops per CPU with up to 32 CPUs. The memory architecture supporting those was a major leap ahead. On this system one could typically reach an efficiency of 20-40% of peak performance measured across complete applications, far from what is achieved on standard components today. With this system NEC also implemented the first generation of its own proprietary interconnect, the so-called IXS, which
allowed connecting several of these strong nodes into one highly powerful system. The IXS featured advanced functions like a global barrier and some hardware fault tolerance, things which other companies tackled years later.

With the SX-5 this direction was continued; it was a very successful system pushing the limits of CMOS-based CPUs and multi-node configurations. At that time the efforts on the users' side to parallelize their codes using "message passing" – PVM and MPI at that time – slowly started to surface, so the need for a huge shared-memory system slowly disappeared. Still there were quite some important codes in the scene which were only parallel using a shared-memory paradigm: initially vendor-specific "microtasking" and "autotasking" (with which NEC and some other vendors could automatically parallelize code), and later the standard OpenMP. So there was still sufficient need for a huge and highly capable shared-memory system.

But later it became clear that more and more codes could utilize distributed-memory systems, and moreover the memory architecture of huge shared-memory systems became increasingly complex and that way costly. A memory is made of "banks", individual pieces of memory which are used in parallel. In order to provide the data at the necessary speed for a CPU a certain minimum number of banks per CPU is required, because a bank needs some cycles to recover after every access. Consequently the number of banks had to scale with the number of CPUs, so that the complexity of the network connecting CPUs and banks was growing with the square of the number of CPUs. And if the CPUs became stronger they needed more banks, which made it even worse. So this concept became too expensive.
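A small back-of-the-envelope sketch (the factor k is purely illustrative, not a figure from the brochure) shows where the quadratic growth comes from:

   banks needed         ≈ k × (number of CPUs)           (each CPU needs at least k banks)
   CPU-to-bank network  ≈ (number of CPUs) × (banks)  ≈  k × (number of CPUs)²

So quadrupling the CPU count from 8 to 32 grows the CPU-memory network by roughly a factor of 16 – before even accounting for stronger CPUs needing more banks each.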
At the same time NEC had a lot of experience in mixing MPI and OpenMP parallelization in applications, and it turned out that for the very fast CPUs the shared-memory parallelization was typically not really efficient for more than four or eight threads. So the concept was adapted accordingly, which led to the SX-6 and the famous Earth Simulator, which kept the #1 spot of the Top500 for some years. The Earth Simulator was a different machine, but based on the same technology level and architecture as the SX-6. These systems were the first implementation ever of a vector CPU on one single chip. The CPU by itself was not the challenge, but the interface to the memory to support the bandwidth needed for this chip, basically the "pins". This kind of packaging technology was always a strong point in NEC's technology portfolio.

The SX-7 was a somewhat special machine; it was based on the SX-6 technology level, but featured 32 CPUs on one shared memory. The SX-8 then was a new implementation again, with some advanced features which directly addressed the necessities of certain code structures frequently encountered in applications. The communication between the world-wide organization of application analysts and the hard- and software developers in Japan had proven very fruitful; the SX-8 was very successful in the market.

With the SX-9 the number of CPUs per node was pushed up again to 16 because of certain demands in the market. Just as an indication of the complexity of the memory architecture: the system had 16384 banks! The peak performance of one node was 1.6 TFlops. Keep in mind that the efficiency achieved on real applications was still in the range of 10% in bad cases, up to 20% or even more, depending on the code. One should not compare this peak performance with some standard chip today; this would be misleading.

The Current Product: SX-ACE

In November 2014 NEC announced the next generation of the SX vector product, the SX-ACE. Some systems are already installed and accepted, also in Europe.

The main design target was the reduction of power consumption per sustained performance, and measurements have proven this has been achieved. In order to do so, and in line with the observation that shared-memory parallelization with vector CPUs will be useful on four cores, perhaps eight, but normally not beyond that, the individual node consists of a single-chip four-core processor, which integrates the memory controllers and the interface to the communication interconnect, a next generation of the proprietary IXS.

That way a node is a small board, leading to a very compact design. The complicated network between the memory and the CPUs, which took about 70% of the power consumption of an SX-9, was eliminated. And naturally a new generation of LSI and PCB technology also leads to reductions.

With the memory bandwidth of 256 GB/s this machine realizes one "byte per flop", a measure of the balance between the pure compute power and the memory bandwidth of a chip. In comparison, nowadays typical standard processors have in the order of one eighth of this!
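As a rough cross-check (the peak of 64 GFlops per core, i.e. 256 GFlops per four-core node, is the commonly quoted SX-ACE figure and is an assumption here, not stated above):

   256 GB/s  ÷  256 GFlop/s  =  1 B/F per SX-ACE node
   typical 2014-era x86 server CPU:  roughly 68 GB/s ÷ roughly 500 GFlop/s  ≈  0.14 B/F

which is on the order of the "one eighth" mentioned in the text.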
Memory bandwidth is increasingly a problem; this is known in the market as the "memory wall" and has been a hot topic for more than a decade. Standard components are built for a different market, where memory bandwidth is not as important as it is for HPC applications. Two of these node boards are put into a module, eight such modules are combined in one cage, and four of the cages, i.e. 64 nodes, are contained in one rack. The standard configuration of 512 nodes consists of eight racks.
NEC has already installed several SX-ACE systems in Europe, and more installations are ongoing. The advantages of the system – the superior memory bandwidth and the strong CPU with good efficiency – provide scientists with performance levels on certain codes which otherwise could not be achieved. Moreover, the system has proven to be superior in terms of power consumption per sustained application performance, and this is a truly hot topic nowadays.
LSI technologies have improved drastically during the last two decades, but the ever increasing power consumption of systems now poses serious limitations. Tasks like fetching instructions, decoding them or configuring paths on the chip consume energy, so it makes sense to execute these activities not for one operation only, but for a set of such operations. This is what is meant by SIMD.

Data Parallelism

The underlying structure that enables the usage of SIMD instructions is called "data parallelism". For example the following loop is data-parallel: the individual loop iterations could be executed in any arbitrary order, and the results would stay the same:

do i = 1, n
   a(i) = b(i) + c(i)
end do

The complete independence of the loop iterations allows executing them in parallel, and this is mapped to hardware with SIMD support. Naturally one could also use other means to parallelize such a loop, e.g. for a very long loop one could execute chunks of loop iterations on different processors of a shared-memory machine using OpenMP. But this will cause overhead, so it will not be efficient for small trip counts, while a SIMD feature in hardware can deal with such a fine granularity.
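As a minimal sketch of the OpenMP alternative just mentioned (illustrative only; the program frame and array sizes are invented for the example), the same data-parallel loop can be split into chunks across the cores of a shared-memory node:

program vector_add
   implicit none
   integer, parameter :: n = 1000000
   real :: a(n), b(n), c(n)
   integer :: i
   b = 1.0
   c = 2.0
!$omp parallel do
   do i = 1, n
      a(i) = b(i) + c(i)   ! chunks of iterations are distributed across threads
   end do
!$omp end parallel do
   print *, a(1), a(n)
end program vector_add

Compiled with OpenMP enabled (e.g. -fopenmp), the directive distributes blocks of iterations over threads; for short trip counts the thread start-up and synchronization overhead dominates, which is exactly why fine-grained SIMD or vector hardware is the better match at that granularity.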
As opposed to the example above, the following loop, a recursion, is not data-parallel:

do i = 1, n-1
   a(i+1) = a(i) + c(i)
end do

Obviously a change of the order of the loop iterations, for example the special case of starting with a loop index of n-1 and decreasing it afterwards, will lead to different results. Therefore these iterations cannot be executed in parallel.

Pipelining

The following description is very much simplified, but explains the basic principles: operations like add and multiply consist of so-called segments, which are executed one after the other in a so-called pipeline. While each segment acts on the data for some kind of intermediate step, only after the execution of the whole pipeline is the final result available. It is similar to an assembly line in the automotive industry. Now, the segments do not need to deal with the same operation at a given point in time; they can be kept active if sufficient inputs are fed into the pipeline. Again, in the automotive industry the different stages of the assembly line are working on different cars, and nobody would wait for a car to be completed before starting the next one in the first stage of the production, that way keeping almost all stages inactive. Such a situation would be resembled by the following picture: the time, or number of cycles, extends to the right-hand side, assuming five stages (vertical direction); the red squares indicate which stage is busy at a given cycle. Obviously this is not efficient, because the white squares indicate transistors which consume power without actively producing results. And the compute performance is not optimal: it takes five cycles to compute one result.

Similarly to the assembly line in the automotive industry, this is what we want to achieve: after the "latency" – five cycles, just the length of the pipeline – we get a result every cycle. Obviously it is easy to feed the pipeline with sufficient independent input data in the case of different loop iterations of a data-parallel loop. If this is not possible one could also use different operations within the execution of a complicated loop body within one given loop iteration. But typically operations in such a loop body need results from previous operations of the same iteration, so one would rather arrive at a utilization of the pipeline like this: some operations can be calculated almost in parallel, others have to wait for previous calculations to finish because they need input. In total the utilization of the pipeline is better, but not optimal.
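A hypothetical loop body of this kind (the arrays and the temporary t are invented for illustration, in the style of the fragments above) makes the dependency visible: within one iteration the add cannot start before the multiply has delivered its result, while operations from different iterations can still overlap in the pipelines:

do i = 1, n
   t    = b(i) * c(i)   ! multiply pipeline produces t
   a(i) = t + d(i)      ! add pipeline must wait for t of the same iteration
end do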
This was the situation before the trend to SIMD computing was revitalized. Before that, the advances of LSI technology always provided an increased performance with every new generation of product. But then the race for increasing frequency stopped, and now architectural enhancements are important again.

Single Instruction Multiple Data (SIMD)

Hardware realizations of SIMD have been present since the first Cray-1, but this was a vector architecture. Nowadays the instruction sets SSE and AVX in their different generations and flavors use it, and similarly, for example, CUDA coding is also a way to utilize data parallelism.

The "width" of a SIMD instruction, the number of loop iterations executed in parallel by one instruction, is kept relatively small in the case of SSE (two for double precision) and AVX (four for double precision). The implementations of SIMD in standard "scalar" processors typically involve add-and-multiply units which can produce several results per clock cycle. They also feature registers of the same "width" to keep several input parameters or several results.
In a way the pipelines are "doubled" for SSE, but both are activated by the same single instruction:

Vector Computing

The difference between those SIMD architectures and SIMD with a real vector architecture like the NEC SX-ACE, the only remaining vector system, lies in the existence of vector registers. In order to provide sufficient input to the functional units, the vector registers have proven to be very efficient. In a way they are a very natural match to the idea of a pipeline – or assembly line. Again thinking about the automotive industry, there one would have a stock of components to constantly feed the assembly line. The following viewgraph resembles this situation: when an instruction is issued it takes some cycles for the first result to appear, but after that one gets a new result every cycle, until the vector register, which has a certain length, is completely processed. Naturally this can only work if the loop iterations are independent, i.e. if the loop is data-parallel. It would not work if one computation needed the result of another computation as input while that result is just being processed.

In fact the SX vector machines have always combined both ideas, the vector registers and parallel functional units, the so-called "pipe sets" – four in this picture.

There is another advantage of the vector architecture in the SX, and this proves to be very important. The scalar portion of the CPU acts independently, and while the pipelines are kept busy with input from the vector registers, the inevitable scalar instructions which come with every loop can be executed. This involves checking of loop counters, updating of address pointers and the like. That way these instructions essentially do not consume time. The number of cycles available to "hide" those is the length of the vector registers divided by the number of pipe sets.

In fact the white space in the picture above can partially be hidden as well, because new vector instructions can be issued in advance to immediately take over once the vector register with the input data for the previous instruction is completely processed. The instruction issue unit has the complete status information of all available resources, specifically pipes and vector registers.
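To make the vector-register picture concrete, one can think of the hardware processing a long loop in strips of the vector-register length (256 elements is the figure usually quoted for the SX line; treat it as an assumption here). In Fortran-style pseudo code, the strip-mining a vectorizing compiler performs looks roughly like this:

do is = 1, n, 256                    ! one strip per vector-instruction sequence
   ie = min(is + 255, n)
   a(is:ie) = b(is:ie) + c(is:ie)    ! vector load, add and store act on a whole strip
end do

Each strip keeps the pipelines fed from the vector registers, while the scalar unit updates is, ie and the loop counter in the shadow of the vector work.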
High Performance Computing (HPC) clusters serve a crucial role at research-intensive organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil and gas exploration. What they have in common is a requirement for large-scale computations. Bright's solution for HPC puts the power of supercomputing within the reach of just about any organization.

Advanced Cluster Management Made Easy
The Bright Advantage

Bright Cluster Manager for HPC makes it easy to build and operate HPC clusters using the server and networking equipment of your choice, tying them together into a comprehensive, easy-to-manage solution.

Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It provides single-pane-of-glass management for the hardware, the operating system, HPC software, and users. With Bright Cluster Manager for HPC, system administrators can get clusters up and running quickly and keep them running reliably throughout their life cycle – all with the ease and elegance of a fully featured, enterprise-grade cluster manager.

The cluster installer takes you through the installation process and offers advanced options such as "Express" and "Remote".

Bright Cluster Manager was designed by a group of highly experienced cluster specialists who perceived the need for a fundamental approach to the technical challenges posed by cluster management. The result is a scalable and flexible architecture, forming the basis for a cluster management solution that is extremely easy to install and use, yet is suitable for the largest and most complex clusters.

Scale clusters to thousands of nodes

Bright Cluster Manager was designed to scale to thousands of nodes. It is not dependent on third-party (open source) software that was not designed for this level of scalability. Many advanced features for handling scalability and complexity are included:

Management Daemon with Low CPU Overhead

The cluster management daemon (CMDaemon) runs on every node in the cluster, yet has a very low CPU load – it will not cause any noticeable slow-down of the applications running on your cluster. Because the CMDaemon provides all cluster management functionality, the only additional daemon required is the workload manager (or queuing system) daemon. For example, no daemons
for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster.

Multiple Load-Balancing Provisioning Nodes

Node provisioning – the distribution of software from the head node to the regular nodes – can be a significant bottleneck in large clusters. With Bright Cluster Manager, provisioning capability can be off-loaded to regular nodes, ensuring scalability to a virtually unlimited number of nodes. When a node requests its software image from the head node, the head node checks which of the nodes with provisioning capability has the lowest load and instructs the node to download or update its image from that provisioning node.

Synchronized Cluster Management Daemons

The CMDaemons are synchronized in time to execute tasks in exact unison. This minimizes the effect of operating system jitter (OS jitter) on parallel applications. OS jitter is the "noise" caused by daemon processes and asynchronous events such as interrupts. OS jitter cannot be totally prevented, but for parallel applications it is important that any OS jitter occurs at the same moment in time on every compute node.

Built-In Redundancy

Bright Cluster Manager supports redundant head nodes and redundant provisioning nodes that can take over from each other in case of failure. Both automatic and manual failover configurations are supported. Other redundancy features include support for hardware and software RAID on all types of nodes in the cluster.

Diskless Nodes

Bright Cluster Manager supports diskless nodes, which can be useful to increase overall Mean Time Between Failure (MTBF), particularly on large clusters. Nodes with only InfiniBand and no Ethernet connection are possible too.

Bright Cluster Manager 7 for HPC – for organizations that need to easily deploy and manage HPC clusters

Features

Simple Deployment Process – Just answer a few questions about your cluster and Bright takes care of the rest. It installs all of the software you need and creates the necessary configuration files for you, so you can relax and get your new cluster up and running right – first time, every time.
Head Nodes and Regular Nodes

A cluster can have different types of nodes, but it will always have at least one head node and one or more "regular" nodes. A regular node is a node which is controlled by the head node and which receives its software image from the head node or from a dedicated provisioning node. Regular nodes come in different flavors. Some examples include:

Failover Node – A node that can take over all functionalities of the head node when the head node becomes unavailable.
Compute Node – A node which is primarily used for computations.
Login Node – A node which provides login access to users of the cluster. It is often also used for compilation and job submission.
I/O Node – A node which provides access to disk storage.
Provisioning Node – A node from which other regular nodes can download their software image.
Workload Management Node – A node that runs the central workload manager, also known as the queuing system.
Subnet Management Node – A node that runs an InfiniBand subnet manager.

More types of nodes can easily be defined and configured with Bright Cluster Manager. In simple clusters there is only one type of regular node – the compute node – and the head node fulfills all cluster management roles that are described in the regular node types mentioned above.
All types of nodes (head nodes and regular nodes) run the same CMDaemon. However, depending on the role that has been assigned to the node, the CMDaemon fulfills different tasks. This node has a 'Login Role', 'Storage Role' and 'PBS Client Role'.

Elements of the Architecture

The architecture of Bright Cluster Manager is implemented by the following key elements:

The Cluster Management Daemon (CMDaemon)
The Cluster Management Shell (CMSH)
The Cluster Management GUI (CMGUI)

Connection Diagram

A Bright cluster is managed by the CMDaemon on the head node, which communicates with the CMDaemons on the other nodes over encrypted connections. The CMDaemon on the head node exposes an API which can also be accessed from outside the cluster. This is used by the Cluster Management Shell and the Cluster Management GUI, but it can also be used by other applications if they are programmed to access the API. The API documentation is available on request. The diagram below shows a cluster with one head node, three regular nodes, and the CMDaemon running on each node.
Can Install on Bare Metal – With Bright Cluster Manager there is nothing to pre-install. You can start with bare-metal servers, and we will install and configure everything you need. Go from pallet to production in less time than ever.

Comprehensive Metrics and Monitoring – Bright Cluster Manager really shines when it comes to showing you what's going on in your cluster. Its beautiful graphical user interface lets you monitor resources and services across the entire cluster in real time.

Powerful User Interfaces – With Bright Cluster Manager you don't just get one great user interface, we give you two. That way, you can choose to provision, monitor, and manage your HPC clusters with a traditional command line interface or our beautiful, intuitive graphical user interface.

Optimize the Use of IT Resources by HPC Applications – Bright ensures HPC applications get the resources they require according to the policies of your organization. Your cluster will prioritize workloads according to your business priorities.

HPC Tools and Libraries Included – Every copy of Bright Cluster Manager comes with a complete set of HPC tools and libraries, so you will be ready to develop, debug, and deploy your HPC code right away.

"The building blocks for transtec HPC solutions must be chosen according to our goals of ease-of-management and ease-of-use. With Bright Cluster Manager, we are happy to have the technology leader at hand, meeting these requirements, and our customers value that."
Jörg Walter, HPC Solution Engineer

Monitor your entire cluster at a glance with Bright Cluster Manager.
Quickly build and deploy an HPC cluster

When you need to get an HPC cluster up and running, don't waste time cobbling together a solution from various open source tools. Bright's solution comes with everything you need to set up a complete cluster from bare metal and manage it with a beautiful, powerful user interface. Whether your cluster is on-premise or in the cloud, Bright is a virtual one-stop shop for your HPC project.

Build a cluster in the cloud

Bright's dynamic cloud provisioning capability lets you build an entire cluster in the cloud or expand your physical cluster into the cloud for extra capacity. Bright allocates the compute resources in Amazon Web Services automatically, and on demand.

Build a hybrid HPC/Hadoop cluster

Does your organization need HPC clusters for technical computing and Hadoop clusters for Big Data? The Bright Cluster Manager for Apache Hadoop add-on enables you to easily build and manage both types of clusters from a single pane of glass. Your system administrators can monitor all cluster operations, manage users, and repurpose servers with ease.

Driven by research

Bright has been building enterprise-grade cluster management software for more than a decade. Our solutions are deployed in thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization?

Clusters in the cloud

Bright Cluster Manager can provision and manage clusters that are running on virtual servers inside the AWS cloud as if they were local machines. Use this feature to build an entire cluster in AWS from scratch or extend a physical cluster into the cloud when you need extra capacity.

Use native services for storage in AWS – Bright Cluster Manager lets you take advantage of inexpensive, secure, durable, flexible and simple storage services for data use, archiving, and backup in the AWS cloud.

Make smart use of AWS for HPC – Bright Cluster Manager can save you money by instantiating compute resources in AWS only when they are needed. It uses built-in intelligence to create instances only after the data is ready for processing and the backlog in on-site workloads requires it.

You decide which metrics to monitor. Just drag a component into the display area and Bright Cluster Manager instantly creates a graph of the data you need.
High performance meets efficiency

Initially, massively parallel systems constitute a challenge to both administrators and users. They are complex beasts. Anyone building HPC clusters will need to tame the beast, master the complexity, and present users and administrators with an easy-to-use, easy-to-manage system landscape.

Leading HPC solution providers such as transtec achieve this goal. They hide the complexity of HPC under the hood and match high performance with efficiency and ease-of-use for both users and administrators. The "P" in "HPC" gains a double meaning: "Performance" plus "Productivity".

Cluster and workload management software like Moab HPC Suite, Bright Cluster Manager or Intel Fabric Suite provide the means to master and hide the inherent complexity of HPC systems. For administrators and users, HPC clusters are presented as single, large machines with many different tuning parameters. The software also provides a unified view of existing clusters whenever unified management is added as a requirement by the customer at any point in time after the first installation. Thus, daily routine tasks such as job management, user management, queue partitioning and management can be performed easily with either graphical or web-based tools, without any advanced scripting skills or technical expertise required from the administrator or user.
Bright Cluster Manager unleashes and manages the unlimited power of the cloud

Create a complete cluster in Amazon EC2, or easily extend your onsite cluster into the cloud. Bright Cluster Manager provides a "single pane of glass" to both your on-premise and cloud resources, enabling you to dynamically add capacity and manage cloud nodes as part of your onsite cluster.

Cloud utilization can be achieved in just a few mouse clicks, without the need for expert knowledge of Linux or cloud computing. With Bright Cluster Manager, every cluster is cloud-ready, at no extra cost. The same powerful cluster provisioning, monitoring, scheduling and management capabilities that Bright Cluster Manager provides to onsite clusters extend into the cloud.

The Bright advantage for cloud utilization

Ease of use: Intuitive GUI virtually eliminates the user learning curve; no need to understand Linux or EC2 to manage the system. Alternatively, the cluster management shell provides powerful scripting capabilities to automate tasks.
Complete management solution: Installation/initialization, provisioning, monitoring, scheduling and management in one integrated environment.
Integrated workload management: Wide selection of workload managers included and automatically configured with local, cloud and mixed queues.
Single pane of glass; complete visibility and control: Cloud compute nodes are managed as elements of the on-site cluster, visible from a single console with drill-downs and historic data.
Efficient data management via data-aware scheduling: Automatically ensures data is in position at the start of computation; delivers results back when complete.
Secure, automatic gateway: Mirrored LDAP and DNS services inside Amazon VPC, connecting local and cloud-based nodes for secure communication over VPN.
Cost savings: More efficient use of cloud resources; support for spot instances; minimal user intervention.

Two cloud scenarios

Bright Cluster Manager supports two cloud utilization scenarios: "Cluster-on-Demand" and "Cluster Extension".

Scenario 1: Cluster-on-Demand

The Cluster-on-Demand scenario is ideal if you do not have a cluster onsite, or need to set up a totally separate cluster. With just a few mouse clicks you can instantly create a complete cluster in the public cloud, for any duration of time.
Scenario 2: Cluster Extension

The Cluster Extension scenario is ideal if you have a cluster onsite but you need more compute power, including GPUs. With Bright Cluster Manager, you can instantly add EC2-based resources to your onsite cluster, for any duration of time. Extending into the cloud is as easy as adding nodes to an onsite cluster – there are only a few additional, one-time steps after providing the public cloud account information to Bright Cluster Manager.

The Bright approach to managing and monitoring a cluster in the cloud provides complete uniformity, as cloud nodes are managed and monitored the same way as local nodes:

Load-balanced provisioning
Software image management
Integrated workload management
Interfaces – GUI, shell, user portal
Monitoring and health checking
Compilers, debuggers, MPI libraries, mathematical libraries and environment modules
Bright Cluster Manager also provides additional features that are unique to the cloud.

Amazon spot instance support – Bright Cluster Manager enables users to take advantage of the cost savings offered by Amazon's spot instances. Users can specify the use of spot instances, and Bright will automatically schedule them as they become available, reducing the cost to compute without the need to monitor spot prices and schedule manually.

Hardware Virtual Machine (HVM) – Bright Cluster Manager automatically initializes all Amazon instance types, including Cluster Compute and Cluster GPU instances that rely on HVM virtualization.

Amazon VPC – Bright supports Amazon VPC setups, which allow compute nodes in EC2 to be placed in an isolated network, thereby separating them from the outside world. It is even possible to route part of a local corporate IP network to a VPC subnet in EC2, so that local nodes and nodes in EC2 can communicate easily.

Data-aware scheduling – Data-aware scheduling ensures that input data is automatically transferred to the cloud and made accessible just prior to the job starting, and that the output data is automatically transferred back. There is no need to wait for the data to load prior to submitting jobs (delaying entry into the job queue), nor any risk of starting the job before the data transfer is complete (crashing the job). Users submit their jobs, and Bright's data-aware scheduling does the rest.

Bright Cluster Manager provides the choice: "To Cloud, or Not to Cloud"

Not all workloads are suitable for the cloud. The ideal situation for most organizations is to have the ability to choose between onsite clusters for jobs that require low-latency communication, complex I/O or sensitive data, and cloud clusters for many other types of jobs. Bright Cluster Manager delivers the best of both worlds: a powerful management solution for local and cloud clusters, with the ability to easily extend local clusters into the cloud without compromising provisioning, monitoring or managing the cloud resources.

One or more cloud nodes can be configured under the Cloud Settings tab.
While all HPC systems face challenges in workload demand, workflow constraints, resource complexity, and system scale, enterprise HPC systems face more stringent challenges and expectations. Enterprise HPC systems must meet mission-critical and priority HPC workload demands for commercial businesses, business-oriented research, and academic organizations.

Intelligent HPC Workload Management
Solutions to Enterprise High-Performance Computing Challenges

The Enterprise has requirements for dynamic scheduling, provisioning and management of multi-step/multi-application services across HPC, cloud, and big data environments. Its workloads directly impact revenue, product delivery, and organizational objectives. Enterprise HPC systems must process intensive simulation and data analysis more rapidly, accurately and cost-effectively to accelerate insights. By improving workflow and eliminating job delays and failures, the business can more efficiently leverage data to make data-driven decisions and achieve a competitive advantage. The Enterprise is also seeking to improve resource utilization and manage efficiency across multiple heterogeneous systems. User productivity must increase to speed the time to discovery, making it easier to access and use HPC resources and expand to other resources as workloads demand.

Enterprise-Ready HPC Workload Management

Moab HPC Suite – Enterprise Edition accelerates insights by unifying data center resources, optimizing the analysis process, and guaranteeing services to the business. Moab 8.1 for HPC systems and HPC cloud continually meets enterprise priorities through increased productivity, automated workload uptime, and consistent SLAs. It uses the battle-tested and patented Moab intelligence engine to automatically balance the complex, mission-critical workload priorities of enterprise HPC systems. Enterprise customers benefit from a single integrated product that brings together key enterprise HPC capabilities, implementation services, and 24/7 support.

Moab is the "brain" of an HPC system, intelligently optimizing workload throughput while balancing service levels and priorities.

It is imperative to automate workload workflows to speed time to discovery. The following new use cases play a key role in improving overall system performance:
Intelligent HPC Workload Management Across Infrastructure and Organizational Complexity

Moab HPC Suite – Enterprise Edition acts as the "brain" of a high performance computing (HPC) system to accelerate and automate complex decision-making processes. The patented Moab intelligence engine provides multi-dimensional policies that mimic real-world decision making. These multi-dimensional, real-world policies are essential in scheduling workload to optimize job speed, job success and resource utilization. Moab HPC Suite – Enterprise Edition integrates decision-making data from, and automates actions through, your system's existing mix of resource managers. This enables all the dimensions of real-time granular resource attributes, resource state, as well as current and future resource commitments, to be modeled against workload/application requirements. All of these dimensions are then factored into the most efficient scheduling decisions. Moab also dramatically simplifies the management tasks and processes across these complex, heterogeneous environments. Moab works with many of the major resource management and industry-standard resource monitoring tools and covers mixed hardware, network, storage and licenses.

Moab HPC Suite – Enterprise Edition multi-dimensional policies are also able to factor in organizational priorities and complexities when scheduling workload. Moab ensures workload is processed according to organizational priorities and commitments and that resources are shared fairly across users, groups and even multiple organizations. This enables organizations to automatically enforce service guarantees and effectively manage organizational complexities with simple policy-based settings.

Moab HPC Suite – Enterprise Edition is architected to integrate on top of your existing resource managers and other types of management tools in your environment to provide policy-based scheduling and management of workloads. It makes the optimal complex decisions based on all of the data it integrates from the various resource managers and then orchestrates the job and management actions through those resource managers. Moab's flexible management integration makes it the ideal choice to integrate with existing or new systems as well as to manage your high performance computing system as it grows and expands in the future.
Elastic Computing – unifies data center resources by assisting the business in managing resource expansion through bursting to private/public clouds and other data center resources, utilizing OpenStack as a common platform

Performance and Scale – optimizes the analysis process by dramatically increasing TORQUE and Moab throughput and scalability across the board

Tighter cooperation between Moab and TORQUE – harmonizing these key components to reduce overhead and improve communication between the HPC scheduler and the resource manager(s)

More Parallelization – increasing the decoupling of Moab's and TORQUE's network communication so that Moab is less dependent on TORQUE's responsiveness, resulting in a 2x speed improvement and cutting the duration of the average scheduling iteration in half

Accounting Improvements – allowing toggling between multiple modes of accounting with varying levels of enforcement, providing greater flexibility beyond strict allocation options. Additionally, new up-to-date accounting balances provide real-time insights into usage tracking.

Viewpoint Admin Portal – guarantees services to the business by allowing admins to simplify administrative reporting, workload status tracking, and job resource viewing

Accelerate Insights

Moab HPC Suite – Enterprise Edition accelerates insights by increasing overall system, user and administrator productivity, achieving more accurate results that are delivered faster and at a lower cost from HPC resources. Moab provides the scalability, 90-99 percent utilization, and simple job submission required to maximize productivity. This ultimately speeds the time to discovery, aiding the enterprise in achieving a competitive advantage due to its HPC system. Enterprise use cases and capabilities include the following:

"With Moab HPC Suite, we can meet very demanding customers' requirements as regards unified management of heterogeneous cluster environments and grid management, and provide them with flexible and powerful configuration and reporting options. Our customers value that highly."
Thomas Gebert, HPC Solution Architect
OpenStack integration to offer virtual and physical resource provisioning for IaaS and PaaS
Performance boost to achieve a 3x improvement in overall optimization performance
Advanced workflow data staging to enable improved cluster utilization, multiple transfer methods, and new transfer types
Advanced power management with clock frequency control and additional power state options, reducing energy costs by 15-30 percent
Workload-optimized allocation policies and provisioning to get more results out of existing heterogeneous resources and reduce costs, including topology-based allocation
Unified workload management across heterogeneous clusters by managing them as one cluster, maximizing resource availability and administration efficiency
Optimized, intelligent scheduling that packs workloads and backfills around priority jobs and reservations while balancing SLAs to efficiently use all available resources
Optimized scheduling and management of accelerators, both Intel Xeon Phi and GPGPUs, to maximize their utilization and effectiveness
Simplified job submission and management with advanced job arrays and templates
Showback or chargeback for pay-for-use, so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster
Multi-cluster grid capabilities to manage and share workload across multiple remote clusters to meet growing workload demand or surges

Guarantee Services to the Business

Job and resource failures in enterprise HPC systems lead to delayed results, missed organizational opportunities, and missed objectives. Moab HPC Suite – Enterprise Edition intelligently automates workload and resource uptime in the HPC system to ensure that workloads complete successfully and reliably, avoid failures, and guarantee services are delivered to the business.
The Enterprise benefits from these features:

Intelligent resource placement that prevents job failures with granular resource modeling, meeting workload requirements and avoiding at-risk resources
Auto-response to failures and events, with configurable actions for pre-failure conditions, amber alerts, or other metrics and monitors
Workload-aware future maintenance scheduling that helps maintain a stable HPC system without disrupting workload productivity
Real-world expertise for fast time-to-value and system uptime, with included implementation, training and 24/7 remote support services

Auto SLA Enforcement

Moab HPC Suite – Enterprise Edition uses the powerful Moab intelligence engine to optimally schedule and dynamically adjust workload to consistently meet service level agreements (SLAs), guarantees, or business priorities. This automatically ensures that the right workloads are completed at the optimal times, taking into account the complex number of using departments, priorities, and SLAs to be balanced. Moab provides the following benefits:

Usage accounting and budget enforcement that schedules resources and reports on usage in line with resource-sharing agreements and precise budgets (includes usage limits, usage reports, auto budget management, and dynamic fair-share policies)
SLA and priority policies that make sure the highest-priority workloads are processed first (includes Quality of Service and hierarchical priority weighting)
Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workload change (i.e. future reservations, pre-emption)
Grid- and Cloud-Ready Workload Management

The benefits of a traditional HPC environment can be extended to more efficiently manage and meet workload demand with the multi-cluster grid and HPC cloud management capabilities in Moab HPC Suite – Enterprise Edition. It provides:

A self-service job submission portal that enables users to easily submit and track HPC cloud jobs at any time, from any location, with little training required
Pay-for-use showback or chargeback, so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster
On-demand, workload-optimized node provisioning that dynamically provisions nodes based on policies, to better meet workload needs
The ability to manage and share workload across multiple remote clusters to meet growing workload demand or surges by adding on the Moab HPC Suite – Grid Option
Choose the Moab HPC Suite That's Right for You

The following chart provides a high-level comparison of the key use cases and capabilities that are enabled by the different editions of Moab HPC Suite to intelligently manage and optimize HPC workload efficiency. Customers can use this chart to help determine which edition will best address their HPC workload management challenges, maximize their workload and resource productivity, and help them meet their service level performance goals. To explore any use case or capability compared in the chart further, see the HPC Workload Management Use Cases.

Basic Edition is best for static, non-dynamic basic clusters of all sizes that want to optimize workload throughput and utilization while balancing priorities.

Enterprise Edition is best for large clusters with dynamic, mission-critical workloads and complex service levels and priorities to balance. It is also for organizations who want to integrate, or who require, HPC cloud capabilities for their HPC cluster systems. Enterprise Edition provides additional use cases these organizations need, such as:
Accounting for usage of cluster resources for usage budgeting and pay-for-use showback or chargeback
Workload-optimized node provisioning
Heterogeneous, multi-cluster workload management
Workload-aware power management
Application Portal Edition is suited for customers who need an application-centric job submission experience to simplify and extend high performance computing to more users without requiring specialized HPC training. It provides all the benefits of the Enterprise Edition and meets these requirements with a technical computing application portal that supports the most common applications in manufacturing, oil and gas, life sciences, academic, government and other industries.

Remote Visualization Edition is suited for customers with many users, internal or external to their organization, of 3D/CAE/CAD and other visualization applications who want to reduce the hardware, networking and management costs of running their visualization workloads. This edition also provides the added benefits of improved workforce productivity and collaboration, while optimizing the utilization of, and guaranteeing shared access to, resources for users moving from technical workstations to more efficient shared resources in a technical compute cloud. It provides all the benefits of the Enterprise Edition plus these additional capabilities and benefits.
transtec HPC solutions are designed for maximum flexibility and ease of management. We not only offer our customers the most powerful and flexible cluster management solution out there, but also provide them with customized setup and site-specific configuration. Whether a customer needs a dynamic Linux/Windows dual-boot solution, unified management of different clusters at different sites, or the fine-tuning of the Moab scheduler for implementing fine-grained policy configuration – transtec not only gives you the framework at hand, but also helps you adapt the system according to your special needs. Needless to say, when customers are in need of special trainings, transtec will be there to provide customers, administrators, or users with specially adjusted Educational Services.

Many years of experience in High Performance Computing have enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven third-party tools, which we assemble into a total solution and adapt and configure according to the customer's requirements. We manage everything from the complete installation and configuration of the operating system, through necessary middleware components like job and cluster management systems, up to the customer's applications.
Moab HPC Suite Use Case | Basic | Enterprise | Application Portal | Remote Visualization

Productivity Acceleration
Massive multifaceted scalability | x | x | x | x
Workload optimized resource allocation | x | x | x | x
Optimized, intelligent scheduling of workload and resources | x | x | x | x
Optimized accelerator scheduling management (Intel MIC, GPGPU) | x | x | x | x
Simplified user job submission and mgmt. (Application Portal) | x | x | x | x
Administrator dashboard and management reporting | x | x | x | x
Unified workload management across heterogeneous cluster(s) | x (single cluster only) | x | x | x
Workload optimized node provisioning | | x | x | x
Visualization workload optimization | | | | x
Workload-aware auto-power management | | x | x | x

Uptime Automation
Intelligent resource placement for failure prevention | x | x | x | x
Workload-aware maintenance scheduling | x | x | x | x
Basic deployment, training and premium (24x7) support service | standard support only | x | x | x
Auto response to failures and events | x (limited; no re-boot/provision) | x | x | x

Auto SLA Enforcement
Resource sharing and usage policies | x | x | x | x
SLA and priority policies | x | x | x | x
Continuous and future reservations scheduling | x | x | x | x
Usage accounting and budget enforcement | | x | x | x

Grid- and Cloud-Ready
Manage and share workload across multiple clusters in a wide-area grid | x (w/ Grid Option) | x (w/ Grid Option) | x (w/ Grid Option) | x (w/ Grid Option)
User self-service: job request management portal | x | x | x | x
Pay-for-use showback and chargeback | | x | x | x
Workload optimized node provisioning and re-purposing | | x | x | x
Multiple clusters in single domain | | | |
Moab HPC Suite – Application Portal Edition

Moab HPC Suite – Application Portal Edition provides easy single-point access to common technical applications, data, resources, and job submissions to accelerate project completion and collaboration for end users while reducing computing costs. It enables more of your internal staff and partners to collaborate from anywhere, using HPC applications and resources without any specialized HPC training, while your intellectual property stays secure. The optimization policies Moab provides significantly reduce the costs and complexity of keeping up with compute demand by optimizing resource sharing and utilization for higher throughput while still balancing different service levels and priorities.

Remove the Barriers to Harnessing HPC

Organizations of all sizes and types are looking to harness the potential of high performance computing (HPC) to enable and accelerate more competitive product design and research. To do this, you must find a way to extend the power of HPC resources to designers, researchers and engineers not familiar with or trained on using the specialized HPC technologies. Simplifying the user access to and interaction with applications running on more efficient HPC clusters removes the barriers to discovery and innovation for all types of departments and organizations. Moab HPC Suite – Application Portal Edition provides easy single-point access to common technical applications, data, HPC resources, and job submissions to accelerate the research analysis and collaboration process for end users while reducing computing costs.

HPC Ease of Use Drives Productivity

Moab HPC Suite – Application Portal Edition improves ease of use and productivity for end users using HPC resources, with single-point access to their common technical applications, data, resources, and job submissions across the design and research process. It simplifies the specification and running of jobs on HPC resources to a few simple clicks on an application-specific template, with key capabilities including:
