HIGH PERFORMANCE COMPUTING & BIG DATA
Technology Compass 2016/17
More Than 35 Years of Experience
in Scientific Computing
1980 marked the beginning of a decade where numerous startups
were created, some of which later transformed into big players in
the IT market. Technical innovations brought dramatic changes
to the nascent computer market. In Tübingen, close to one of
Germany’s prime and oldest universities, transtec was founded.
In the early days, transtec focused on reselling DEC computers
and peripherals, delivering high-performance workstations to
university institutes and research facilities. In 1987, SUN/Sparc
and storage solutions broadened the portfolio, enhanced by IBM/
RS6000 products in 1991. These were the typical workstations
and server systems for high performance computing then, used
by the majority of researchers worldwide. Today, transtec
is the largest Germany-based provider of comprehensive HPC
solutions across Europe. Many of our HPC clusters have entered
the TOP500 list of the world's fastest computing systems.
Thus, given this background and history, it is fair to say that transtec
looks back on more than 35 years of experience in scientific
computing; our track record shows more than 1,000 HPC installations.
With this experience, we know exactly what customers'
demands are and how to meet them. High performance and ease
of management – this is what customers require today. HPC systems
are certainly required to deliver peak performance, as their name
indicates, but that is not enough: they must also be easy to handle.
Unwieldy design and operational complexity must be avoided or
at least hidden from administrators and particularly users of HPC
computer systems.
High Performance Computing has changed over the course of the
last 10 years. Whereas traditional topics like simulation, large-
SMP systems or MPI parallel computing are still important cornerstones
of HPC, new areas of applicability like Business Intelligence/Analytics,
new technologies like in-memory or distributed
databases, and new algorithms like MapReduce have entered the
arena, which is now commonly called HPC & Big Data.
Familiarity with Hadoop or R nowadays belongs to the basic
toolkit of any serious HPC player.
Dynamic deployment scenarios are becoming more and more
important for realizing private (or even public) HPC clouds.
The OpenStack ecosystem, object storage concepts like
Ceph, application containers with Docker – all of these were once
bleeding-edge technologies, but have meanwhile become
state of the art.
Therefore, this brochure is now called “Technology Compass
HPC & Big Data”. It covers all of the above-mentioned currently
discussed core technologies. Besides those, more familiar tradi-
tional updates like the Intel Omni-Path architecture are covered
as well, of course. No matter which ingredients an HPC solution
requires, transtec masterfully combines excellent, well-chosen,
readily available components into a fine-tuned, customer-specific,
and thoroughly designed HPC solution.
Your decision for a transtec HPC solution means you opt for the most
intensive customer care and the best service in HPC. Our experts will
be glad to bring in their expertise and support to assist you at
any stage, from HPC design to daily cluster operations, to HPC
Cloud Services.
Last but not least, transtec HPC Cloud Services provide custom-
ers with the possibility to have their jobs run on dynamically pro-
vided nodes in a dedicated datacenter, professionally managed
and individually customizable. Numerous standard applications
like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like
Gromacs, NAMD, VMD, and others are pre-installed, integrated
into an enterprise-ready cloud management environment, and
ready to run.
Have fun reading the
transtec HPC Compass 2016/17!
High Performance Computing
for Everyone
Interview with Dr. Oliver Tennert
Director Technology Management
and HPC Solutions
Dr. Tennert, what’s the news in HPC?
Besides becoming more and more of a commodity, the HPC
market has seen two major changes within the last couple of
years. On the one hand, HPC has reached into adjacent
areas that would not have been recognized as part of HPC five
years ago. Meanwhile, however, every serious HPC company
tries to find its positioning within the area of "big data",
often vaguely understood, but sometimes put more precisely
as "big data analytics".
Superficially, it's all about analyzing large amounts of data, which
has been part of HPC ever since. But the areas of application
are multiplying: "big data analytics" is a vital application
within business intelligence/business analytics. Along
with this, new methods, new concepts, and new technologies
emerge. The other change has to do with the way HPC environments
are deployed and managed. Dynamic provisioning
of resources for private-cloud scenarios is becoming
more and more important, to which OpenStack has contributed
a lot. In the near future, dynamic deployment will be just as much
of a commodity as efficient but static deployment is today.
What’s the deal with “dynamic deployment” anyway?
Dynamic deployment is the core provisioning technology in
cloud environments, public or private ones. Virtual machines
with different operating systems, web servers, compute nodes, or any kind
of resources are only deployed when they are needed and
requested.
Dynamic deployment makes sure that the available hardware
capacity will be prepared and deployed for exactly the pur-
pose for which it is needed. The advantage: less idle capacity
and better utilization than in static environments
where, for example, inactive Windows servers block hardware
capacity while Linux-based compute nodes are
needed at that very moment.
Everyone is discussing OpenStack at the moment. What do you
think of the hype and the development of this platform? And
what is its importance for transtec?
OpenStack has indeed been discussed at a time when it still
had little business relevance for an HPC solutions provider.
But this is the case with all innovations, be they hardware- or
software-related.
But the fact that OpenStack is a joint development effort of
several big IT players has added to the demand for private
clouds in the market on the one hand, and to its relevance
as the software solution of choice for them on the other.
We at transtec have several customers who either ask us to
deploy a completely new OpenStack environment, or would
like us to migrate their existing one to OpenStack.
OpenStack has the reputation of being a very complex beast,
given the many components. Implementing it can take a while.
Is there any way to simplify this?
Indeed, life can be rough if you do it wrong. But there is no need
for that: hardly anyone installs Linux from scratch nowadays,
except for playing around and learning; instead, everyone
uses professional distributions with commercial support.
There are special OpenStack distributions too, with commer-
cial support: especially within HPC, Bright Cluster Manager
is often the distribution of choice, but also long-term players
like Red Hat offer simple deployment frameworks.
Our customers can rest assured that they get an individually
optimized OpenStack environment, which is not only de-
signed by us, but also installed and maintained throughout.
It has always been our job to hide any complexity underneath
a layer of “ease of use, ease of management”.
Who do you want to address with this topic?
Everyone who is interested in a highly flexible, but also effec-
tive utilization of their existing hardware capacity. We address
customers from academia as well as SMB customers who,
just like big enterprises, have to turn towards new concepts
and new technologies.
As said before: in our opinion, this way of resource provision-
ing will become mainstream soon.
Let’s turn to the other big topic: What is the connection between
HPC and Big Data Analytics, and what does the market have to
say about this?
As already mentioned, the HPC market is reaching out towards
big data analytics. “HPC & Big Data” is named in one breath at
various key events like ISC or Supercomputing.
This is not without reason, of course: as different as the technologies
in use or the goals to reach may be, there is one common
denominator, and that is a high demand for computational
capacity. An HPC solution provider like transtec, which is capable
of designing and deploying a one-petaflop HPC cluster,
naturally also manages the provisioning of, say, a ten-petabyte
Hadoop cluster for big data analytics for a customer.
But the target group may be a different one: classical HPC
mostly takes place within the value chain of the customer. The
users are developers. Important areas for big data analytics,
however, can be found in business controlling, i.e. BI/BA, but
also in companies’ operations departments, when operation-
al decisions have to be made depending on certain vital state
data, in real time. This is what “internet of things” is all about.
Where are the technical differences between HPC and Big Data
Analytics, and what do both have in common?
Both areas share the “scale-out” approach. It is not the single
server that gets more and more powerful, but rather accelera-
tion is reached via parallelization and massive load balancing
on several servers.
In HPC, parallel computational jobs that run using the MPI
programming interface are state of the art. Simulations run on
many compute nodes at once, thus cutting down wall-clock
time. Within the context of big data analytics, distributed and
above all in-memory databases are becoming more and more
important. Database requests can thus be handled by several
servers, and with a low latency.
How do companies handle the backup and the archiving of
these huge amounts of data?
Given these enormous amounts of data, standard backup strategies
or archiving concepts indeed no longer apply. Instead,
for big data analytics, data is stored redundantly
as a matter of principle, to be independent of hardware failures.
Storage concepts like Hadoop or object storage like Ceph
offer redundancy in data storage as well.
Moreover, when it comes to big data analytics, the basic premise
is that it really does not matter much if a tiny fraction of
these huge amounts of data goes missing for some reason.
The goal of big data analytics is not to get a precise analytical
result out of these often unstructured data, but rather
the quickest possible extraction of relevant information and
a sensible interpretation, for which statistical methods are of
great importance.
For which companies is it important to do this very quick extrac-
tion of information?
Holiday booking portals that use booking systems have to get
an immediate overview of the hotel rooms that are available
in any given town, sorted by price, in order to provide customers
with this information. Energy or finance companies
that trade their assets on exchanges have to know their
current portfolio's value within a fraction of a second to set the
right price.
Search engine providers – there are more than just Google –
investigative authorities, credit card clearing houses that
have to react quickly to attempted fraud: the list of potential
companies that could make use of big data analytics grows
along with technical progress.
High Performance Computing.............................................. 10
Performance Turns Into Productivity..........................................................12
Flexible Deployment With xCAT......................................................................14
Service and Customer Care From A to Z ....................................................16
OpenStack for HPC Cloud Environments....................... 20
What is OpenStack?................................................................................................22
Object Storage with Ceph...................................................................................24
Application Containers with Docker............................................................30
Advanced Cluster Management Made Easy................. 38
Easy-to-use, Complete and Scalable.............................................................40
Cloud Bursting With Bright................................................................................48
Intelligent HPC Workload Management......................... 50
Moab HPC Suite.........................................................................................................52
IBM Platform LSF......................................................................................................56
Slurm................................................................................................................................58
IBM Platform LSF 8...................................................................................................61
Intel Cluster Ready....................................................................... 64
A Quality Standard for HPC Clusters............................................................66
Intel Cluster Ready Builds HPC Momentum............................................70
The transtec Benchmarking Center..............................................................74
Remote Visualization and
Workflow Optimization............................................................. 76
NICE EnginFrame: A Technical Computing Portal................................78
Remote Visualization.............................................................................................82
Desktop Cloud Visualization.............................................................................84
Cloud Computing With transtec and NICE...............................................86
NVIDIA Grid Boards..................................................................................................88
Virtual GPU Technology........................................................................................90
Big Data Analytics with Hadoop.......................................... 92
Apache Hadoop.........................................................................................................94
HDFS Architecture...................................................................................................98
NVIDIA GPU Computing –
The CUDA Architecture............................................................108
What is GPU Computing?................................................................................. 110
Kepler GK110/210 GPU Computing Architecture............................... 112
A Quick Refresher on CUDA............................................................................. 122
Intel Xeon Phi Coprocessor...................................................124
The Architecture.................................................................................................... 126
An Outlook on Knights Landing and Omni Path................................ 128
Vector Supercomputing..........................................................138
The NEC SX Architecture................................................................................... 140
History of the SX Series..................................................................................... 146
Parallel File Systems –
The New Standard for HPC Storage.................................154
Parallel NFS............................................................................................................... 156
What’s New in NFS 4.1?...................................................................................... 158
Panasas HPC Storage.......................................................................................... 160
BeeGFS......................................................................................................................... 172
General Parallel File System (GPFS)............................................................ 176
What’s new in GPFS Version 3.5.................................................................... 190
Lustre............................................................................................................................ 192
Intel Omni-Path Interconnect..............................................196
The Architecture.................................................................................................... 198
Host Fabric Interface (HFI)............................................................................... 200
Architecture Switch Fabric Optimizations............................................ 202
Fabric Software Components........................................................................ 204
Numascale.......................................................................................206
NumaConnect Background............................................................................. 208
Technology................................................................................................................ 210
Redefining Scalable OpenMP and MPI..................................................... 216
Big Data....................................................................................................................... 220
Cache Coherence................................................................................................... 222
NUMA and ccNUMA.............................................................................................. 224
Allinea Forge for parallel
Software Development...........................................................226
Application Performance and Development Tools.......................... 228
What’s in Allinea Forge?.................................................................................... 230
Glossary.............................................................................................242
High Performance
Computing – Performance
Turns Into Productivity
High Performance Computing (HPC) has been with us from the very beginning of the
computer era. High-performance computers were built to solve numerous problems
which the “human computers” could not handle.
The term HPC just hadn’t been coined yet. More important, some of the early principles
have changed fundamentally.
Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
Performance Turns Into Productivity
HPC systems in the early days were much different from those
we see today. First, we saw enormous mainframes from large
computer manufacturers, including a proprietary operating
system and job management system. Second, at universities
and research institutes, workstations made inroads and scien-
tists carried out calculations on their dedicated Unix or VMS
workstations. In either case, if you needed more computing
power, you scaled up, i.e. you bought a bigger machine. Today
the term High-Performance Computing has gained a fundamen-
tally new meaning. HPC is now perceived as a way to tackle
complex mathematical, scientific or engineering problems. The
integration of industry-standard, "off-the-shelf" server hardware
into HPC clusters facilitates the construction of computer networks
of a power that no single system could ever achieve.
The new paradigm for parallelization is scaling out.
Computer-supported simulation of realistic processes (so-called
Computer-Aided Engineering – CAE) has established itself as a
third key pillar in the field of science and research alongside theory
and experimentation. It is nowadays inconceivable that an air-
craft manufacturer or a Formula One racing team would operate
without using simulation software. And scientific calculations,
such as in the fields of astrophysics, medicine, pharmaceuticals
and bio-informatics, will to a large extent be dependent on super-
computers in the future. Software manufacturers long ago rec-
ognized the benefit of high-performance computers based on
powerful standard servers and ported their programs to them
accordingly.
The main advantage of scale-out supercomputers is just that:
they are infinitely scalable, at least in principle. Since they are
based on standard hardware components, such a supercomputer
can be given more computing power whenever the computational
capacity of the system is no longer sufficient, simply by adding
additional nodes of the same kind. A cumbersome switch to a
different technology can be avoided in most cases.
“transtec HPC solutions are meant to provide customers with
unparalleled ease-of-management and ease-of-use. Apart from
that, deciding for a transtec HPC solution means deciding for the
most intensive customer care and the best service imaginable.”
Dr. Oliver Tennert | Director Technology Management & HPC Solutions
Variations on the theme: MPP and SMP
Parallel computations exist in two major variants today. Ap-
plications running in parallel on multiple compute nodes are
frequently so-called Massively Parallel Processing (MPP) appli-
cations. MPP indicates that the individual processes can each
utilize exclusive memory areas. This means that such jobs are
predestined to be computed in parallel, distributed across the
nodes in a cluster. The individual processes can thus utilize the
separate units of the respective node – especially the RAM, the
CPU power and the disk I/O.
Communication between the individual processes is implement-
ed in a standardized way through the MPI software interface
(Message Passing Interface), which abstracts the underlying
network connections between the nodes from the processes.
However, the MPI standard (current version 2.0) merely requires
source code compatibility, not binary compatibility, so an off-the-
shelf application usually needs specific versions of MPI libraries
in order to run. Examples of MPI implementations are OpenMPI,
MPICH2, MVAPICH2, Intel MPI or – for Windows clusters – MS-MPI.
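As an illustration only (the brochure itself contains no code), a minimal point-to-point MPI sketch in Python using the mpi4py bindings might look as follows; the file name and the simple master/worker pattern are assumptions chosen purely for demonstration.

    # hello_mpi.py -- start with e.g.: mpirun -np 4 python hello_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD        # communicator spanning all started processes
    rank = comm.Get_rank()       # unique ID of this process
    size = comm.Get_size()       # total number of processes

    if rank == 0:
        # rank 0 hands a small piece of work to every other rank
        for dest in range(1, size):
            comm.send({"task": dest}, dest=dest, tag=0)
    else:
        task = comm.recv(source=0, tag=0)   # blocking receive from rank 0
        print(f"rank {rank}/{size} received {task}")

Each rank runs in its own address space; all data exchange goes through MPI messages over the cluster interconnect, which is exactly why interconnect latency matters for communication-heavy jobs.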
If the individual processes engage in a large amount of commu-
nication, the response time of the network (latency) becomes im-
portant. Latency in a Gigabit Ethernet or a 10GE network is typi-
cally around 10 µs. High-speed interconnects such as InfiniBand,
reduce latency by a factor of 10 down to as low as 1 µs. Therefore,
high-speed interconnects can greatly speed up total processing.
The other frequently used variant is called SMP applications.
SMP, in this HPC context, stands for Shared Memory Processing.
It involves the use of shared memory areas, the specific imple-
mentation of which is dependent on the choice of the underlying
operating system. Consequently, SMP jobs generally only run on
a single node, where they can in turn be multi-threaded and thus
be parallelized across the number of CPUs per node. For many
HPC applications, both the MPP and SMP variant can be chosen.
Many applications are not inherently suitable for parallel execu-
tion. In such a case, there is no communication between the in-
dividual compute nodes, and therefore no need for a high-speed
network between them; nevertheless, multiple computing jobs
can be run simultaneously and sequentially on each individual
node, depending on the number of CPUs.
In order to ensure optimum computing performance for these
applications, it must be examined how many CPUs and cores de-
liver the optimum performance.
We find applications of this sequential type of work typically in
the fields of data analysis or Monte-Carlo simulations.
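To make the sequential, communication-free case concrete, here is a small illustrative sketch (not taken from the brochure) that estimates pi by Monte-Carlo sampling in Python; each worker runs completely independently, so such jobs can be spread over cluster nodes without any high-speed network between them.

    # monte_carlo_pi.py -- embarrassingly parallel: workers never communicate
    import random
    from multiprocessing import Pool

    def sample(n):
        """Count how many of n random points fall inside the unit quarter circle."""
        hits = 0
        for _ in range(n):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:
                hits += 1
        return hits

    if __name__ == "__main__":
        n_workers, n_per_worker = 8, 1_000_000
        with Pool(n_workers) as pool:
            hits = sum(pool.map(sample, [n_per_worker] * n_workers))
        print("pi is approximately", 4.0 * hits / (n_workers * n_per_worker))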
xCAT as a Powerful and Flexible Deployment Tool
xCAT (Extreme Cluster Administration Tool) is an open source
toolkit for the deployment and low-level administration of HPC
cluster environments, small as well as large ones.
xCAT provides simple commands for hardware control, node
discovery, the collection of MAC addresses, and node deployment
with local disk installation (diskful) or without (diskless). The
cluster configuration is stored in a relational database. Node
groups for different operating system images can be defined.
Also, user-specific scripts can be executed automatically at in-
stallation time.
xCAT Provides the Following Low-Level Administrative
Features
 Remote console support
 Parallel remote shell and remote copy commands
 Plugins for various monitoring tools like Ganglia or Nagios
 Hardware control commands for node discovery, collecting
MAC addresses, remote power switching and resetting of
nodes
 Automatic configuration of syslog, remote shell, DNS, DHCP,
and ntp within the cluster
 Extensive documentation and man pages
For cluster monitoring, we install and configure the open source
tool Ganglia or the even more powerful open source solution Na-
gios, according to the customer’s preferences and requirements.
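As a hedged illustration of the kind of low-level control xCAT exposes, the following Python sketch simply wraps two of the xCAT commands mentioned above (nodels to expand a node group, rpower to query power state); the group name "compute" is an assumption and stands for whatever node group is defined in the xCAT database.

    # xcat_power.py -- thin wrapper around the xCAT CLI (nodels, rpower)
    # Assumes xCAT is installed and a node group named "compute" exists.
    import subprocess

    def xcat(*args):
        """Run an xCAT command and return its standard output as text."""
        return subprocess.run(args, capture_output=True, text=True, check=True).stdout

    nodes = xcat("nodels", "compute").split()      # expand the node group to node names
    power = xcat("rpower", "compute", "stat")      # query the power state of all nodes
    print(f"{len(nodes)} nodes in group 'compute'")
    print(power)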
Local Installation or Diskless Installation
We offer a diskful or a diskless installation of the cluster nodes.
A diskless installation means the operating system is hosted partially
within main memory; larger parts may be provided
via NFS or other means. This approach allows for deploying
large numbers of nodes very efficiently, and the cluster
is up and running within a very small timescale. Also, updating
the cluster can be done in a very efficient way. For this, only the
boot image has to be updated, and the nodes have to be reboot-
ed. After this, the nodes run either a new kernel or even a new
operating system. Moreover, with this approach, partitioning the
cluster can also be very efficiently done, either for testing pur-
poses, or for allocating different cluster partitions for different
users or applications.
Development Tools, Middleware, and Applications
According to the application, optimization strategy, or underly-
ing architecture, different compilers lead to code results of very
different performance. Moreover, different, mainly commercial,
applications, require different MPI implementations. And even
when the code is self-developed, developers often prefer one MPI
implementation over another.
According to the customer's wishes, we install various compilers,
MPI middleware, as well as job management systems like ParaStation,
Grid Engine, Torque/Maui, or the very powerful Moab
HPC Suite for high-level cluster management.
transtec HPC as a Service
You will get a range of applications like LS-Dyna, ANSYS, Gromacs,
NAMD, etc. from all kinds of areas pre-installed, integrated into an
enterprise-ready cloud and workload management system, and
ready to run. Is your application missing?
 Ask us: HPC@transtec.de
transtec Platform as a Service
You will be provided with dynamically provisioned compute nodes
for running your individual code. The operating system will be
pre-installed according to your requirements. Common Linux dis-
tributions like RedHat, CentOS, or SLES are the standard. Do you
need another distribution?
 Ask us: HPC@transtec.de
transtec Hosting as a Service
You will be provided with hosting space inside a professionally
managed and secured datacenter where you can have your ma-
chines hosted, managed, maintained, according to your require-
ments. Thus, you can build up your own private cloud. What
range of hosting and maintenance services do you need?
 Tell us: HPC@transtec.de
HPC @ transtec:
Services and Customer Care from A to Z
transtec AG has over 30 years of experience in scientific computing
and is one of the earliest manufacturers of HPC clusters. For nearly a
decade, transtec has delivered highly customized High Performance
clusters based on standard components to academic and industry
customers across Europe with all the high quality standards and the
customer-centric approach that transtec is well known for. Every
transtec HPC solution is more than just a rack full of hardware – it is
a comprehensive solution with everything the HPC user, owner, and
operator need. In the early stages of any customer's HPC project,
transtec experts provide extensive and detailed consulting to the
customer – they benefit from expertise and experience. Consulting
is followed by benchmarking of different systems with either specifically
crafted customer code or generally accepted benchmarking
routines; this aids customers in sizing and devising the optimal and
detailed HPC configuration. Each and every piece of HPC hardware
that leaves our factory undergoes a burn-in procedure of 24 hours or
more if necessary. We make sure that any hardware shipped meets
our and our customers' quality requirements.
[Figure: The transtec HPC services cycle from A to Z – individual presales consulting; application-, customer-, and site-specific sizing of the HPC solution; benchmarking of different systems; burn-in tests of systems; software and OS installation; application installation; onsite hardware assembly; integration into the customer's environment; customer training; maintenance, support and managed services; continual improvement.]
transtec HPC solutions are turnkey solutions. By default, a transtec
HPC cluster has everything installed and configured – from hardware
and operating system to important middleware components like
cluster management or developer tools and the customer's production
applications.
Onsite delivery means onsite integration into the customer's production
environment, be it establishing network connectivity to the
corporate network, or setting up software and configuration parts.
transtec HPC clusters are ready-to-run systems – we deliver, you
turn the key, the system delivers high performance. Every HPC project
entails transfer to production: IT operation processes and policies
apply to the new HPC system. Effectively, IT personnel is trained
hands-on, introduced to hardware components and software, with
all operational aspects of configuration management.
transtec services do not stop when the implementation project
ends. Beyond transfer to production, transtec takes care. transtec
offers a variety of support and service options, tailored to the customer's
needs. When you are in need of a new installation, a major
reconfiguration or an update of your solution – transtec is able to
support your staff and, if you lack the resources for maintaining the
cluster yourself, maintain the HPC solution for you.
From Professional Services to Managed Services for daily operations
and required service levels, transtec will be your complete HPC
service and solution provider. transtec's high standards of performance,
reliability and dependability assure your productivity and
complete satisfaction. transtec's offerings of HPC Managed Services
offer customers the possibility of having the complete management
and administration of the HPC cluster handled by transtec service
specialists, in an ITIL-compliant way. Moreover, transtec's HPC on Demand
services help provide access to HPC resources whenever customers
need them, for example because they do not have the possibility
of owning and running an HPC cluster themselves, due to lacking
infrastructure, know-how, or admin staff.
transtec HPC Cloud Services
Last but not least, transtec's services portfolio evolves as customers'
demands change. Starting this year, transtec is able to provide
HPC Cloud Services. transtec uses a dedicated datacenter to provide
computing power to customers who are in need of more capacity
than they own, which is why this workflow model is sometimes
called computing-on-demand. With these dynamically provided
resources, customers have the possibility to run their jobs on
HPC nodes in a dedicated datacenter, professionally managed and
secured, and individually customizable. Numerous standard applications
like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes
like Gromacs, NAMD, VMD, and others are pre-installed, integrated
into an enterprise-ready cloud and workload management environment,
and ready to run.
Alternatively, whenever customers are in need of space for hosting
their own HPC equipment because they do not have the space
capacity or cooling and power infrastructure themselves, transtec
is also able to provide Hosting Services to those customers who would
like to have their equipment professionally hosted, maintained, and
managed. Customers can thus build up their own private cloud!
Are you interested in any of transtec’s broad range of HPC related
services? Write us an email to HPC@transtec.de. We’ll be happy to
hear from you!
Are you an end user?
Then doing your work as productively as possible is of utmost importance
to you! You are interested in an IT environment running
smoothly, with a fully functional infrastructure, in competent and
professional support by internal administrators and external ser-
vice providers, and a service desk with guaranteed availability at
working times. HPC cloud services ensure you are able to run your
simulation jobs even if computational requirements exceed the lo-
cal capacities.
 support services
 professional services
 managed services
 cloud services
Are you an administrator?
Then you are responsible for providing a well-performing IT infra-
structure and for supporting the end users when problems occur. In
situations where problem management fully exploits your capacity,
or in cases where you just don’t have time or maybe lack experience
in performing certain tasks, you might be glad to be assisted by spe-
cialized IT experts.
 support services
 professional services
Are you an R&D manager?
Then your job is to ensure that developers, researchers, and other
end users are able to do their jobs. To have a partner at hand that provides
you with a full range of operations support, either on a completely
flexible, on-demand basis, or in a fully managed way based
on ITIL best practices, is heartily welcome.
 professional services
 managed services
[Figure: Services Continuum across PLAN, BUILD and RUN – Consulting Services (paid to advise), Professional Services (paid to do), Support Services (paid to support), Managed Services (paid to operate), and Cloud Services (paid to use), covering performance analysis, technology assessment, infrastructure design, installation, integration, migration, upgrades, configuration changes, service desk, event monitoring and notification, release management, capacity management, hosting, Infrastructure (IaaS), HPC (HPCaaS), and Platform (PaaS).]
Are you a CIO?
Then your main concern is a smoothly running IT operations environment
with a maximum of efficiency and effectiveness. Managed
Services to ensure business continuity are a main part of your considerations
regarding business objectives. HPC cloud services are
a vital part of your capacity planning to ensure the availability of
computational resources to your internal clients, the R&D departments.
 managed services
 cloud services
Are you a CEO?
Profitability, time-to-result, business continuity, and ensuring future
viability are topics that you continuously have to think about. You
are happy to have a partner at hand who gives you insight into future
developments, provides you with technology consulting, and is
able to help you ensure business continuity and fall-back scenarios,
as well as scalability readiness, by means of IT capacity outsourcing
solutions.
 cloud services
 consulting services
Life Sciences | CAE | High Performance Computing | Big Data Analytics | Simulation | CAD
OpenStack for HPC
Cloud Environments
OpenStack is a set of software tools for building and managing cloud computing
platforms for public and private clouds. Backed by some of the biggest companies in
software development and hosting, as well as thousands of individual community
members, many think that OpenStack is the future of cloud computing.
OpenStack is managed by the OpenStack Foundation, a non-profit organization which
oversees both development and community-building around the project.
What is OpenStack?
Introduction to OpenStack
OpenStack lets users deploy virtual machines and other instances
that handle different tasks for managing a cloud environment
on the fly. It makes horizontal scaling easy, which means that
tasks that benefit from running concurrently can easily serve
more or fewer users on the fly by just spinning up more instances.
For example, a mobile application which needs to communicate
with a remote server might be able to divide the work of commu-
nicating with each user across many different instances, all
communicating with one another but scaling quickly and easily
as the application gains more users.
And most importantly, OpenStack is open source software, which
means that anyone who chooses to can access the source code,
make any changes or modifications they need, and freely share
these changes back out to the community at large. It also means
that OpenStack has the benefit of thousands of developers all
over the world working in tandem to develop the strongest, most
robust, and most secure product that they can.
How is OpenStack used in a cloud environment?
The cloud is all about providing computing for end users in a
remote environment, where the actual software runs as a service
on reliable and scalable servers rather than on each end user's
computer. Cloud computing can refer to a lot of different things,
but typically the industry talks about running different items “as
a service” – software, platforms, and infrastructure. OpenStack
falls into the latter category and is considered Infrastructure as
a Service (IaaS). Providing infrastructure means that OpenStack
makes it easy for users to quickly add new instances, upon which
other cloud components can run. Typically, the infrastructure
then runs a “platform” upon which a developer can create soft-
ware applications which are delivered to the end users.
What are the components of OpenStack?
OpenStack is made up of many different moving parts. Because
of its open nature, anyone can add additional components to
OpenStack to help it to meet their needs. But the OpenStack
community has collaboratively identified nine key components
that are a part of the “core” of OpenStack, which are distributed
as a part of any OpenStack system and officially maintained by
the OpenStack community.
 Nova is the primary computing engine behind OpenStack. It
is used for deploying and managing large numbers of virtual
machines and other instances to handle computing tasks.
 Swift is a storage system for objects and files. Rather than
the traditional idea of referring to files by their location on
a disk drive, developers can instead use a unique identifier
for the file or piece of information and let OpenStack
decide where to store this information. This makes
scaling easy, as developers don't have to worry about the
capacity of a single system behind the software. It also
allows the system, rather than the developer, to worry about
how best to make sure that data is backed up in case of the
failure of a machine or network connection.
 Cinder is a block storage component, which is more analo-
gous to the traditional notion of a computer being able to
access specific locations on a disk drive. This more tradi-
tional way of accessing files might be important in scenarios
in which data access speed is the most important consider-
ation.
 Neutron provides the networking capability for OpenStack.
It helps to ensure that each of the components of an Open-
Stack deployment can communicate with one another
quickly and efficiently.
 Horizon is the dashboard behind OpenStack. It is the only
graphical interface to OpenStack, so for users wanting to
give OpenStack a try, this may be the first component they
actually “see.” Developers can access all of the components
of OpenStack individually through an application program-
ming interface (API), but the dashboard gives system
administrators a look at what is going on in the cloud and lets them
manage it as needed.
 Keystone provides identity services for OpenStack. It is
essentially a central list of all of the users of the OpenStack
cloud, mapped against all of the services provided by the
cloud which they have permission to use. It provides multiple
means of access, meaning developers can easily map their
existing user access methods against Keystone.
 Glance provides image services to OpenStack. In this case,
“images” refers to images (or virtual copies) of hard disks.
Glance allows these images to be used as templates when
deploying new virtual machine instances.
 Ceilometer provides telemetry services, which allow the
cloud to provide billing services to individual users of the
cloud. It also keeps a verifiable count of each user’s system
usage of each of the various components of an OpenStack
cloud. Think metering and usage reporting.
 Heat is the orchestration component of OpenStack, which
allows developers to store the requirements of a cloud appli-
cation in a file that defines what resources are necessary for
that application. In this way, it helps to manage the infra-
structure needed for a cloud service to run.
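In practice, these components are consumed through their APIs. As a rough sketch only – assuming the openstacksdk Python library and a cloud profile named "mycloud" in clouds.yaml, plus placeholder image, flavor, and network names, none of which appear in the text above – booting a Nova instance from a Glance image on a Neutron network could look like this:

    # boot_instance.py -- minimal openstacksdk sketch; all names are placeholders
    import openstack

    conn = openstack.connect(cloud="mycloud")       # credentials come from clouds.yaml

    image = conn.compute.find_image("centos-7")     # Glance image
    flavor = conn.compute.find_flavor("m1.small")   # Nova flavor
    network = conn.network.find_network("private")  # Neutron network

    server = conn.compute.create_server(
        name="hpc-node-01",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)   # block until the instance is ACTIVE
    print(server.name, server.status)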
Object Storage with Ceph
Why “Ceph”?
“Ceph” is an odd name for a file system and breaks the typical
acronym trend that most follow. The name is a reference to the
mascot at UCSC (Ceph’s origin), which happens to be “Sammy,”
the banana slug, a shell-less mollusk in the cephalopods class.
Cephalopods, with their multiple tentacles, provide a great meta-
phor for a distributed file system.
Ceph goals
Developing a distributed file system is a complex endeavor, but
it’s immensely valuable if the right problems are solved. Ceph’s
goals can be simply defined as:
 Easy scalability to multi-petabyte capacity
 High performance over varying workloads (input/output
operations per second [IOPS] and bandwidth)
 Strong reliability
Unfortunately, these goals can compete with one another (for
example, scalability can reduce or inhibit performance or impact
reliability). Ceph has developed some very interesting concepts
(such as dynamic metadata partitioning and data distribution
and replication), which this article explores shortly. Ceph’s design
also incorporates fault-tolerance features to protect against
single points of failure, with the assumption that storage failures
on a large scale (petabytes of storage) will be the norm rather
than the exception. Finally, its design does not assume particular
workloads but includes the ability to adapt to changing distrib-
uted workloads to provide the best performance. It does all of
this with the goal of POSIX compatibility, allowing it to be trans-
parently deployed for existing applications that rely on POSIX
semantics (through Ceph-proposed enhancements). Finally, Ceph
is open source distributed storage and part of the mainline Linux
kernel (2.6.34).
[Figure: OpenStack Overview – users and administrators reach the platform through the API (REST), the CLI (python client), and the UI (Horizon); advanced services consuming IaaS include Data Processing (Sahara), Key Management (Barbican), Database Management (Trove), Message Queue (Zaqar), DNS Management (Designate), Deployment (TripleO), Telemetry (Ceilometer), and Orchestration (Heat); the Infrastructure-as-a-Service layer comprises Compute (Nova, Ironic), Network (Neutron), Image Management (Glance), and Storage (Swift, Cinder, Manila); Identity (Keystone), the common library (Oslo), and shared message queues/databases underpin the whole stack.]
Ceph architecture
Now, let’s explore the Ceph architecture and its core elements at
a high level. I then dig down another level to identify some of the
key aspects of Ceph to provide a more detailed exploration.
The Ceph ecosystem can be broadly divided into four segments
(see Figure 1): clients (users of the data), metadata servers (which
cache and synchronize the distributed metadata), an object
storage cluster (which stores both data and metadata as objects
andimplementsotherkeyresponsibilities),andfinallythecluster
monitors (which implement the monitoring functions).
As Figure 1 shows, clients perform metadata operations (to iden-
tify the location of data) using the metadata servers. The meta-
data servers manage the location of data and also where to store
new data. Note that metadata is stored in the storage cluster (as
indicated by “Metadata I/O”).
[Figure 1: Conceptual architecture of the Ceph ecosystem – clients issue metadata operations to the metadata server cluster and file (object) I/O to the object storage cluster; metadata I/O flows between the metadata servers and the object storage cluster, while cluster monitors oversee the whole system.]
Actual file I/O occurs between the client and object storage
cluster. In this way, higher-level POSIX functions (such as open,
close, and rename) are managed through the metadata servers,
whereas POSIX functions (such as read and write) are managed
directly through the object storage cluster.
Another perspective of the architecture is provided in Figure 2. A
set of servers access the Ceph ecosystem through a client inter-
face, which understands the relationship between metadata
servers and object-level storage. The distributed storage system
can be viewed in a few layers, including a format for the storage
devices (the Extent and B-tree-based Object File System [EBOFS]
or an alternative) and an overriding management layer designed
to manage data replication, failure detection, and recovery and
subsequent data migration called Reliable Autonomic Distrib-
uted Object Storage (RADOS). Finally, monitors are used to iden-
tify component failures, including subsequent notification.
[Figure 2: Simplified layered view of the Ceph ecosystem – servers access the storage through the Ceph client interface and metadata servers; object storage daemons sit on top of an on-disk format such as BTRFS or EBOFS.]
Ceph components
With the conceptual architecture of Ceph under your belt, you
can dig down another level to see the major components imple-
mented within the Ceph ecosystem. One of the key differences
between Ceph and traditional file systems is that rather than
focusing the intelligence in the file system itself, the intelligence
is distributed around the ecosystem.
Figure 3 shows a simple Ceph ecosystem. The Ceph Client is the
user of the Ceph file system. The Ceph Metadata Daemon provides
the metadata services, while the Ceph Object Storage Daemon
provides the actual storage (for both data and metadata). Finally,
the Ceph Monitor provides cluster management. Note that
there can be many Ceph clients, many object storage endpoints,
numerous metadata servers (depending on the capacity of the
file system), and at least a redundant pair of monitors. So, how is
this file system distributed?
[Figure 3: Simple Ceph ecosystem – applications and users on the Ceph client go through (enhanced) POSIX and the VFS in the Linux kernel; the Ceph client talks to cmds (the Ceph metadata daemon with its DRAM cache), cosd (the Ceph object storage daemon on BTRFS/EBOFS over the disks), and cmon (the Ceph monitor).]
Kernel or user space
Early versions of Ceph utilized Filesystem in Userspace (FUSE),
which pushes the file system into user space and can greatly
simplify its development. But today, Ceph has been integrated
into the mainline kernel, making it faster, because user space
context switches are no longer necessary for file system I/O.
Ceph client
As Linux presents a common interface to the file systems
(through the virtual file system switch [VFS]), the user’s perspec-
tive of Ceph is transparent. The administrator’s perspective will
certainly differ, given the potential for many servers encompassing
the storage system. From the users' point of view,
they have access to a large storage system and are not aware of
the underlying metadata servers, monitors, and individual object
storage devices that aggregate into a massive storage pool.
Users simply see a mount point, from which standard file I/O can
be performed.
The Ceph file system–or at least the client interface–is imple-
mented in the Linux kernel. Note that in the vast majority of
file systems, all of the control and intelligence is implemented
within the kernel’s file system source itself. But with Ceph, the
file system’s intelligence is distributed across the nodes, which
simplifies the client interface but also provides Ceph with the
ability to massively scale (even dynamically).
Rather than rely on allocation lists (metadata to map blocks on
a disk to a given file), Ceph uses an interesting alternative. A file
from the Linux perspective is assigned an inode number (INO)
from the metadata server, which is a unique identifier for the
file. The file is then carved into some number of objects (based
on the size of the file). Using the INO and the object number
(ONO), each object is assigned an object ID (OID). Using a simple
hash over the OID, each object is assigned to a placement group.
The placement group (identified as a PGID) is a conceptual
container for objects. Finally, the mapping of the placement
group to object storage devices is a pseudo-random mapping
using an algorithm called Controlled Replication Under Scalable
Hashing (CRUSH). In this way, mapping of placement groups (and
replicas) to storage devices does not rely on any metadata but
instead on a pseudo-random mapping function. This behavior is
ideal, because it minimizes the overhead of storage and simpli-
fies the distribution and lookup of data.
The final component for allocation is the cluster map. The cluster
map is an efficient representation of the devices representing
the storage cluster. With a PGID and the cluster map, you can
locate any object.
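A deliberately simplified Python sketch of this chain – inode number to object IDs, object IDs hashed onto placement groups, placement groups mapped onto devices – is given below. The ordinary hash and modulo mapping used here only stand in for CRUSH; the real algorithm is considerably more sophisticated, and all numbers are made up for illustration.

    # placement_sketch.py -- illustrative only; this is NOT the real CRUSH algorithm
    import hashlib

    NUM_PGS = 128                                   # placement groups in the pool (assumed)
    OSDS = ["osd.0", "osd.1", "osd.2", "osd.3"]     # toy cluster map

    def object_id(ino, ono):
        """Combine the inode number (INO) and object number (ONO) into an object ID (OID)."""
        return f"{ino:x}.{ono:08x}"

    def placement_group(oid):
        """Hash the object ID onto a placement group (PGID)."""
        return int(hashlib.md5(oid.encode()).hexdigest(), 16) % NUM_PGS

    def device_for(pgid):
        """Deterministic PG-to-device mapping -- a stand-in for CRUSH."""
        return OSDS[pgid % len(OSDS)]

    ino, file_size, object_size = 0x1234, 10 * 2**20, 4 * 2**20
    for ono in range((file_size + object_size - 1) // object_size):
        oid = object_id(ino, ono)
        pgid = placement_group(oid)
        print(oid, "-> pg", pgid, "->", device_for(pgid))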
The Ceph metadata server
The job of the metadata server (cmds) is to manage the file
system’s namespace. Although both metadata and data are
stored in the object storage cluster, they are managed separately
to support scalability. In fact, metadata is further split among
a cluster of metadata servers that can adaptively replicate and
distribute the namespace to avoid hot spots. As shown in Figure
4, the metadata servers manage portions of the namespace and
can overlap (for redundancy and also for performance). The
mapping of metadata servers to namespace is performed in
Ceph using dynamic subtree partitioning, which allows Ceph to
adapt to changing workloads (migrating namespaces between
metadata servers) while preserving locality for performance.
But because each metadata server simply manages the
namespace for the population of clients, its primary applica-
tion is an intelligent metadata cache (because actual metadata
is eventually stored within the object storage cluster). Metadata
to write is cached in a short-term journal, which eventually is
pushed to physical storage. This behavior allows the meta-
data server to serve recent metadata back to clients (which is
common in metadata operations). The journal is also useful for
failure recovery: if the metadata server fails, its journal can be
replayed to ensure that metadata is safely stored on disk.
Metadata servers manage the inode space, converting file names
to metadata. The metadata server transforms the file name into
an inode, file size, and striping data (layout) that the Ceph client
uses for file I/O.
Ceph monitors
Ceph includes monitors that implement management of the
cluster map, but some elements of fault management are imple-
mented in the object store itself. When object storage devices fail
or new devices are added, monitors detect and maintain a valid
cluster map. This function is performed in a distributed fashion
where map updates are communicated with existing traffic.
Ceph uses Paxos, which is a family of algorithms for distributed
consensus.
Ceph object storage
Similar to traditional object storage, Ceph storage nodes include
not only storage but also intelligence. Traditional drives are
simple targets that only respond to commands from initiators.
But object storage devices are intelligent devices that act as both
targets and initiators to support communication and collabora-
tion with other object storage devices.
[Figure 4: Partitioning of the Ceph namespace across metadata servers MDS₀ through MDS₄.]
From a storage perspective, Ceph object storage devices perform
the mapping of objects to blocks (a task traditionally done at
the file system layer in the client). This behavior allows the local
entity to best decide how to store an object. Early versions of
Ceph implemented a custom low-level file system on the local
storage called EBOFS. This system implemented a nonstandard
interface to the underlying storage tuned for object seman-
tics and other features (such as asynchronous notification of
commits to disk). Today, the B-tree file system (BTRFS) can be
used at the storage nodes, which already implements some of
the necessary features (such as embedded integrity).
Because the Ceph clients implement CRUSH and do not have
knowledge of the block mapping of files on the disks, the under-
lying storage devices can safely manage the mapping of objects
to blocks. This allows the storage nodes to replicate data (when
a device is found to have failed). Distributing the failure recovery
also allows the storage system to scale, because failure detec-
tion and recovery are distributed across the ecosystem. Ceph
calls this RADOS (see Figure 3).
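For completeness, here is a hedged sketch of talking to RADOS directly from Python via the librados bindings (the python-rados package); the configuration path and the pool name "mypool" are assumptions about the local cluster.

    # rados_putget.py -- store and read back one object via librados (python-rados)
    # Assumes a running Ceph cluster, /etc/ceph/ceph.conf, and an existing pool "mypool".
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("mypool")            # I/O context for one pool
        ioctx.write_full("hello_object", b"hello from RADOS")
        print(ioctx.read("hello_object"))               # -> b'hello from RADOS'
        ioctx.close()
    finally:
        cluster.shutdown()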
Other features of interest
As if the dynamic and adaptive nature of the file system weren’t
enough, Ceph also implements some interesting features visible
to the user. Users can create snapshots, for example, in Ceph on
any subdirectory (including all of the contents). It’s also possible
to perform file and capacity accounting at the subdirectory level,
which reports the storage size and number of files for a given
subdirectory (and all of its nested contents).
Ceph status and future
Although Ceph is now integrated into the mainline Linux kernel,
it’s properly noted there as experimental. File systems in this state
are useful to evaluate but are not yet ready for production envi-
ronments. But given Ceph’s adoption into the Linux kernel and
the motivation by its originators to continue its development,
it should be available soon to solve your massive storage needs.
Application Containers with Docker
Transforming Business Through Software
Gone are the days of private datacenters running off-the-shelf
software and giant monolithic code bases that you updated
once a year. Everything has changed. Whether it is moving to the
cloud, migrating between clouds, modernizing legacy applications or building
new apps and data structures, the desired result is always the
same – speed. How fast you can move defines your success as a
company.
Software is the critical IP that defines your company, even
if the actual product you are selling is a t-shirt, a car,
or compounding interest. Software is how you engage your
customers, reach new users, understand their data, promote
your product or service and process their order.
To do this well, today’s software is going bespoke. Small pieces
of software that are designed for a very specific job are called
microservices. The design goal of microservices is to have each
service built with all of the necessary components to “run” a
specific job with just the right type of underlying infrastructure
resources. Then, these services are loosely coupled together so
they can be changed at any time, without having to worry about
the services that come before or after them.
This methodology, while great for continuous improvement,
poses many challenges in reaching the end state. First, it creates
a new, ever-expanding matrix of services, dependencies and
infrastructure that is difficult to manage. Additionally, it does
not account for the vast amount of existing legacy applications
in the landscape, the heterogeneity up and down the application
stack, and the processes required to ensure this works in practice.
The Docker Journey and the Power of AND
In 2013, Docker entered the landscape with application
containers to build, ship and run applications anywhere. Docker
was able to take software and its dependencies and package them
up into a lightweight container. Similar to how shipping containers
work today, software containers are simply a standard unit of software
that looks the same on the outside regardless of what
code and dependencies are included on the inside. This enabled
developers and sysadmins to transport them across infrastruc-
tures and various environments without requiring any modi-
fications and regardless of the varying configurations in the
different environments. The Docker journey begins here.
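A small sketch with the Docker SDK for Python (the docker package) shows this "package once, run anywhere" idea from the API side; the image and the command are arbitrary examples, not something prescribed by the text.

    # run_container.py -- pull an image and run a throwaway container (Docker SDK for Python)
    import docker

    client = docker.from_env()                          # talks to the local Docker daemon
    client.images.pull("python", tag="3.11-slim")       # the same image runs identically anywhere

    output = client.containers.run(
        "python:3.11-slim",
        ["python", "-c", "print('hello from a container')"],
        remove=True,                                    # clean up the container afterwards
    )
    print(output.decode())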
Agility: The speed and simplicity of Docker was an
instant hit with developers and is what led to the
meteoric rise of the open source project. Developers
are now able to very simply package up any software and its
dependencies into a container. Developers are able to use any
language, version and tooling because they are all packaged in
a container, which “standardizes” all that heterogeneity without
sacrifice.
Portability: Just by the nature of the Docker tech-
nology, these very same developers have realized
their application containers are now portable – in
ways not previously possible. They can ship their applications from development to test and production, and the code will work as designed every time. Differences in the environment do not affect what is inside the container, nor do developers need to change their application to work in production. This is also a boon for IT operations teams, as they can now move applications across datacenters and clouds to avoid vendor lock-in.
Control: As these applications move along the life-
cycle to production, new questions around security,
manageability and scale need to be answered. Docker
“standardizes” your environment while maintaining the hetero-
geneity your business requires. Docker provides the ability to set
the appropriate level of control and flexibility to maintain your
service levels, performance and regulatory compliance. IT oper-
ations teams can provision, secure, monitor and scale the infra-
structure and applications to maintain peak service levels. No
two applications or businesses are alike, and Docker allows you to decide how to control your application environment.
At the core of the Docker journey is the power of AND. Docker
is the only solution to provide agility, portability and control for
developers and IT operations teams across all stages of the application lifecycle. From these core tenets, Containers as a Service
(CaaS) emerges as the construct by which these new applications
are built better and faster.
Docker Containers as a Service (CaaS)
What is Containers as a Service (CaaS)? It is an IT-managed and secured application environment of infrastructure and content where developers can, in a self-service manner, build and deploy applications.
Diagram: CaaS Workflow – Developers build (development environments), content is shipped through secure content & collaboration, and IT Operations deploy, manage and scale the running applications.
In the CaaS diagram above, development and IT operations teams collaborate through the registry. This is a service in which
a library of secure and signed images can be maintained. From
the registry, developers on the left are able to pull and build soft-
ware at their pace and then push the content back to the registry
once it passes integration testing to save the latest version.
Depending on the internal processes, the deployment step is
either automated with tools or can be manually deployed.
The IT operations team on the right in the diagram above manages the different vendor contracts for the production infrastructure, such as compute, networking and storage. These teams are responsible for provisioning the compute resources needed for
the application and use the Docker Universal Control Plane to
monitor the clusters and applications over time. Then, they can
move the apps from one cloud to another or scale up or down a
service to maintain peak performance.
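A minimal sketch of this build/ship/run loop, assuming the Docker SDK for Python and a local Docker daemon; the registry address, repository name and build path are illustrative:

    import docker

    client = docker.from_env()  # connect to the local Docker daemon

    # Build: the developer builds the application image from a local Dockerfile
    image, build_log = client.images.build(
        path="./myapp",                                  # hypothetical build context
        tag="registry.example.com/team/myapp:1.0",
    )

    # Ship: push the tested image back to the (private) registry
    for line in client.images.push(
        "registry.example.com/team/myapp", tag="1.0", stream=True, decode=True
    ):
        print(line)

    # Run: IT operations later pull the same image onto production hosts
    client.images.pull("registry.example.com/team/myapp", tag="1.0")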
Key Characteristics and Considerations
The Docker CaaS provides a framework for organizations to unify
the variety of systems, languages and tools in their environment
and apply the level of control, security or freedom required for
their business. As a Docker-native solution with full support of the Docker API, Docker CaaS can seamlessly take an application from local development to production without changing the code, streamlining the deployment cycle along the way.
The following characteristics form the minimum requirements
for any organization’s application environment. In this paradigm,
development and IT operations teams are empowered to use the
best tools for their respective jobs without worrying about breaking systems, disrupting each other's workflows, or lock-in.
The needs of developers and operations.
Many tools specifically address the functional needs of only one team; however, this breaks the cycle of continuous improvement. To truly gain acceleration in the development
to production timeline, you need to address both users along
a continuum. Docker provides unique capabilities for each
team as well as a consistent API across the entire platform
for a seamless transition from one team to the next.
All stages in the application lifecycle.
From continuous integration to delivery and devops, these
practices are about eliminating the waterfall development
methodology and the lagging innovation cycles with it. By
providing tools for both the developer and IT operations,
Docker is able to seamlessly support an application from
build, test, stage to production.
Any language.
Developer agility means the freedom to build with whatever
language, version and tooling required for the features they
are building at that time. Also, the ability to run multiple
versions of a language at the same time provides a greater
level of flexibility. Docker allows your team to focus on
building the app instead of thinking of how to build an app
that works in Docker.
Any operating system.
The vast majority of organizations have more than one oper-
ating system. Some tools just work better on Linux, while others work better on Windows. Application platforms need to account for and support this diversity; otherwise they are solving only
part of the problem. Originally created for the Linux commu-
nity, Docker and Microsoft are bringing forward Windows
Server support to address the millions of enterprise applica-
tions in existence today and future applications.
Any infrastructure.
When it comes to infrastructure, organizations want
choice, backup and leverage. Whether that means you have
multiple private data centers, a hybrid cloud or multiple
cloud providers, the critical component is the ability to
move workloads from one environment to another, without
causing application issues. The Docker technology architecture abstracts the infrastructure away from the application, allowing application containers to run anywhere and remain portable across any infrastructure.
Open APIs, pluggable architecture and ecosystem.
A platform isn’t really a platform if it is an island to itself.
Implementing new technologies is often not possible if you
need to re-tool your existing environment first. A funda-
mental guiding principle of Docker is a platform that is
open. Being open means APIs and plugins to make it easy for
you to leverage your existing investments and to fit Docker
into your environment and processes. This openness invites
a rich ecosystem to flourish and provide you with more
flexibility and choice in adding specialized capabilities to
your CaaS.
Although numerous, these characteristics are critical, as the new bespoke application paradigms only invite greater heterogeneity into your technical architecture. The Docker CaaS platform is fundamentally designed to support that diversity, while providing the appropriate controls to manage at any scale.
Docker CaaS
The Docker CaaS platform is made possible by a suite of inte-
grated software solutions with a flexible deployment model to
meet the needs of your business.
 On-Premises/VPC: For organizations who need to keep their
IP within their network, Docker Trusted Registry and Docker
Universal Control Plane can be deployed on-premises or
in a VPC and connected to your existing infrastructure and
systems like storage, Active Directory/LDAP, monitoring and
logging solutions. Trusted Registry provides the ability to
store and manage images on your storage infrastructure
while also managing role based access control to the images.
Universal Control Plane provides visibility across your Docker
environment including Swarm clusters, Trusted Registry
repositories, containers and multi container applications.
 In the Cloud: For organizations who readily use SaaS solu-
tions, Docker Hub and Tutum by Docker provide a registry
service and control plane that is hosted and managed by
Docker. Hub is a cloud registry service to store and manage
your images and user permissions. Tutum by Docker provi-
sions and manages the deployment clusters as well as moni-
tors and manages the deployed applications. Connect to the
cloud infrastructure of your choice or bring your own phys-
ical node to deploy your application.
Your Docker CaaS can be designed to provide a centralized point
of control and management or allow for decentralized manage-
ment to empower individual application teams. The flexibility
allows you to create a model that is right for your business,
just like how you choose your infrastructure and implement
processes. CaaS is an extension of that to build, ship and run
applications.
Many IT initiatives are enabled, and in fact accelerated, by CaaS due to its unifying nature across the environment. Each organization has its own take on the terms for these initiatives, but they range from containerization, which may involve the “lift and shift” of existing apps or the adoption of microservices, to continuous integration, delivery and devops, and to various flavors of the cloud, including adoption, migration, hybrid and multiple clouds.
In each scenario, Docker CaaS brings the agility, portability and
control to enable the adoption of those use cases across the
organization.
Diagram: Docker CaaS tooling across the workflow – Build (Docker Toolbox), Ship (Docker Hub, Docker Trusted Registry), Run (Tutum by Docker, Docker Universal Control Plane).
Architecture of Docker
Docker is an open-source project that automates the deploy-
ment of applications inside software containers, by providing
an additional layer of abstraction and automation of operat-
ing-system-level virtualization on Linux. Docker uses the re-
source isolation features of the Linux kernel such as cgroups
and kernel namespaces, and a union-capable filesystem such as
AUFS and others to allow independent “containers” to run with-
in a single Linux instance, avoiding the overhead of starting and
maintaining virtual machines.
The Linux kernel’s support for namespaces mostly isolates an
application’s view of the operating environment, including pro-
cess trees, network, user IDs and mounted file systems, while
the kernel’s cgroups provide resource limiting, including the
CPU, memory, block I/O and network. Since version 0.9, Docker
includes the libcontainer library as its own way to directly use
virtualization facilities provided by the Linux kernel, in addi-
tion to using abstracted virtualization interfaces via libvirt, LXC
(Linux Containers) and systemd-nspawn.
As actions are done to a Docker base image, union filesystem lay-
ers are created and documented, such that each layer fully de-
scribes how to recreate an action. This strategy enables Docker’s
lightweight images, as only layer updates need to be propagated
(compared to full VMs, for example).
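A minimal sketch of inspecting these layers, assuming the Docker SDK for Python, a local Docker daemon and access to a registry; the image name is illustrative:

    import docker

    client = docker.from_env()
    image = client.images.pull("alpine", tag="latest")

    # Each history entry corresponds to one recorded layer/build step;
    # only changed layers need to be transferred when an image is updated.
    for entry in image.history():
        print(entry.get("Size", 0), "bytes:", str(entry.get("CreatedBy", ""))[:60])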
Overview
Docker implements a high-level API to provide lightweight
containers that run processes in isolation.
Building on top of facilities provided by the Linux kernel (primar-
ily cgroups and namespaces), a Docker container, unlike a vir-
tual machine, does not require or include a separate operating
system. Instead, it relies on the kernel’s functionality and uses
resource isolation (CPU, memory, block I/O, network, etc.) and
separate namespaces to isolate the application’s view of the
operating system. Docker accesses the Linux kernel’s virtualiza-
tion features either directly using the libcontainer library, which
is available as of Docker 0.9, or indirectly via libvirt, LXC (Linux
Containers) or systemd-nspawn.
By using containers, resources can be isolated, services restrict-
ed, and processes provisioned to have an almost completely
private view of the operating system with their own process ID
space, file system structure, and network interfaces. Multiple
containers share the same kernel, but each container can be con-
strained to only use a defined amount of resources such as CPU,
memory and I/O.
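A minimal sketch of such constraints, assuming the Docker SDK for Python and a local Docker daemon; the image and limit values are illustrative:

    import docker

    client = docker.from_env()

    # Run a short-lived container with cgroup-based CPU and memory limits;
    # it shares the host kernel but sees its own namespaces.
    output = client.containers.run(
        "alpine",
        ["sh", "-c", "echo hello from an isolated container"],
        mem_limit="256m",        # cgroup memory limit
        cpu_period=100000,       # CFS period in microseconds
        cpu_quota=50000,         # at most half a CPU core
        remove=True,             # clean up after the container exits
    )
    print(output.decode())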
Using Docker to create and manage containers may simplify
the creation of highly distributed systems by allowing multiple
applications, worker tasks and other processes to run autono-
mously on a single physical machine or across multiple virtual
machines. This allows the deployment of nodes to be performed
as the resources become available or when more nodes are need-
ed, allowing a platform as a service (PaaS)-style of deployment
and scaling for systems like Apache Cassandra, MongoDB or Riak.
Docker also simplifies the creation and operation of task or work-
load queues and other distributed systems.
Integration
Docker can be integrated into various infrastructure tools, in-
cluding Amazon Web Services, Ansible, CFEngine, Chef, Google
Cloud Platform, IBM Bluemix, Jelastic, Jenkins, Microsoft Azure,
OpenStack Nova, OpenSVC, HPE Helion Stackato, Puppet, Salt,
Vagrant, and VMware vSphere Integrated Containers.
The Cloud Foundry Diego project integrates Docker into the
Cloud Foundry PaaS.
The GearD project aims to integrate Docker into Red Hat's OpenShift Origin PaaS.
Diagram: Docker can use different interfaces – libcontainer, libvirt, LXC and systemd-nspawn – to access virtualization features of the Linux kernel such as cgroups, namespaces, SELinux, AppArmor, Netfilter, Netlink and capabilities.
Advanced Cluster
Management Made Easy
High Performance Computing (HPC) clusters serve a crucial role at research-intensive
organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil
and gas exploration.
The thing they have in common is a requirement for large-scale computations. Bright’s
solution for HPC puts the power of supercomputing within the reach of just about any
organization.
Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
The Bright Advantage
Bright Cluster Manager for HPC makes it easy to build and oper-
ate HPC clusters using the server and networking equipment of
your choice, tying them together into a comprehensive, easy to
manage solution.
Bright Cluster Manager for HPC lets customers deploy complete
clusters over bare metal and manage them effectively. It pro-
vides single-pane-of-glass management for the hardware, the
operating system, HPC software, and users. With Bright Cluster
Manager for HPC, system administrators can get clusters up and
running quickly and keep them running reliably throughout their
life cycle – all with the ease and elegance of a fully featured, en-
terprise-grade cluster manager.
Bright Cluster Manager was designed by a group of highly expe-
rienced cluster specialists who perceived the need for a funda-
mental approach to the technical challenges posed by cluster
management. The result is a scalable and flexible architecture,
forming the basis for a cluster management solution that is ex-
tremely easy to install and use, yet is suitable for the largest and
most complex clusters.
Scale clusters to thousands of nodes
Bright Cluster Manager was designed to scale to thousands of
nodes. It is not dependent on third-party (open source) software
that was not designed for this level of scalability. Many advanced
features for handling scalability and complexity are included:
Management Daemon with Low CPU Overhead
The cluster management daemon (CMDaemon) runs on every
node in the cluster, yet has a very low CPU load – it will not cause
any noticeable slow-down of the applications running on your
cluster.
Because the CMDaemon provides all cluster management func-
tionality, the only additional daemon required is the workload
manager (or queuing system) daemon. For example, no daemons
for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster.
The cluster installer takes you through the installation process and offers advanced options such as “Express” and “Remote”.
Multiple Load-Balancing Provisioning Nodes
Node provisioning – the distribution of software from the head
node to the regular nodes – can be a significant bottleneck in
large clusters. With Bright Cluster Manager, provisioning capabil-
ity can be off-loaded to regular nodes, ensuring scalability to a
virtually unlimited number of nodes.
When a node requests its software image from the head node,
the head node checks which of the nodes with provisioning ca-
pability has the lowest load and instructs the node to download
or update its image from that provisioning node.
Synchronized Cluster Management Daemons
The CMDaemons are synchronized in time to execute tasks in
exact unison. This minimizes the effect of operating system jit-
ter (OS jitter) on parallel applications. OS jitter is the “noise” or
“jitter” caused by daemon processes and asynchronous events
such as interrupts. OS jitter cannot be totally prevented, but for
parallel applications it is important that any OS jitter occurs at
the same moment in time on every compute node.
Built-In Redundancy
Bright Cluster Manager supports redundant head nodes and re-
dundant provisioning nodes that can take over from each other
in case of failure. Both automatic and manual failover configura-
tions are supported.
Other redundancy features include support for hardware and
software RAID on all types of nodes in the cluster.
Diskless Nodes
Bright Cluster Manager supports diskless nodes, which can be
useful to increase overall Mean Time Between Failure (MTBF),
particularly on large clusters. Nodes with only InfiniBand and no
Ethernet connection are possible too.
Bright Cluster Manager 7 for HPC
For organizations that need to easily deploy and manage HPC
clusters, Bright Cluster Manager for HPC lets customers deploy
complete clusters over bare metal and manage them effectively.
It provides single-pane-of-glass management for the hardware,
the operating system, the HPC software, and users.
With Bright Cluster Manager for HPC, system administrators can
quickly get clusters up and running and keep them running relia-
bly throughout their life cycle – all with the ease and elegance of
a fully featured, enterprise-grade cluster manager.
Features
 Simple Deployment Process – Just answer a few questions
about your cluster and Bright takes care of the rest. It installs
all of the software you need and creates the necessary config-
uration files for you, so you can relax and get your new cluster
up and running right – first time, every time.
Head Nodes & Regular Nodes
A cluster can have different types of nodes, but it will always
have at least one head node and one or more “regular” nodes.
A regular node is a node which is controlled by the head node
and which receives its software image from the head node or
from a dedicated provisioning node.
Regular nodes come in different flavors.
Some examples include:
 Failover Node – A node that can take over all functionalities
of the head node when the head node becomes unavailable.
 Compute Node – A node which is primarily used for compu-
tations.
 Login Node – A node which provides login access to users of
the cluster. It is often also used for compilation and job sub-
mission.
 I/O Node – A node which provides access to disk storage.
 Provisioning Node – A node from which other regular nodes
can download their software image.
 Workload Management Node – A node that runs the central
workload manager, also known as queuing system.
 Subnet Management Node – A node that runs an InfiniBand
subnet manager.
 More types of nodes can easily be defined and configured
with Bright Cluster Manager. In simple clusters there is only
one type of regular node – the compute node – and the head
node fulfills all cluster management roles that are described
in the regular node types mentioned above.
All types of nodes (head nodes and regular nodes) run the same CMDaemon. However, depending on the role that has been assigned to the node, the CMDaemon fulfills different tasks. A single node may, for example, combine a ‘Login Role’, ‘Storage Role’ and ‘PBS Client Role’.
Elements of the Architecture
The architecture of Bright Cluster Manager is implemented by
the following key elements:
 The Cluster Management Daemon (CMDaemon)
 The Cluster Management Shell (CMSH)
 The Cluster Management GUI (CMGUI)
Connection Diagram
A Bright cluster is managed by the CMDaemon on the head node,
which communicates with the CMDaemons on the other nodes
over encrypted connections.
The CMDaemon on the head node exposes an API which can also
be accessed from outside the cluster. This is used by the cluster
management Shell and the cluster management GUI, but it can
also be used by other applications if they are programmed to ac-
cess the API. The API documentation is available on request.
The diagram below shows a cluster with one head node, three
regular nodes and the CMDaemon running on each node.
 Can Install on Bare Metal – With Bright Cluster Manager there
is nothing to pre-install. You can start with bare metal serv-
ers, and we will install and configure everything you need. Go
from pallet to production in less time than ever.
 Comprehensive Metrics and Monitoring – Bright Cluster Man-
ager really shines when it comes to showing you what’s going
on in your cluster. Its beautiful graphical user interface lets
you monitor resources and services across the entire cluster
in real time.
 Powerful User Interfaces – With Bright Cluster Manager you
don’t just get one great user interface, we give you two. That
way, you can choose to provision, monitor, and manage your
HPC clusters with a traditional command line interface or our
beautiful, intuitive graphical user interface.
 Optimize the Use of IT Resources by HPC Applications – Bright
ensures HPC applications get the resources they require
according to the policies of your organization. Your cluster
will prioritize workloads according to your business priorities.
 HPC Tools and Libraries Included – Every copy of Bright Cluster
Manager comes with a complete set of HPC tools and libraries
so you will be ready to develop, debug, and deploy your HPC
code right away.
“The building blocks for transtec HPC solutions must be chosen
according to our goals ease-of-management and ease-of-use.
With Bright Cluster Manager, we are happy to have the technology
leader at hand, meeting these requirements, and our customers
value that.”
Jörg Walter | HPC Solution Engineer
Monitor your entire cluster at a glance with Bright Cluster Manager
Quickly build and deploy an HPC cluster
When you need to get an HPC cluster up and running, don’t waste
time cobbling together a solution from various open source
tools. Bright’s solution comes with everything you need to set up
a complete cluster from bare metal and manage it with a beauti-
ful, powerful user interface.
Whether your cluster is on-premise or in the cloud, Bright is a
virtual one-stop-shop for your HPC project.
Build a cluster in the cloud
Bright’s dynamic cloud provisioning capability lets you build an
entire cluster in the cloud or expand your physical cluster into
the cloud for extra capacity. Bright allocates the compute re-
sources in Amazon Web Services automatically, and on demand.
Build a hybrid HPC/Hadoop cluster
Does your organization need HPC clusters for technical comput-
ing and Hadoop clusters for Big Data? The Bright Cluster Man-
ager for Apache Hadoop add-on enables you to easily build and
manage both types of clusters from a single pane of glass. Your
system administrators can monitor all cluster operations, man-
age users, and repurpose servers with ease.
Driven by research
Bright has been building enterprise-grade cluster management
software for more than a decade. Our solutions are deployed in
thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization?
Clusters in the cloud
Bright Cluster Manager can provision and manage clusters that
are running on virtual servers inside the AWS cloud as if they
were local machines. Use this feature to build an entire cluster
in AWS from scratch or extend a physical cluster into the cloud
when you need extra capacity.
Use native services for storage in AWS – Bright Cluster Manager
lets you take advantage of inexpensive, secure, durable, flexible
and simple storage services for data use, archiving, and backup
in the AWS cloud.
Make smart use of AWS for HPC – Bright Cluster Manager can
save you money by instantiating compute resources in AWS only
when they are needed. It uses built-in intelligence to create in-
stances only after the data is ready for processing and the back-
log in on-site workloads requires it.
You decide which metrics to monitor. Just drag a component into the display
area and Bright Cluster Manager instantly creates a graph of the data you need.
High performance meets efficiency
Initially, massively parallel systems constitute a challenge to
both administrators and users. They are complex beasts. Anyone
building HPC clusters will need to tame the beast, master the
complexity and present users and administrators with an easy-
to-use, easy-to-manage system landscape.
Leading HPC solution providers such as transtec achieve this
goal. They hide the complexity of HPC under the hood and match
high performance with efficiency and ease-of-use for both users
and administrators. The “P” in “HPC” gains a double meaning:
“Performance” plus “Productivity”.
Cluster and workload management software like Moab HPC
Suite, Bright Cluster Manager or Intel Fabric Suite provide the
means to master and hide the inherent complexity of HPC sys-
tems. For administrators and users, HPC clusters are presented
as single, large machines, with many different tuning parame-
ters. The software also provides a unified view of existing clus-
ters whenever unified management is added as a requirement by
the customer at any point in time after the first installation. Thus,
daily routine tasks such as job management, user management,
queue partitioning and management, can be performed easily
with either graphical or web-based tools, without any advanced
scripting skills or technical expertise required from the adminis-
trator or user.
Bright Cluster Manager unleashes and manages
the unlimited power of the cloud
Create a complete cluster in Amazon EC2, or easily extend your
onsite cluster into the cloud. Bright Cluster Manager provides a
“single pane of glass” to both your on-premise and cloud resourc-
es, enabling you to dynamically add capacity and manage cloud
nodes as part of your onsite cluster.
Cloud utilization can be achieved in just a few mouse clicks, with-
out the need for expert knowledge of Linux or cloud computing.
With Bright Cluster Manager, every cluster is cloud-ready, at no
extra cost. The same powerful cluster provisioning, monitoring,
scheduling and management capabilities that Bright Cluster
Manager provides to onsite clusters extend into the cloud.
The Bright advantage for cloud utilization
 Ease of use: Intuitive GUI virtually eliminates the user learning curve; no need to understand Linux or EC2 to manage the system.
Alternatively, cluster management shell provides powerful
scripting capabilities to automate tasks.
 Complete management solution: Installation/initialization,
provisioning, monitoring, scheduling and management in
one integrated environment.
 Integrated workload management: Wide selection of work-
load managers included and automatically configured with
local, cloud and mixed queues.
 Single pane of glass; complete visibility and control: Cloud compute nodes managed as elements of the on-site cluster; visible from a single console with drill-downs and historic data.
 Efficient data management via data-aware scheduling: Auto-
matically ensures data is in position at start of computation;
delivers results back when complete.
 Secure, automatic gateway: Mirrored LDAP and DNS services
inside Amazon VPC, connecting local and cloud-based nodes
for secure communication over VPN.
 Cost savings: More efficient use of cloud resources; support
for spot instances, minimal user intervention.
Two cloud scenarios
Bright Cluster Manager supports two cloud utilization scenarios:
“Cluster-on-Demand” and “Cluster Extension”.
Scenario 1: Cluster-on-Demand
The Cluster-on-Demand scenario is ideal if you do not have a clus-
ter onsite, or need to set up a totally separate cluster. With just a
few mouse clicks you can instantly create a complete cluster in
the public cloud, for any duration of time.
Scenario 2: Cluster Extension
The Cluster Extension scenario is ideal if you have a cluster onsite
but you need more compute power, including GPUs. With Bright
Cluster Manager, you can instantly add EC2-based resources to
your onsite cluster, for any duration of time. Extending into the
cloud is as easy as adding nodes to an onsite cluster – there are
only a few additional, one-time steps after providing the public
cloud account information to Bright Cluster Manager.
The Bright approach to managing and monitoring a cluster in
the cloud provides complete uniformity, as cloud nodes are
managed and monitored the same way as local nodes:
 Load-balanced provisioning
 Software image management
 Integrated workload management
 Interfaces – GUI, shell, user portal
 Monitoring and health checking
 Compilers, debuggers, MPI libraries, mathematical libraries
and environment modules
Bright Cluster Manager also provides additional features that are
unique to the cloud.
Amazon spot instance support – Bright Cluster Manager enables
users to take advantage of the cost savings offered by Amazon’s
spot instances. Users can specify the use of spot instances, and
Bright will automatically schedule as available, reducing the cost
to compute without the need to monitor spot prices and sched-
ule manually.
Hardware Virtual Machine (HVM) – Bright Cluster Manager auto-
matically initializes all Amazon instance types, including Cluster
Compute and Cluster GPU instances that rely on HVM virtualization.
Amazon VPC – Bright supports Amazon VPC setups, which allow compute nodes in EC2 to be placed in an isolated network, thereby separating them from the outside world. It is even possible to
route part of a local corporate IP network to a VPC subnet in EC2,
so that local nodes and nodes in EC2 can communicate easily.
Data-aware scheduling – Data-aware scheduling ensures that
input data is automatically transferred to the cloud and made
accessible just prior to the job starting, and that the output data
is automatically transferred back. There is no need to wait for the
data to load prior to submitting jobs (delaying entry into the job
queue), nor any risk of starting the job before the data transfer is
complete (crashing the job). Users submit their jobs, and Bright’s
data-aware scheduling does the rest.
Bright Cluster Manager provides the choice:
“To Cloud, or Not to Cloud”
Not all workloads are suitable for the cloud. The ideal situation
for most organizations is to have the ability to choose between
onsite clusters for jobs that require low latency communication,
complex I/O or sensitive data; and cloud clusters for many other
types of jobs.
Bright Cluster Manager delivers the best of both worlds: a pow-
erful management solution for local and cloud clusters, with
the ability to easily extend local clusters into the cloud without
compromising provisioning, monitoring or managing the cloud
resources.
One or more cloud nodes can be configured under the Cloud Settings tab.
Intelligent HPC
Workload Management
While all HPC systems face challenges in workload demand, workflow constraints,
resource complexity, and system scale, enterprise HPC systems face more stringent
challenges and expectations.
Enterprise HPC systems must meet mission-critical and priority HPC workload demands
for commercial businesses, business-oriented research, and academic organizations.
High Throughput Computing | CAD | Big Data Analytics | Simulation | Aerospace | Automotive
Solutions to Enterprise
High-Performance Computing Challenges
The Enterprise has requirements for dynamic scheduling, provi-
sioning and management of multi-step/multi-application servic-
es across HPC, cloud, and big data environments. Their workloads
directly impact revenue, product delivery, and organizational ob-
jectives.
Enterprise HPC systems must process intensive simulation and
data analysis more rapidly, accurately and cost-effectively to
accelerate insights. By improving workflow and eliminating job
delays and failures, the business can more efficiently leverage
data to make data-driven decisions and achieve a competitive
advantage. The Enterprise is also seeking to improve resource
utilization and manage efficiency across multiple heterogene-
ous systems. User productivity must increase to speed the time
to discovery, making it easier to access and use HPC resources
and expand to other resources as workloads demand.
Moab HPC Suite
Moab HPC Suite accelerates insights by unifying data center re-
sources, optimizing the analysis process, and guaranteeing ser-
vices to the business. Moab 8.1 for HPC systems and HPC cloud
continually meets enterprise priorities through increased pro-
ductivity, automated workload uptime, and consistent SLAs. It
uses the battle-tested and patented Moab intelligence engine
to automatically balance the complex, mission-critical workload
priorities of enterprise HPC systems. Enterprise customers ben-
efit from a single integrated product that brings together key
Enterprise HPC capabilities, implementation services, and 24/7
support.
It is imperative to automate workload workflows to speed time
to discovery. The following new use cases play a key role in im-
proving overall system performance:
Moab is the “brain” of an HPC system, intelligently optimizing workload
throughput while balancing service levels and priorities.
 Elastic Computing – unifies data center resources by assisting the business in managing resource expansion through bursting to private/public clouds and other data center resources, utilizing OpenStack as a common platform
 Performance and Scale – optimizes the analysis process by
dramatically increasing TORQUE and Moab throughput and
scalability across the board
 Tighter cooperation between Moab and Torque – harmoniz-
ing these key structures to reduce overhead and improve
communication between the HPC scheduler and the re-
source manager(s)
 More Parallelization – increasing the decoupling of Moab and TORQUE's network communication so Moab is less dependent on TORQUE's responsiveness, resulting in a 2x speed improvement and cutting the duration of the average scheduling iteration in half
 Accounting Improvements – allowing toggling between
multiple modes of accounting with varying levels of en-
forcement, providing greater flexibility beyond strict allo-
cation options. Additionally, new up-to-date accounting
balances provide real-time insights into usage tracking.
 Viewpoint Admin Portal – guarantees services to the busi-
ness by allowing admins to simplify administrative reporting,
workload status tracking, and job resource viewing
Accelerate Insights
Moab HPC Suite accelerates insights by increasing overall
system, user and administrator productivity, achieving more
accurate results that are delivered faster and at a lower cost
from HPC resources. Moab provides the scalability, 90-99 per-
cent utilization, and simple job submission required to maximize
productivity. This ultimately speeds the time to discovery, aiding
the enterprise in achieving a competitive advantage due to its
HPC system. Enterprise use cases and capabilities include the
following:
 OpenStack integration to offer virtual and physical resource
provisioning for IaaS and PaaS
 Performance boost to achieve 3x improvement in overall opti-
mization performance
 Advanced workflow data staging to enable improved cluster
utilization, multiple transfer methods, and new transfer types
“With Moab HPC Suite, we can meet very demanding customers’
requirements as regards unified management of heterogeneous
cluster environments, grid management, and provide them with
flexible and powerful configuration and reporting options. Our
customers value that highly.”
Sven Grützmacher | HPC Solution Engineer
 Advanced power management with clock frequency control
and additional power state options, reducing energy costs by
15-30 percent
 Workload-optimized allocation policies and provisioning to
get more results out of existing heterogeneous resources and
reduce costs, including topology-based allocation
 Unify workload management across heterogeneous clusters
by managing them as one cluster, maximizing resource availa-
bility and administration efficiency
 Optimized, intelligent scheduling packs workloads and back-
fills around priority jobs and reservations while balancing
SLAs to efficiently use all available resources
 Optimized scheduling and management of accelerators, both
Intel Xeon Phi and GPGPUs, for jobs to maximize their utiliza-
tion and effectiveness
 Simplified job submission and management with advanced
job arrays and templates
 Showback or chargeback for pay-for-use so actual resource
usage is tracked with flexible chargeback rates and reporting
by user, department, cost center, or cluster
 Multi-cluster grid capabilities manage and share workload
across multiple remote clusters to meet growing workload
demand or surges
Guarantee Services to the Business
Job and resource failures in enterprise HPC systems lead to de-
layed results, missed organizational opportunities, and missed
objectives. Moab HPC Suite intelligently automates workload
and resource uptime in the HPC system to ensure that workloads
complete successfully and reliably, avoid failures, and guarantee
services are delivered to the business.
The Enterprise benefits from these features:
 Intelligent resource placement prevents job failures with
granular resource modeling, meeting workload requirements
and avoiding at-risk resources
 Auto-response to failures and events with configurable
actions to pre-failure conditions, amber alerts, or other
metrics and monitors
 Workload-aware future maintenance scheduling that helps
maintain a stable HPC system without disrupting workload
productivity
 Real-world expertise for fast time-to-value and system uptime
with included implementation, training and 24/7 support
remote services
Auto SLA Enforcement
Moab HPC Suite uses the powerful Moab intelligence engine to
optimally schedule and dynamically adjust workload to consist-
ently meet service level agreements (SLAs), guarantees, or busi-
ness priorities. This automatically ensures that the right work-
loads are completed at the optimal times, taking into account
the complex number of using departments, priorities, and SLAs
to be balanced. Moab provides the following benefits:
 Usage accounting and budget enforcement that schedules
resources and reports on usage in line with resource sharing
agreements and precise budgets (includes usage limits, usage
reports, auto budget management, and dynamic fair share
policies)
 SLA and priority policies that make sure the highest priority
workloads are processed first (includes Quality of Service and
hierarchical priority weighting)
 Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workloads change (e.g. future reservations, pre-emption)
transtec HPC solutions are designed for maximum flexibility
and ease of management. We not only offer our customers
the most powerful and flexible cluster management solu-
tion out there, but also provide them with customized setup
and site-specific configuration. Whether a customer needs
a dynamical Linux-Windows dual-boot solution, unified
management of different clusters at different sites, or the
fine-tuning of the Moab scheduler for implementing fine-
grained policy configuration – transtec not only gives you
the framework at hand, but also helps you adapt the system
according to your special needs. Needless to say, when cus-
tomers are in need of special trainings, transtec will be there
to provide customers, administrators, or users with specially
adjusted Educational Services.
Many years of experience in High Performance Computing have enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven 3rd-party tools, which we assemble into a total solution and adapt and configure according to the customer's requirements.
We manage everything from the complete installation and configuration of the operating system and the necessary middleware components, like job and cluster management systems, up to the customer's applications.
 
ManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_ExtManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_ExtArup Ray
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
inside-BigData.com
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Hortonworks
 

Similar to HPC Compass 2016_17 (20)

Hpc kompass 2015
Hpc kompass 2015Hpc kompass 2015
Hpc kompass 2015
 
HPC Technology Compass 2014/15
HPC Technology Compass 2014/15HPC Technology Compass 2014/15
HPC Technology Compass 2014/15
 
Hpc compass 2013-final_web
Hpc compass 2013-final_webHpc compass 2013-final_web
Hpc compass 2013-final_web
 
HPC compass 2013/2014
HPC compass 2013/2014HPC compass 2013/2014
HPC compass 2013/2014
 
HPC kompass ibm_special_2013/2014
HPC kompass ibm_special_2013/2014HPC kompass ibm_special_2013/2014
HPC kompass ibm_special_2013/2014
 
HPC Compass IBM Special 2013/14
HPC Compass IBM Special 2013/14HPC Compass IBM Special 2013/14
HPC Compass IBM Special 2013/14
 
Ibm special hpc
Ibm special hpcIbm special hpc
Ibm special hpc
 
OpenPOWER's ISC 2016 Recap
OpenPOWER's ISC 2016 RecapOpenPOWER's ISC 2016 Recap
OpenPOWER's ISC 2016 Recap
 
En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...
En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...
En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...
 
HPC Compass 2014/2015 IBM Special
HPC Compass 2014/2015 IBM SpecialHPC Compass 2014/2015 IBM Special
HPC Compass 2014/2015 IBM Special
 
Hpc kompass ibm_special_2014
Hpc kompass ibm_special_2014Hpc kompass ibm_special_2014
Hpc kompass ibm_special_2014
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing
 
HIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha OehlHIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha Oehl
 
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
 
ISC 2016 Day 2 Recap
ISC 2016 Day 2 RecapISC 2016 Day 2 Recap
ISC 2016 Day 2 Recap
 
ManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_ExtManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_Ext
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
 

More from Marco van der Hart

Chip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streamingChip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streaming
Marco van der Hart
 
Chip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureChip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochure
Marco van der Hart
 
ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale
Marco van der Hart
 
ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012
Marco van der Hart
 
8-way-server
8-way-server8-way-server
8-way-server
Marco van der Hart
 
ttec vSphere 5
ttec vSphere 5ttec vSphere 5
ttec vSphere 5
Marco van der Hart
 
ttec - ParStream
ttec - ParStreamttec - ParStream
ttec - ParStream
Marco van der Hart
 
Backup for dummies
Backup for dummiesBackup for dummies
Backup for dummies
Marco van der Hart
 

More from Marco van der Hart (8)

Chip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streamingChip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streaming
 
Chip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureChip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochure
 
ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale
 
ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012
 
8-way-server
8-way-server8-way-server
8-way-server
 
ttec vSphere 5
ttec vSphere 5ttec vSphere 5
ttec vSphere 5
 
ttec - ParStream
ttec - ParStreamttec - ParStream
ttec - ParStream
 
Backup for dummies
Backup for dummiesBackup for dummies
Backup for dummies
 

HPC Compass 2016_17

  • 2. More Than 35 Years of Experience in Scientific Computing
  • 3. 1980markedthebeginningofadecadewherenumerousstartups were created, some of which later transformed into big players in the IT market. Technical innovations brought dramatic changes to the nascent computer market. In Tübingen, close to one of Germany’s prime and oldest universities, transtec was founded. In the early days, transtec focused on reselling DEC computers and peripherals, delivering high-performance workstations to university institutes and research facilities. In 1987, SUN/Sparc and storage solutions broadened the portfolio, enhanced by IBM/ RS6000 products in 1991. These were the typical workstations and server systems for high performance computing then, used by the majority of researchers worldwide. Meanwhile, transtec is the biggest European-wide and Germany-based provider of comprehensive HPC solutions. Many of our HPC clusters entered the TOP 500 list of the world’s fastest computing systems. Thus, given this background and history, it is fair to say that tran- stec looks back upon a more than 35 years’ experience in scientif- ic computing; our track record shows more than 1,000 HPC instal- lations. With this experience, we know exactly what customers’ demands are and how to meet them. High performance and ease of management – this is what customers require today. HPC sys- tems are for sure required to peak-perform, as their name indi- cates, but that is not enough: they must also be easy to handle. Unwieldy design and operational complexity must be avoided or at least hidden from administrators and particularly users of HPC computer systems. High Performance Computing has changed in the course of the last 10 years. Whereas traditional topics like simulation, large- SMP systems or MPI parallel computing still are important cor- nerstones of HPC, new areas of applicability like Business Intelli- gence/Analytics, new technologies like in-memory or distributed databases or new algorithms like MapReduce have entered the arena which is now commonly called HPC & Big Data. Knowledge about Hadoop or R nowadays belongs to the basic knowledge of any serious HPC player. Dynamical deployment scenarios are becoming more and more important for realizing private (or even public) HPC cloud sce- narios. The OpenStack ecosystem, object storage concepts like Ceph, application containers with Docker – all these have been bleeding-edge technologies once, but have become state-of-the- art meanwhile. Therefore, this brochure is now called “Technology Compass HPC & Big Data”. It covers all of the above-mentioned currently discussed core technologies. Besides those, more familiar tradi- tional updates like the Intel Omni-Path architecture are covered as well, of course. No matter which ingredient an HPC solution requires – transtec masterly combines excellent and well-cho- sen components that are already there to a fine-tuned, custom- er-specific, and thoroughly designed HPC solution. Your decision for a transtec HPC solution means you opt for most intensive customer care and best service in HPC. Our experts will be glad to bring in their expertise and support to assist you at any stage, from HPC design to daily cluster operations, to HPC Cloud Services. Last but not least, transtec HPC Cloud Services provide custom- ers with the possibility to have their jobs run on dynamically pro- vided nodes in a dedicated datacenter, professionally managed and individually customizable. 
Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud management environment, and ready to run. Have fun reading the transtec HPC Compass 2016/17!
  • 4. High Performance Computing for Everyone Interview with Dr. Oliver Tennert Director Technology Management and HPC Solutions
  • 5. Dr. Tennert, what’s the news in HPC? Besides becoming more and more commodity, the HPC market has seen two major changes within the last couple of years: on the one hand side HPC has reached over to adjacent areas that would not have been recognized as part of HPC 5 years ago. Meanwhile however every serious HPC company tries to find their positioning within the area of “big data”, often vaguely understood, but sometimes put more precisely as “big data analytics”. Superficially, it’s all about analyzing large amounts of data as has been part of HPC like ever before. But the areas of appli- cation are becoming more: “big data analytics” is a vital ap- plication of business intelligence/business analytics. Along with this new methods, new concepts, and new technologies emerge. The other change has to do with the way HPC environ- ments are going to be deployed and managed. Dynamic provi- sioning of resources for private-cloud scenarios are becoming more and more important, to which OpenStack has contrib- uted a lot. In the near future, dynamic deployment will be as commodity as efficient, but static deployment today. What’s the deal with “dynamic deployment” anyway? Dynamic deployment is the core provisioning technology in cloud environments, public or private ones. Virtual machines with different OSses, web servers, compute nodes, or any kind of resources are only deployed when they are needed and requested. Dynamic deployment makes sure that the available hardware capacity will be prepared and deployed for exactly the pur- pose for which it is needed. The advantage is: less idle capacity and better utilization than could happen with static environ- ments where e.g. inactive Windows servers block hardware capacity whereas Linux-based compute nodes would be needed at a certain point in time. Everyone is discussing OpenStack at the moment. What do you think of the hype and the development of this platform? And what is its importance for transtec? OpenStack has indeed been discussed at a time when it still had little business relevance for an HPC solutions provider. But this is the case with all innovations, be they hardware- or software-related. But the fact that OpenStack is a joint development effort of several big IT players, had added to the demand for private clouds in the market on the one hand side, and to its relevance as the software solution for that on the other hand side. We at transtec have several customers who either ask us to deploy a completely new OpenStack environment, or would like us to migrate their existing one to OpenStack. OpenStack has the reputation of being a very complex beast, given the many components. Implementing it can take a while. Is there any way to simplify this? Indeed life can be rough if you do it wrong. But there is no need to: no one goes ahead and installs Linux from scratch nowa- days, apart from playing around and learning: instead every- one uses professional distributions with commercial support. There are special OpenStack distributions too, with commer- cial support: especially within HPC, Bright Cluster Manager is often the distribution of choice, but also long-term players like Red Hat offer simple deployment frameworks. Our customers can rest assured that they get an individually optimized OpenStack environment, which is not only de- signed by us, but also installed and maintained throughout. It has always been our job to hide any complexity underneath a layer of “ease of use, ease of management”.
  • 6. 6 Who do you want to address with this topic? Everyone who is interested in a highly flexible, but also effective utilization of their existing hardware capacity. We address customers from academia as well as SMB customers who, like big enterprises, have to turn towards new concepts and new technologies. As said before: in our opinion, this way of resource provisioning will become mainstream soon. Let's turn to the other big topic: What is the connection between HPC and Big Data Analytics, and what does the market have to say about this? As already mentioned, the HPC market is reaching out towards big data analytics. "HPC & Big Data" are named in one breath at key events like ISC or Supercomputing. This is not without reason, of course: as different as the technologies in use or the goals to be reached may be, there is one common denominator, and that is a high demand for computational capacity. An HPC solution provider like transtec, which is capable of designing and deploying a one-petaflop HPC cluster, can naturally also provision, say, a ten-petabyte Hadoop cluster for big data analytics for a customer. But the target group may be a different one: classical HPC mostly takes place within the value chain of the customer, and the users are developers. Important areas for big data analytics, however, can be found in business controlling, i.e. BI/BA, but also in companies' operations departments, where operational decisions have to be made in real time, depending on certain vital state data. This is what the "Internet of Things" is all about.
  • 7. 7 Where are the technical differences between HPC and Big Data Analytics, and what do both have in common? Both areas share the “scale-out” approach. It is not the single server that gets more and more powerful, but rather accelera- tion is reached via parallelization and massive load balancing on several servers. In HPC, parallel computational jobs that run using the MPI programming interface are state of the art. Simulations run on many compute nodes at once, thus cutting down wall-clock time. Within the context of big data analytics, distributed and above all in-memory databases are becoming more and more important. Database requests can thus be handled by several servers, and with a low latency. How do companies handle the backup and the archiving of these huge amounts of data? Given this enormous amount of data, standard backup strat- egies or archiving concepts do indeed not apply anymore. In- stead, for big data analytics, data are stored in a redundant way in principle, to become independent from hardware fail- ures. Storage concepts like Hadoop or object storage like Ceph offer redundancy in data storage as well. Moreover, when it comes to big data analytics, the basic prem- ise is that it really does not matter much when a tiny fraction out of these huge amounts of data go missing for some rea- son. The goal of big data analytics is not to get a precise ana- lytical result out of these often unstructured data, but rather the quickest possible extraction of relevant information and a sensible interpretation, whereby statistical methods are of great importance. For which companies is it important to do this very quick extrac- tion of information? Holiday booking portals that use booking systems have to get an immediate overview of the hotel rooms that are available in some random town, sorted by price, in order to provide cus- tomers with this information. Energy or finance companies who do exchange tradings of their assets, have to know their current portfolio’s value in the fraction of a second to do the right pricing. Search engine providers – there are more than just Google – investigative authorities, credit card clearance houses who have to react quickly to attempted frauds: the list of potential companies who could make use of big data analytics grows alongside with the technical progress.
  • 8. Technology Compass – Table of Contents and Introduction
High Performance Computing (10): Performance Turns Into Productivity (12), Flexible Deployment With xCAT (14), Service and Customer Care From A to Z (16)
OpenStack for HPC Cloud Environments (20): What is OpenStack? (22), Object Storage with Ceph (24), Application Containers with Docker (30)
Advanced Cluster Management Made Easy (38): Easy-to-use, Complete and Scalable (40), Cloud Bursting With Bright (48)
Intelligent HPC Workload Management (50): Moab HPC Suite (52), IBM Platform LSF (56), Slurm (58), IBM Platform LSF 8 (61)
Intel Cluster Ready (64): A Quality Standard for HPC Clusters (66), Intel Cluster Ready Builds HPC Momentum (70), The transtec Benchmarking Center (74)
Remote Visualization and Workflow Optimization (76): NICE EnginFrame: A Technical Computing Portal (78), Remote Visualization (82), Desktop Cloud Visualization (84), Cloud Computing With transtec and NICE (86), NVIDIA Grid Boards (88), Virtual GPU Technology (90)
  • 9. Big Data Analytics with Hadoop (92): Apache Hadoop (94), HDFS Architecture (98)
NVIDIA GPU Computing – The CUDA Architecture (108): What is GPU Computing? (110), Kepler GK110/210 GPU Computing Architecture (112), A Quick Refresher on CUDA (122)
Intel Xeon Phi Coprocessor (124): The Architecture (126), An Outlook on Knights Landing and Omni Path (128)
Vector Supercomputing (138): The NEC SX Architecture (140), History of the SX Series (146)
Parallel File Systems – The New Standard for HPC Storage (154): Parallel NFS (156), What's New in NFS 4.1? (158), Panasas HPC Storage (160), BeeGFS (172), General Parallel File System (GPFS) (176), What's new in GPFS Version 3.5 (190), Lustre (192)
Intel Omni-Path Interconnect (196): The Architecture (198), Host Fabric Interface (HFI) (200), Architecture Switch Fabric Optimizations (202), Fabric Software Components (204)
Numascale (206): NumaConnect Background (208), Technology (210), Redefining Scalable OpenMP and MPI (216), Big Data (220), Cache Coherence (222), NUMA and ccNUMA (224)
Allinea Forge for parallel Software Development (226): Application Performance and Development Tools (228), What's in Allinea Forge? (230)
Glossary (242)
  • 10.
  • 11. 11 High Performance Computing – Performance Turns Into Productivity High Performance Computing (HPC) has been with us from the very beginning of the computer era. High-performance computers were built to solve numerous problems which the “human computers” could not handle. The term HPC just hadn’t been coined yet. More important, some of the early principles have changed fundamentally. Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
  • 12. 12 Performance Turns Into Productivity HPC systems in the early days were much different from those we see today. First, we saw enormous mainframes from large computer manufacturers, including a proprietary operating system and job management system. Second, at universities and research institutes, workstations made inroads, and scientists carried out calculations on their dedicated Unix or VMS workstations. In either case, if you needed more computing power, you scaled up, i.e. you bought a bigger machine. Today the term High Performance Computing has gained a fundamentally new meaning: HPC is now perceived as a way to tackle complex mathematical, scientific or engineering problems. The integration of industry-standard, "off-the-shelf" server hardware into HPC clusters facilitates the construction of computer networks of such power that one single system could never achieve. The new paradigm for parallelization is scaling out. Computer-supported simulation of realistic processes (so-called Computer Aided Engineering – CAE) has established itself as a third key pillar in the field of science and research, alongside theory and experimentation. It is nowadays inconceivable that an aircraft manufacturer or a Formula One racing team would operate without using simulation software. And scientific calculations, such as in the fields of astrophysics, medicine, pharmaceuticals and bio-informatics, will to a large extent depend on supercomputers in the future. Software manufacturers long ago recognized the benefit of high-performance computers based on powerful standard servers and ported their programs to them accordingly. The main advantage of scale-out supercomputers is just that: they are infinitely scalable, at least in principle. Since they are based on standard hardware components, such a supercomputer can be given more power whenever the computational capacity of the system is no longer sufficient, simply by adding additional nodes of the same kind. A cumbersome switch to a different technology can be avoided in most cases. "transtec HPC solutions are meant to provide customers with unparalleled ease-of-management and ease-of-use. Apart from that, deciding for a transtec HPC solution means deciding for the most intensive customer care and the best service imaginable." – Dr. Oliver Tennert, Director Technology Management & HPC Solutions
  • 13. S EW SE SW tech BOX 13 Variations on the theme: MPP and SMP Parallel computations exist in two major variants today. Ap- plications running in parallel on multiple compute nodes are frequently so-called Massively Parallel Processing (MPP) appli- cations. MPP indicates that the individual processes can each utilize exclusive memory areas. This means that such jobs are predestined to be computed in parallel, distributed across the nodes in a cluster. The individual processes can thus utilize the separate units of the respective node – especially the RAM, the CPU power and the disk I/O. Communication between the individual processes is implement- ed in a standardized way through the MPI software interface (Message Passing Interface), which abstracts the underlying network connections between the nodes from the processes. However, the MPI standard (current version 2.0) merely requires source code compatibility, not binary compatibility, so an off-the- shelf application usually needs specific versions of MPI libraries in order to run. Examples of MPI implementations are OpenMPI, MPICH2, MVAPICH2, Intel MPI or – for Windows clusters – MS-MPI. If the individual processes engage in a large amount of commu- nication, the response time of the network (latency) becomes im- portant. Latency in a Gigabit Ethernet or a 10GE network is typi- cally around 10 µs. High-speed interconnects such as InfiniBand, reduce latency by a factor of 10 down to as low as 1 µs. Therefore, high-speed interconnects can greatly speed up total processing. The other frequently used variant is called SMP applications. SMP, in this HPC context, stands for Shared Memory Processing. It involves the use of shared memory areas, the specific imple- mentation of which is dependent on the choice of the underlying operating system. Consequently, SMP jobs generally only run on a single node, where they can in turn be multi-threaded and thus be parallelized across the number of CPUs per node. For many HPC applications, both the MPP and SMP variant can be chosen. Many applications are not inherently suitable for parallel execu- tion. In such a case, there is no communication between the in- dividual compute nodes, and therefore no need for a high-speed network between them; nevertheless, multiple computing jobs can be run simultaneously and sequentially on each individual node, depending on the number of CPUs. In order to ensure optimum computing performance for these applications, it must be examined how many CPUs and cores de- liver the optimum performance. We find applications of this sequential type of work typically in the fields of data analysis or Monte-Carlo simulations.
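The following is a small illustration of the MPP model described in this box – not part of the original brochure, but a minimal sketch using the mpi4py bindings, assuming an MPI implementation such as OpenMPI and the mpi4py package are available on the cluster. Each rank works on a private slice of a numerical integration, and a single collective reduction combines the results; an SMP-style variant would instead use threads sharing memory within one node.

# Minimal MPP-style sketch with mpi4py (assumes an MPI implementation such as
# OpenMPI plus the mpi4py package; launch with e.g. "mpirun -np 64 python pi_mpi.py").
from mpi4py import MPI

def partial_pi(rank, size, n=10_000_000):
    """Each rank integrates its own slice of 4/(1+x^2) over [0, 1]."""
    h = 1.0 / n
    s = 0.0
    for i in range(rank, n, size):           # round-robin distribution of intervals
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x * x)
    return s * h

comm = MPI.COMM_WORLD                         # all processes started by mpirun
rank = comm.Get_rank()                        # this process' ID
size = comm.Get_size()                        # total number of processes

local = partial_pi(rank, size)                # independent work in private memory (MPP)
pi = comm.reduce(local, op=MPI.SUM, root=0)   # one collective communication step

if rank == 0:
    print(f"pi ~= {pi:.10f} computed on {size} ranks")

On a high-speed interconnect such as InfiniBand, the cost of the final reduction is negligible; for communication-heavy codes, the latency figures quoted above dominate scaling behaviour.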
  • 14. BOX 14 xCAT as a Powerful and Flexible Deployment Tool xCAT (Extreme Cluster Administration Tool) is an open source toolkit for the deployment and low-level administration of HPC cluster environments, small as well as large ones. xCAT provides simple commands for hardware control, node discovery, the collection of MAC addresses, and the node deploy- ment with (diskful) or without local (diskless) installation. The cluster configuration is stored in a relational database. Node groups for different operating system images can be defined. Also, user-specific scripts can be executed automatically at in- stallation time. xCAT Provides the Following Low-Level Administrative Features Remote console support Parallel remote shell and remote copy commands Plugins for various monitoring tools like Ganglia or Nagios Hardware control commands for node discovery, collecting MAC addresses, remote power switching and resetting of nodes Automatic configuration of syslog, remote shell, DNS, DHCP, and ntp within the cluster Extensive documentation and man pages For cluster monitoring, we install and configure the open source tool Ganglia or the even more powerful open source solution Na- gios, according to the customer’s preferences and requirements. High Performance Computing Flexible Deployment With xCAT
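As a hedged illustration of how these low-level commands are typically driven from scripts, the following sketch wraps a few of the xCAT tools mentioned above (nodels, rpower, xdsh) with Python's subprocess module. The node group name "compute" and the output parsing are assumptions made for the example; the real output format depends on the xCAT version and the hardware control method in use.

# Hypothetical convenience wrapper around a few xCAT CLI tools (nodels, rpower, xdsh).
# Assumes xCAT is installed on the management node and the caller has admin rights.
import subprocess

def xcat(*args):
    """Run an xCAT command and return its stdout as a list of lines."""
    out = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return out.stdout.splitlines()

def power_state(noderange="compute"):
    # rpower typically reports one "node: state" pair per line
    return dict(line.split(": ", 1) for line in xcat("rpower", noderange, "stat"))

def run_everywhere(command, noderange="compute"):
    # xdsh is xCAT's parallel remote shell
    return xcat("xdsh", noderange, command)

if __name__ == "__main__":
    print(xcat("nodels"))                        # all defined nodes
    print(power_state("compute"))                # power status per node
    print(run_everywhere("uptime", "compute"))   # load on every compute node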
  • 15. 15 Local Installation or Diskless Installation We offer a diskful or a diskless installation of the cluster nodes. A diskless installation means the operating system is hosted par- tially within the main memory, larger parts may or may not be included via NFS or other means. This approach allows for de- ploying large amounts of nodes very efficiently, and the cluster is up and running within a very small timescale. Also, updating the cluster can be done in a very efficient way. For this, only the boot image has to be updated, and the nodes have to be reboot- ed. After this, the nodes run either a new kernel or even a new operating system. Moreover, with this approach, partitioning the cluster can also be very efficiently done, either for testing pur- poses, or for allocating different cluster partitions for different users or applications. Development Tools, Middleware, and Applications According to the application, optimization strategy, or underly- ing architecture, different compilers lead to code results of very different performance. Moreover, different, mainly commercial, applications, require different MPI implementations. And even when the code is self-developed, developers often prefer one MPI implementation over another. According to the customer’s wishes, we install various compilers, MPI middleware, as well as job management systems like Para- station, Grid Engine, Torque/Maui, or the very powerful Moab HPC Suite for the high-level cluster management.
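To make the job-management side concrete, here is an illustrative sketch (not from the brochure) that generates a Torque/Maui batch script and submits it with qsub. The resource request, walltime, and application path are made-up examples; an equivalent script for Grid Engine, Slurm, or a Moab-managed cluster would differ only in the directives.

# Illustrative only: generate and submit a Torque/Maui batch job for an MPI application.
import subprocess
import textwrap

def submit_mpi_job(name, nodes, ppn, walltime, binary):
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -N {name}
        #PBS -l nodes={nodes}:ppn={ppn}
        #PBS -l walltime={walltime}
        #PBS -j oe
        cd $PBS_O_WORKDIR
        mpirun -np {nodes * ppn} {binary}
        """)
    # qsub reads the job script from stdin and prints the assigned job ID
    result = subprocess.run(["qsub"], input=script, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    # hypothetical job: 4 nodes with 16 cores each, 2 hours, example launcher script
    job_id = submit_mpi_job("cfd_run", nodes=4, ppn=16,
                            walltime="02:00:00", binary="./run_case.sh")
    print("submitted as", job_id)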
  • 16. 16 transtec HPC as a Service: You will get a range of applications like LS-Dyna, ANSYS, Gromacs, NAMD etc. from all kinds of areas pre-installed, integrated into an enterprise-ready cloud and workload management system, and ready to run. Do you miss your application? Ask us: HPC@transtec.de. transtec Platform as a Service: You will be provided with dynamically provisioned compute nodes for running your individual code. The operating system will be pre-installed according to your requirements; common Linux distributions like RedHat, CentOS, or SLES are the standard. Do you need another distribution? Ask us: HPC@transtec.de. transtec Hosting as a Service: You will be provided with hosting space inside a professionally managed and secured datacenter where you can have your machines hosted, managed, and maintained according to your requirements. Thus, you can build up your own private cloud. What range of hosting and maintenance services do you need? Tell us: HPC@transtec.de. HPC @ transtec: Services and Customer Care from A to Z [Diagram: the transtec HPC service lifecycle – individual presales consulting; application-, customer-, and site-specific sizing of the HPC solution; benchmarking of different systems; burn-in tests of systems; software/OS installation; application installation; onsite hardware assembly; integration into the customer's environment; maintenance, support, managed services; customer training; continual improvement] transtec AG has over 30 years of experience in scientific computing and is one of the earliest manufacturers of HPC clusters. For nearly a decade, transtec has delivered highly customized High Performance clusters based on standard components to academic and industry customers across Europe, with all the high quality standards and the customer-centric approach that transtec is well known for. Every transtec HPC solution is more than just a rack full of hardware – it is a comprehensive solution with everything the HPC user, owner, and operator need. In the early stages of any customer's HPC project, transtec experts provide extensive and detailed consulting to the customer – they benefit from expertise and experience. Consulting is followed by benchmarking of different systems with either specifically crafted customer code or generally accepted benchmarking routines; this aids customers in sizing and devising the optimal and detailed HPC configuration. Each and every piece of HPC hardware that leaves our factory undergoes a burn-in procedure of 24 hours or
  • 17. 17 more if necessary. We make sure that any hardware shipped meets ourandourcustomers’qualityrequirements.transtecHPCsolutions are turnkey solutions. By default, a transtec HPC cluster has every- thing installed and configured – from hardware and operating sys- temtoimportantmiddlewarecomponentslikeclustermanagement or developer tools and the customer’s production applications. Onsite delivery means onsite integration into the customer’s pro- ductionenvironment,beitestablishingnetworkconnectivitytothe corporate network, or setting up software and configuration parts. transtec HPC clusters are ready-to-run systems – we deliver, you turn the key, the system delivers high performance. Every HPC proj- ect entails transfer to production: IT operation processes and poli- ciesapplytothenewHPCsystem.Effectively,ITpersonnelistrained hands-on, introduced to hardware components and software, with all operational aspects of configuration management. transtec services do not stop when the implementation projects ends. Beyond transfer to production, transtec takes care. transtec offers a variety of support and service options, tailored to the cus- tomer’s needs. When you are in need of a new installation, a major reconfiguration or an update of your solution – transtec is able to support your staff and, if you lack the resources for maintaining the cluster yourself, maintain the HPC solution for you. From Professional Services to Managed Services for daily opera- tionsandrequiredservicelevels,transtecwillbeyourcompleteHPC service and solution provider. transtec’s high standards of perfor- mance, reliability and dependability assure your productivity and completesatisfaction.transtec’sofferingsofHPCManagedServices offercustomersthepossibilityofhavingthecompletemanagement and administration of the HPC cluster managed by transtec service specialists,inanITILcompliantway.Moreover,transtec’sHPConDe- mand services help provide access to HPC resources whenever they need them, for example, because they do not have the possibility of owning and running an HPC cluster themselves, due to lacking infrastructure, know-how, or admin staff. transtec HPC Cloud Services Last but not least transtec’s services portfolio evolves as custom- ers‘ demands change. Starting this year, transtec is able to provide HPCCloudServices.transtecusesadedicateddatacentertoprovide computing power to customers who are in need of more capacity than they own, which is why this workflow model is sometimes called computing-on-demand. With these dynamically provided resources, customers with the possibility to have their jobs run on HPC nodes in a dedicated datacenter, professionally managed and secured, and individually customizable. Numerous standard ap- plications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud and workload management environ- ment, and ready to run. Alternatively, whenever customers are in need of space for host- ing their own HPC equipment because they do not have the space capacity or cooling and power infrastructure themselves, transtec is also able to provide Hosting Services to those customers who’d liketohavetheirequipmentprofessionallyhosted,maintained,and managed. Customers can thus build up their own private cloud! Are you interested in any of transtec’s broad range of HPC related services? Write us an email to HPC@transtec.de. We’ll be happy to hear from you!
  • 18. 18 High Performance Computing Services and Customer Care from A to Z Are you an end user? Then doing your work as productive as possible is of utmost im- portance to you! You are interested in an IT environment running smoothly, with a fully functional infrastructure, in competent and professional support by internal administrators and external ser- vice providers, and a service desk with guaranteed availability at working times. HPC cloud services ensure you are able to run your simulation jobs even if computational requirements exceed the lo- cal capacities. support services professional services managed services cloud services Are you an administrator? Then you are responsible for providing a well-performing IT infra- structure, for supporting the end users when problems occur. In situations where problem management fully exploits your capacity, or in cases where you just don’t have time or maybe lack experience in performing certain tasks, you might be glad to be assisted by spe- cialized IT experts. support services professional services Are you an R D manager? Then your job is to ensure that developers, researchers, and other endusersareabletodotheirjob.Tohaveapartnerathandthatpro- vides you with a full range of operations support, either in a com- pletely flexible, on-demand base, or in a fully-managed way based on ITIL best practices, is heartily welcome for you. professional services managed services
  • 19. 19 Services Continuum Support Services Paid to Support Installation Integration Migration Upgrades Configuration Changes Professional Services Paid to Do Service Desk Event Monitoring and Notification Release Management Capacity Management Managed Services Paid to Operate Hosting Infrastructure (IaaS) HPC (HPCaaS) Platform (PaaS) Cloud Services Paid to Use Performance Analysis Technology Assessment Infrastructure Design PLAN BUILD RUN Consulting Services Paid to Advise Are you a CIO? Then your main concern is a smoothly-running IT operations envi- ronment with a maximum of efficiency and effectivity. Managed Services to ensure business continuity are a main part of your con- siderations regarding business objectives. HPC cloud services are a vital part of your capacity plannings to ensure the availability of computational resources to your internal clients, the R D depart- ments. managed services cloud services Are you a CEO? Profitability,time-to-result,businesscontinuity,andensuringfuture viability are topics that you continuously have to think about. You are happy to have a partner at hand, who gives you insight into fu- turedevelopments,providesyouwithtechnologyconsulting,andis able to help you ensure business continuity and fall-back scenarios, as well as scalability readiness, by means of IT capacity outsourcing solutioons. cloud services consulting services
  • 20.
  • 21. Life Sciences | CAE | High Performance Computing | Big Data Analytics | Simulation | CAD OpenStack for HPC Cloud Environments OpenStack is a set of software tools for building and managing cloud computing platforms for public and private clouds. Backed by some of the biggest companies in software development and hosting, as well as thousands of individual community members, many think that OpenStack is the future of cloud computing. OpenStack is managed by the OpenStack Foundation, a non-profit organization which oversees both development and community-building around the project.
  • 22. 22 What is OpenStack? Introduction to OpenStack OpenStack lets users deploy virtual machines and other instances which handle different tasks for managing a cloud environment on the fly. It makes horizontal scaling easy, which means that tasks which benefit from running concurrently can easily serve more or fewer users on the fly by just spinning up more instances. For example, a mobile application which needs to communicate with a remote server might be able to divide the work of communicating with each user across many different instances, all communicating with one another but scaling quickly and easily as the application gains more users. And most importantly, OpenStack is open source software, which means that anyone who chooses to can access the source code, make any changes or modifications they need, and freely share these changes back out to the community at large. It also means that OpenStack has the benefit of thousands of developers all over the world working in tandem to develop the strongest, most robust, and most secure product that they can. How is OpenStack used in a cloud environment? The cloud is all about providing computing for end users in a remote environment, where the actual software runs as a service on reliable and scalable servers rather than on each end user's computer. Cloud computing can refer to a lot of different things, but typically the industry talks about running different items "as a service" – software, platforms, and infrastructure. OpenStack falls into the latter category and is considered Infrastructure as a Service (IaaS). Providing infrastructure means that OpenStack makes it easy for users to quickly add new instances, upon which other cloud components can run. Typically, the infrastructure then runs a "platform" upon which a developer can create software applications which are delivered to the end users.
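To make the IaaS idea tangible, the following minimal sketch uses the official openstacksdk Python client to boot an instance on demand and delete it again when the work is done. The cloud name "hpc-cloud" and the image, flavor, and network names are placeholders for whatever is defined in the site's clouds.yaml and service catalogs.

# Minimal sketch of the IaaS idea with the openstacksdk client.
import openstack

conn = openstack.connect(cloud="hpc-cloud")          # credentials taken from clouds.yaml

image = conn.compute.find_image("centos-7-hpc")      # placeholder image name
flavor = conn.compute.find_flavor("m1.xlarge")       # placeholder flavor name
network = conn.network.find_network("cluster-net")   # placeholder network name

# Spin up an instance only when it is actually needed ...
server = conn.compute.create_server(
    name="compute-on-demand-001",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print("booted", server.name, server.status)

# ... and give the capacity back as soon as the workload is finished.
conn.compute.delete_server(server.id)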
  • 23. 23 What are the components of OpenStack? OpenStack is made up of many different moving parts. Because of its open nature, anyone can add additional components to OpenStack to help it to meet their needs. But the OpenStack community has collaboratively identified nine key components that are a part of the “core” of OpenStack, which are distributed as a part of any OpenStack system and officially maintained by the OpenStack community. Nova is the primary computing engine behind OpenStack. It is used for deploying and managing large numbers of virtual machines and other instances to handle computing tasks. Swift is a storage system for objects and files. Rather than the traditional idea of a referring to files by their location on a disk drive, developers can instead refer to a unique identi- fier referring to the file or piece of information and let Open- Stack decide where to store this information. This makes scaling easy, as developers don’t have the worry about the capacity on a single system behind the software. It also allows the system, rather than the developer, to worry about how best to make sure that data is backed up in case of the failure of a machine or network connection. Cinder is a block storage component, which is more analo- gous to the traditional notion of a computer being able to access specific locations on a disk drive. This more tradi- tional way of accessing files might be important in scenarios in which data access speed is the most important consider- ation. Neutron provides the networking capability for OpenStack. It helps to ensure that each of the components of an Open- Stack deployment can communicate with one another quickly and efficiently. Horizon is the dashboard behind OpenStack. It is the only graphical interface to OpenStack, so for users wanting to give OpenStack a try, this may be the first component they actually “see.” Developers can access all of the components of OpenStack individually through an application program- ming interface (API), but the dashboard provides system administrators a look at what is going on in the cloud, and to manage it as needed. Keystone provides identity services for OpenStack. It is essentially a central list of all of the users of the OpenStack cloud, mapped against all of the services provided by the cloudwhichtheyhavepermissiontouse.Itprovidesmultiple means of access, meaning developers can easily map their existing user access methods against Keystone. Glance provides image services to OpenStack. In this case, “images” refers to images (or virtual copies) of hard disks. Glance allows these images to be used as templates when deploying new virtual machine instances. Ceilometer provides telemetry services, which allow the cloud to provide billing services to individual users of the cloud. It also keeps a verifiable count of each user’s system usage of each of the various components of an OpenStack cloud. Think metering and usage reporting. Heat is the orchestration component of OpenStack, which allows developers to store the requirements of a cloud appli- cation in a file that defines what resources are necessary for that application. In this way, it helps to manage the infra- structure needed for a cloud service to run.
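The nine services above are all reachable through one authenticated connection. As a hedged sketch (again with a placeholder cloud entry in clouds.yaml), the snippet below lists resources from Nova, Glance, Neutron, and Cinder, with Keystone handling authentication behind the scenes; depending on the SDK release, the block-storage proxy may be exposed as conn.volume instead of conn.block_storage.

# A quick tour of several core services through one openstacksdk connection.
import openstack

conn = openstack.connect(cloud="hpc-cloud")       # Keystone auth happens here

print("Nova (compute) instances:")
for server in conn.compute.servers():
    print(" ", server.name, server.status)

print("Glance (image) catalog:")
for image in conn.image.images():
    print(" ", image.name)

print("Neutron (network) networks:")
for net in conn.network.networks():
    print(" ", net.name)

print("Cinder (block storage) volumes:")        # older SDKs: conn.volume.volumes()
for vol in conn.block_storage.volumes():
    print(" ", vol.name, vol.size, "GB")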
  • 24. 24 OpenStack for HPC Cloud Environments Object Storage with Ceph Object Storage with Ceph Why “Ceph”? “Ceph” is an odd name for a file system and breaks the typical acronym trend that most follow. The name is a reference to the mascot at UCSC (Ceph’s origin), which happens to be “Sammy,” the banana slug, a shell-less mollusk in the cephalopods class. Cephalopods, with their multiple tentacles, provide a great meta- phor for a distributed file system. Ceph goals Developing a distributed file system is a complex endeavor, but it’s immensely valuable if the right problems are solved. Ceph’s goals can be simply defined as: Easy scalability to multi-petabyte capacity High performance over varying workloads (input/output operations per second [IOPS] and bandwidth) Strong reliability Unfortunately, these goals can compete with one another (for example, scalability can reduce or inhibit performance or impact reliability). Ceph has developed some very interesting concepts (such as dynamic metadata partitioning and data distribution and replication), which this article explores shortly. Ceph’s design also incorporates fault-tolerance features to protect against single points of failure, with the assumption that storage failures on a large scale (petabytes of storage) will be the norm rather than the exception. Finally, its design does not assume particular workloads but includes the ability to adapt to changing distrib- uted workloads to provide the best performance. It does all of this with the goal of POSIX compatibility, allowing it to be trans- parently deployed for existing applications that rely on POSIX semantics (through Ceph-proposed enhancements). Finally, Ceph is open source distributed storage and part of the mainline Linux kernel (2.6.34).
  • 25. 25 [Diagram: OpenStack Overview – users and admins reach the Infrastructure-as-a-Service layer through the REST API, the python-client CLI, and the Horizon UI; core and advanced services include Compute (Nova, Ironic), Network (Neutron), Image Management (Glance), Storage (Swift, Cinder, Manila), Data Processing (Sahara), Key Management (Barbican), Database Management (Trove), Message Queue (Zaqar), DNS Management (Designate), Deployment (TripleO), Telemetry (Ceilometer), Orchestration (Heat), Identity (Keystone), and the common Oslo library, all on top of shared message queues and databases.]
  • 26. 26 Ceph architecture Now, let's explore the Ceph architecture and its core elements at a high level. I then dig down another level to identify some of the key aspects of Ceph to provide a more detailed exploration. The Ceph ecosystem can be broadly divided into four segments (see Figure 1): clients (users of the data), metadata servers (which cache and synchronize the distributed metadata), an object storage cluster (which stores both data and metadata as objects and implements other key responsibilities), and finally the cluster monitors (which implement the monitoring functions). As Figure 1 shows, clients perform metadata operations (to identify the location of data) using the metadata servers. The metadata servers manage the location of data and also where to store new data. Note that metadata is stored in the storage cluster (as indicated by "Metadata I/O"). [Figure 1: Conceptual architecture of the Ceph ecosystem – clients issue metadata ops to the metadata server cluster and file (object) I/O directly to the object storage cluster, which also handles the metadata I/O; cluster monitors oversee the whole ecosystem.]
  • 27. 27 Actual file I/O occurs between the client and object storage cluster. In this way, higher-level POSIX functions (such as open, close, and rename) are managed through the metadata servers, whereas POSIX functions (such as read and write) are managed directly through the object storage cluster. Another perspective of the architecture is provided in Figure 2. A set of servers access the Ceph ecosystem through a client interface, which understands the relationship between metadata servers and object-level storage. The distributed storage system can be viewed in a few layers, including a format for the storage devices (the Extent and B-tree-based Object File System [EBOFS] or an alternative) and an overriding management layer designed to manage data replication, failure detection, and recovery and subsequent data migration called Reliable Autonomic Distributed Object Storage (RADOS). Finally, monitors are used to identify component failures, including subsequent notification. [Figure 2: Simplified layered view of the Ceph ecosystem – servers use the Ceph client interface to reach the metadata servers and object storage daemons, which sit on top of BTRFS/EBOFS-formatted storage.] Ceph components With the conceptual architecture of Ceph under your belts, you can dig down another level to see the major components implemented within the Ceph ecosystem. One of the key differences between Ceph and traditional file systems is that rather than focusing the intelligence in the file system itself, the intelligence is distributed around the ecosystem. Figure 3 shows a simple Ceph ecosystem. The Ceph Client is the user of the Ceph file system. The Ceph Metadata Daemon provides the metadata services, while the Ceph Object Storage Daemon provides the actual storage (for both data and metadata). Finally, the Ceph Monitor provides cluster management. Note that there can be many Ceph clients, many object storage endpoints, numerous metadata servers (depending on the capacity of the file system), and at least a redundant pair of monitors. So, how is this file system distributed? [Figure 3: Simple Ceph ecosystem – applications and users access a Ceph client (enhanced POSIX through the VFS in the Linux kernel, with a DRAM cache), which talks to the metadata daemon (cmds), the object storage daemons (cosd, on BTRFS/EBOFS-backed disks), and the cluster monitor (cmon).]
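As a hedged, minimal sketch of how an application talks to these daemons, the snippet below uses Ceph's Python rados bindings: the connection goes through the monitors, the cluster statistics reflect the monitors' view of the object storage daemons, and the object write/read goes straight to the OSDs. The pool name "hpc-data" and the configuration path are assumptions for the example.

# Minimal librados sketch (Python "rados" bindings shipped with Ceph).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")    # usual client config path
cluster.connect()                                        # contacts the monitors (cmon)

stats = cluster.get_cluster_stats()                      # dict: kb, kb_used, kb_avail, num_objects
print("raw capacity: %.1f TiB, used: %.1f TiB, objects: %d"
      % (stats["kb"] / 2.0**30, stats["kb_used"] / 2.0**30, stats["num_objects"]))

ioctx = cluster.open_ioctx("hpc-data")                   # I/O context for one pool (assumed name)
try:
    ioctx.write_full("result-0001", b"energy=-1.2345e+03\n")   # stored by the cosd daemons
    print(ioctx.read("result-0001"))                           # read back directly from the OSDs
finally:
    ioctx.close()
    cluster.shutdown()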
  • 28. 28 Kernel or user space Early versions of Ceph utilized Filesystems in User SpacE (FUSE), which pushes the file system into user space and can greatly simplify its development. But today, Ceph has been integrated into the mainline kernel, making it faster, because user space context switches are no longer necessary for file system I/O. Ceph client As Linux presents a common interface to the file systems (through the virtual file system switch [VFS]), the user’s perspec- tive of Ceph is transparent. The administrator’s perspective will certainly differ, given the potential for many servers encom- passing the storage system (see the Resources section for infor- mation on creating a Ceph cluster). From the users’ point of view, they have access to a large storage system and are not aware of the underlying metadata servers, monitors, and individual object storage devices that aggregate into a massive storage pool. Users simply see a mount point, from which standard file I/O can be performed. The Ceph file system–or at least the client interface–is imple- mented in the Linux kernel. Note that in the vast majority of file systems, all of the control and intelligence is implemented within the kernel’s file system source itself. But with Ceph, the file system’s intelligence is distributed across the nodes, which simplifies the client interface but also provides Ceph with the ability to massively scale (even dynamically). Rather than rely on allocation lists (metadata to map blocks on a disk to a given file), Ceph uses an interesting alternative. A file from the Linux perspective is assigned an inode number (INO) from the metadata server, which is a unique identifier for the file. The file is then carved into some number of objects (based on the size of the file). Using the INO and the object number (ONO), each object is assigned an object ID (OID). Using a simple hash over the OID, each object is assigned to a placement group. OpenStack for HPC Cloud Environments Object Storage with Ceph
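The following toy sketch (not Ceph code) simply restates the mapping described above in executable form: a file identified by its INO is carved into objects, each object gets an OID from the INO and ONO, and a hash over the OID selects the placement group. The object size, hash function, and placement-group count are arbitrary stand-ins rather than Ceph's actual parameters.

# Toy illustration: file -> striped objects (INO + ONO -> OID) -> placement group.
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024        # assume 4 MiB stripes for the example
PG_COUNT = 128                       # assumed number of placement groups in the pool

def objects_for_file(ino, file_size):
    """Carve a file (identified by its INO) into object IDs."""
    n_objects = max(1, -(-file_size // OBJECT_SIZE))            # ceiling division
    return [f"{ino:x}.{ono:08x}" for ono in range(n_objects)]   # OID = INO.ONO

def placement_group(oid):
    """A stable hash of the OID picks the placement group (PGID)."""
    digest = hashlib.sha1(oid.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_COUNT

for oid in objects_for_file(ino=0x10000003456, file_size=10 * 1024 * 1024):
    print(oid, "-> pg", placement_group(oid))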
The placement group (identified by a PGID) is a conceptual container for objects. The mapping of a placement group to object storage devices is a pseudo-random mapping produced by an algorithm called Controlled Replication Under Scalable Hashing (CRUSH). In this way, the mapping of placement groups (and their replicas) to storage devices does not rely on any metadata, but only on a pseudo-random mapping function. This behavior is ideal, because it minimizes storage overhead and simplifies the distribution and lookup of data. The final component needed for allocation is the cluster map, an efficient representation of the devices that make up the storage cluster. With a PGID and the cluster map, you can locate any object. (A simplified sketch of this pseudo-random placement follows at the end of this section.)

The Ceph metadata server
The job of the metadata server (cmds) is to manage the file system's namespace. Although both metadata and data are stored in the object storage cluster, they are managed separately to support scalability. In fact, metadata is further split among a cluster of metadata servers that can adaptively replicate and distribute the namespace to avoid hot spots. As shown in Figure 4, the metadata servers manage portions of the namespace and can overlap (for redundancy and also for performance). The mapping of metadata servers to the namespace is performed in Ceph using dynamic subtree partitioning, which allows Ceph to adapt to changing workloads (migrating namespaces between metadata servers) while preserving locality for performance.

Figure 4: Partitioning of the Ceph namespace across metadata servers (MDS0 through MDS4)

Because each metadata server simply manages the namespace for its population of clients, its primary role is that of an intelligent metadata cache (the actual metadata is eventually stored within the object storage cluster). Metadata to be written is cached in a short-term journal, which is eventually pushed to physical storage. This allows the metadata server to serve recent metadata back to clients (which is common in metadata operations). The journal is also useful for failure recovery: if a metadata server fails, its journal can be replayed to ensure that the metadata is safely stored on disk.

Metadata servers manage the inode space, converting file names into metadata. A metadata server transforms a file name into an inode, a file size, and the striping data (layout) that the Ceph client uses for file I/O.

Ceph monitors
Ceph includes monitors that manage the cluster map, but some elements of fault management are implemented in the object store itself. When object storage devices fail or new devices are added, the monitors detect this and maintain a valid cluster map. This function is performed in a distributed fashion in which map updates are communicated along with existing traffic. Ceph uses Paxos, a family of algorithms for distributed consensus.

Ceph object storage
Similar to traditional object storage, Ceph storage nodes include not only storage but also intelligence. Traditional drives are simple targets that only respond to commands from initiators. Object storage devices, in contrast, are intelligent devices that act as both targets and initiators, which allows them to communicate and collaborate with other object storage devices.
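Before looking at the storage devices in more detail, here is the promised sketch of pseudo-random placement. It is not the real CRUSH algorithm – CRUSH additionally honors device weights and failure domains encoded in the cluster map – but it shows the key property: any client holding the cluster map can compute, without asking a central service, which devices hold a given placement group.

```python
import hashlib

def pg_to_osds(pgid: int, cluster_map: list, replicas: int = 3) -> list:
    """Simplified stand-in for CRUSH: deterministically pick `replicas`
    object storage devices for a placement group using highest-random-weight
    (rendezvous) hashing over the cluster map."""
    def score(osd: str) -> int:
        h = hashlib.sha1(f"{pgid}:{osd}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return sorted(cluster_map, key=score, reverse=True)[:replicas]

cluster_map = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
print(pg_to_osds(pgid=42, cluster_map=cluster_map))   # e.g. ['osd.3', 'osd.0', 'osd.4']
```

Adding or removing a device changes only the placement groups whose highest-scoring devices change, which is why data migration after a cluster change stays limited.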
From a storage perspective, Ceph object storage devices perform the mapping of objects to blocks (a task traditionally done at the file system layer in the client). This behavior allows the local entity to decide how best to store an object. Early versions of Ceph implemented a custom low-level file system on the local storage called EBOFS. It provided a nonstandard interface to the underlying storage, tuned for object semantics and other features (such as asynchronous notification of commits to disk). Today, the B-tree file system (BTRFS) can be used at the storage nodes; it already implements some of the necessary features (such as embedded integrity).

Because the Ceph clients implement CRUSH and have no knowledge of the block mapping of files on the disks, the underlying storage devices can safely manage the mapping of objects to blocks. This allows the storage nodes to replicate data (for example, when a device is found to have failed). Distributing failure recovery also allows the storage system to scale, because failure detection and recovery are spread across the ecosystem. Ceph calls this layer RADOS (see Figure 3).

Other features of interest
As if the dynamic and adaptive nature of the file system weren't enough, Ceph also implements some interesting features that are visible to the user. Users can create snapshots, for example, on any subdirectory (including all of its contents). It is also possible to perform file and capacity accounting at the subdirectory level, reporting the storage size and number of files for a given subdirectory (and all of its nested contents).

Ceph status and future
Although Ceph is now integrated into the mainline Linux kernel, it is properly flagged there as experimental. File systems in this state are useful to evaluate but are not yet ready for production environments. Still, given Ceph's adoption into the Linux kernel and the motivation of its originators to continue its development, it should soon be ready to serve your massive storage needs.
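RADOS can also be used directly, without the POSIX file system layer on top. As a minimal sketch, the following stores and reads back a single object through the Python bindings for librados; the configuration file path and the pool name ("data") are assumptions and must match an existing cluster and pool.

```python
import rados

# Connect using a cluster configuration file and the default keyring.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("data")   # name of an existing pool (assumed)
    try:
        # The object is hashed to a placement group and replicated by RADOS;
        # the client never deals with block addresses or server names.
        ioctx.write_full("hello-object", b"stored and replicated by RADOS")
        print(ioctx.read("hello-object"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```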
Application Containers with Docker
Transforming Business Through Software
Gone are the days of private datacenters running off-the-shelf software and giant monolithic code bases that were updated once a year. Everything has changed. Whether it is moving to the cloud, migrating between clouds, modernizing legacy applications, or building new apps and data structures, the desired result is always the same: speed. How fast you can move defines your success as a company. Software is the critical IP that defines your company, even if the actual product you are selling is a t-shirt, a car, or compounding interest. Software is how you engage your customers, reach new users, understand their data, promote your product or service, and process their orders.

To do this well, today's software is going bespoke. Small pieces of software designed for one very specific job are called microservices. The design goal of microservices is to build each service with all of the components necessary to run a specific job, on just the right type of underlying infrastructure resources. These services are then loosely coupled, so that each one can be changed at any time without having to worry about the services that come before or after it. This methodology, while great for continuous improvement, poses many challenges on the way to that end state. First, it creates a new, ever-expanding matrix of services, dependencies, and infrastructure that is difficult to manage. It also does not account for the vast number of existing legacy applications in the landscape, the heterogeneity up and down the application stack, and the processes required to make all of this work in practice.

The Docker Journey and the Power of AND
In 2013, Docker entered the landscape with application containers to build, ship, and run applications anywhere. Docker takes software and its dependencies and packages them into a lightweight container. Much like physical shipping containers, software containers are simply a standard unit of software that looks the same on the outside regardless of what code and dependencies are included on the inside. This enables developers and sysadmins to transport them across infrastructures and environments without requiring any modifications, regardless of the varying configurations in those environments. The Docker journey begins here.

Agility: The speed and simplicity of Docker was an instant hit with developers and is what led to the meteoric rise of the open source project. Developers can now very simply package up any software and its dependencies into a container. They can use any language, version, and tooling, because everything is packaged in a container, which "standardizes" all of that heterogeneity without sacrifice.

Portability: By the very nature of the Docker technology, these same developers realized that their application containers are now portable in ways that were not previously possible. They can ship applications from development to test and production, and the code will work as designed every time. Differences in the environment do not affect what is inside the container, nor does the application need to be changed to work in production. This is also a boon for IT operations teams, because they can now move applications across datacenters or clouds to avoid vendor lock-in.
Control: As these applications move along the lifecycle to production, new questions around security, manageability, and scale need to be answered. Docker "standardizes" your environment while maintaining the heterogeneity your business requires, and it provides the ability to set the appropriate level of control and flexibility to maintain your service levels, performance, and regulatory compliance. IT operations teams can provision, secure, monitor, and scale the infrastructure and applications to maintain peak service levels. No two applications or businesses are alike, and Docker allows you to decide how to control your application environment.

At the core of the Docker journey is the power of AND. Docker is the only solution that provides agility, portability, and control for developers and IT operations teams across all stages of the application lifecycle. From these core tenets, Containers as a Service (CaaS) emerges as the construct by which these new applications are built better and faster.

Docker Containers as a Service (CaaS)
What is Containers as a Service (CaaS)? It is an IT-managed and secured application environment of infrastructure and content in which developers can build and deploy applications in a self-service manner.

Diagram: CaaS workflow – developers build in development environments, ship secure content through a shared registry, and IT operations deploy, manage, and scale the running applications.
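As a minimal illustration of the "run" side of this workflow, the Docker Engine's Python SDK (docker-py) can start the same image, unchanged, on a laptop, a test VM, or a production host; the image tag used here is just an example.

```python
import docker

# Talk to the local Docker engine (default socket or DOCKER_HOST).
client = docker.from_env()

# The container brings its own userland and dependencies, so the command
# behaves identically wherever the engine runs.
output = client.containers.run(
    "python:3.11-slim",                                   # example image tag
    ["python", "-c", "print('hello from a container')"],
    remove=True,                                          # clean up afterwards
)
print(output.decode())
```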
In the CaaS diagram above, the development and IT operations teams collaborate through the registry, a service that maintains a library of secure, signed images. From the registry, the developers on the left pull and build software at their own pace, then push the content back to the registry once it passes integration testing, saving the latest version. Depending on the internal processes, the deployment step is either automated with tools or performed manually.

The IT operations team on the right of the diagram manages the various vendor contracts for the production infrastructure, such as compute, networking, and storage. These teams are responsible for provisioning the compute resources needed for the application and use Docker Universal Control Plane to monitor the clusters and applications over time. They can then move the apps from one cloud to another, or scale a service up or down to maintain peak performance.

Key Characteristics and Considerations
Docker CaaS provides a framework for organizations to unify the variety of systems, languages, and tools in their environment and to apply the level of control, security, or freedom required for their business. As a Docker-native solution with full support for the Docker API, Docker CaaS can seamlessly take an application from local development to production without changing the code, streamlining the deployment cycle. The following characteristics form the minimum requirements for any organization's application environment. In this paradigm, development and IT operations teams are empowered to use the best tools for their respective jobs without worrying about breaking systems, each other's workflows, or lock-in.

1. The needs of developers and operations. Many tools address the functional needs of only one team, which breaks the cycle of continuous improvement. To truly accelerate the development-to-production timeline, you need to address both groups of users along a continuum. Docker provides unique capabilities for each team as well as a consistent API across the entire platform, for a seamless hand-over from one team to the next.

2. All stages in the application lifecycle. Continuous integration, continuous delivery, and DevOps are about eliminating the waterfall development methodology and the lagging innovation cycles that come with it. By providing tools for both developers and IT operations, Docker seamlessly supports an application from build and test through staging to production.

3. Any language. Developer agility means the freedom to build with whatever language, version, and tooling the features under development require. The ability to run multiple versions of a language at the same time adds further flexibility. Docker lets your team focus on building the app instead of figuring out how to build an app that works in Docker.

4. Any operating system. The vast majority of organizations run more than one operating system. Some tools simply work better on Linux, others on Windows. Application platforms need to account for and support this diversity; otherwise they solve only part of the problem. Originally created for the Linux community, Docker and Microsoft are bringing Windows Server support forward to address the millions of enterprise applications in existence today as well as future applications.
5. Any infrastructure. When it comes to infrastructure, organizations want choice, backup, and leverage. Whether that means multiple private datacenters, a hybrid cloud, or multiple cloud providers, the critical capability is being able to move workloads from one environment to another without causing application issues. The Docker architecture abstracts the infrastructure away from the application, allowing application containers to run anywhere and to be ported across any other infrastructure.

6. Open APIs, pluggable architecture, and ecosystem. A platform isn't really a platform if it is an island unto itself. Implementing new technologies is often impossible if you first need to re-tool your existing environment. A fundamental guiding principle of Docker is openness: open APIs and plugins make it easy to leverage your existing investments and to fit Docker into your environment and processes. This openness also invites a rich ecosystem to flourish, giving you more flexibility and choice when adding specialized capabilities to your CaaS.

Although numerous, these characteristics are critical, because the new bespoke application paradigms only invite greater heterogeneity into your technical architecture. The Docker CaaS platform is fundamentally designed to support that diversity while providing the appropriate controls to manage it at any scale.
Docker CaaS
The Docker CaaS platform is made possible by a suite of integrated software solutions with a flexible deployment model to meet the needs of your business.

On-premises/VPC: For organizations that need to keep their IP within their own network, Docker Trusted Registry and Docker Universal Control Plane can be deployed on-premises or in a VPC and connected to your existing infrastructure and systems, such as storage, Active Directory/LDAP, and monitoring and logging solutions. Trusted Registry stores and manages images on your storage infrastructure and also manages role-based access control to those images. Universal Control Plane provides visibility across your Docker environment, including Swarm clusters, Trusted Registry repositories, containers, and multi-container applications.

In the cloud: For organizations that readily use SaaS solutions, Docker Hub and Tutum by Docker provide a registry service and a control plane that are hosted and managed by Docker. Hub is a cloud registry service for storing and managing your images and user permissions. Tutum by Docker provisions and manages the deployment clusters and also monitors and manages the deployed applications. Connect to the cloud infrastructure of your choice, or bring your own physical nodes to deploy your applications.

Your Docker CaaS can be designed to provide a centralized point of control and management, or to allow decentralized management that empowers individual application teams. This flexibility lets you create the model that is right for your business, just as you choose your infrastructure and implement your processes. CaaS extends that choice to how you build, ship, and run applications.

Many IT initiatives are enabled, and in fact accelerated, by CaaS because of its unifying nature across the environment. Each organization has its own take on the terminology, but the initiatives range from containerization – which may involve the "lift and shift" of existing apps or the adoption of microservices – to continuous integration, delivery, and DevOps, and the various flavors of cloud adoption, migration, hybrid, and multi-cloud. In each scenario, Docker CaaS brings the agility, portability, and control needed to adopt those use cases across the organization.

Diagram: CaaS tooling across the workflow – build (Docker Toolbox), ship (Docker Hub, Docker Trusted Registry), run (Tutum by Docker, Docker Universal Control Plane).
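The "ship" step of this workflow is simply an image build followed by a push to whichever registry you run – Docker Hub, Docker Trusted Registry, or another. The sketch below uses the Docker Engine's Python SDK; the repository name, registry address, and credentials are placeholders.

```python
import docker

client = docker.from_env()

# Build an image from the Dockerfile in the current directory and tag it
# for a registry ("registry.example.com/team/myapp" is a placeholder).
image, build_log = client.images.build(path=".", tag="registry.example.com/team/myapp:1.0")

# Authenticate against the registry and push the build so that IT operations
# can pull and deploy exactly this version.
client.login(registry="registry.example.com", username="dev", password="********")
for line in client.images.push("registry.example.com/team/myapp", tag="1.0",
                               stream=True, decode=True):
    print(line)
```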
tech BOX
Architecture of Docker
Docker is an open-source project that automates the deployment of applications inside software containers by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux. Docker uses resource isolation features of the Linux kernel, such as cgroups and kernel namespaces, together with a union-capable filesystem such as AUFS, to allow independent "containers" to run within a single Linux instance, avoiding the overhead of starting and maintaining virtual machines.

The Linux kernel's namespaces largely isolate an application's view of the operating environment, including process trees, network, user IDs, and mounted file systems, while the kernel's cgroups provide resource limiting for CPU, memory, block I/O, and network. Since version 0.9, Docker has included the libcontainer library as its own way to use the virtualization facilities provided by the Linux kernel directly, in addition to the abstracted virtualization interfaces libvirt, LXC (Linux Containers), and systemd-nspawn.

As actions are performed on a Docker base image, union filesystem layers are created and documented, such that each layer fully describes how to recreate the action. This strategy enables Docker's lightweight images, because only layer updates need to be propagated (compared with full VMs, for example).
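You can see these layers for yourself. The short sketch below pulls a small base image with the Docker Engine's Python SDK and prints its layer history; the image tag is just an example.

```python
import docker

client = docker.from_env()

# Pull a small base image and list the union-filesystem layers it is made of.
image = client.images.pull("alpine:3.19")             # example image tag
for layer in image.history():
    # Each entry records the instruction that created the layer and its size.
    created_by = (layer.get("CreatedBy") or "").strip()
    print(f"{layer.get('Size', 0):>10}  {created_by[:60]}")
```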
Overview
Docker implements a high-level API to provide lightweight containers that run processes in isolation. Building on facilities provided by the Linux kernel (primarily cgroups and namespaces), a Docker container, unlike a virtual machine, does not require or include a separate operating system. Instead, it relies on the kernel's functionality and uses resource isolation (CPU, memory, block I/O, network, and so on) and separate namespaces to isolate the application's view of the operating system. Docker accesses the Linux kernel's virtualization features either directly, using the libcontainer library (available as of Docker 0.9), or indirectly via libvirt, LXC (Linux Containers), or systemd-nspawn.

By using containers, resources can be isolated, services restricted, and processes provisioned to have an almost completely private view of the operating system, with their own process ID space, file system structure, and network interfaces. Multiple containers share the same kernel, but each container can be constrained to use only a defined amount of resources, such as CPU, memory, and I/O.

Using Docker to create and manage containers can simplify the creation of highly distributed systems by allowing multiple applications, worker tasks, and other processes to run autonomously on a single physical machine or across multiple virtual machines. Nodes can then be deployed as resources become available or as more nodes are needed, enabling a platform-as-a-service (PaaS) style of deployment and scaling for systems such as Apache Cassandra, MongoDB, or Riak. Docker also simplifies the creation and operation of task or workload queues and other distributed systems.

Integration
Docker can be integrated into various infrastructure tools, including Amazon Web Services, Ansible, CFEngine, Chef, Google Cloud Platform, IBM Bluemix, Jelastic, Jenkins, Microsoft Azure, OpenStack Nova, OpenSVC, HPE Helion Stackato, Puppet, Salt, Vagrant, and VMware vSphere Integrated Containers. The Cloud Foundry Diego project integrates Docker into the Cloud Foundry PaaS, and the GearD project aims to integrate Docker into Red Hat's OpenShift Origin PaaS.

Diagram: Docker can use different interfaces (libcontainer, libvirt, LXC, systemd-nspawn) to access virtualization features of the Linux kernel, such as cgroups, namespaces, SELinux, AppArmor, Netfilter, Netlink, and capabilities.
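The resource constraints mentioned above are exposed directly through the Docker API. As a small sketch using the Python SDK, the run below confines a container to roughly half a CPU, 256 MiB of RAM, and 64 processes; the limits are enforced by the kernel's cgroups, not by the application. The image tag is again just an example.

```python
import docker

client = docker.from_env()

container = client.containers.run(
    "python:3.11-slim",                                  # example image tag
    ["python", "-c", "print('constrained workload')"],
    nano_cpus=500_000_000,   # 0.5 CPU
    mem_limit="256m",        # 256 MiB of RAM
    pids_limit=64,           # at most 64 processes
    detach=True,
)
result = container.wait()    # block until the contained process exits
print(result, container.logs().decode())
container.remove()
```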
Advanced Cluster Management Made Easy
High Performance Computing (HPC) clusters serve a crucial role at research-intensive organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil and gas exploration. What they all have in common is a requirement for large-scale computations. Bright's solution for HPC puts the power of supercomputing within the reach of just about any organization.
Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
  • 40. 40 The Bright Advantage Bright Cluster Manager for HPC makes it easy to build and oper- ate HPC clusters using the server and networking equipment of your choice, tying them together into a comprehensive, easy to manage solution. Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It pro- vides single-pane-of-glass management for the hardware, the operating system, HPC software, and users. With Bright Cluster Manager for HPC, system administrators can get clusters up and running quickly and keep them running reliably throughout their life cycle – all with the ease and elegance of a fully featured, en- terprise-grade cluster manager. Bright Cluster Manager was designed by a group of highly expe- rienced cluster specialists who perceived the need for a funda- mental approach to the technical challenges posed by cluster management. The result is a scalable and flexible architecture, forming the basis for a cluster management solution that is ex- tremely easy to install and use, yet is suitable for the largest and most complex clusters. Scale clusters to thousands of nodes Bright Cluster Manager was designed to scale to thousands of nodes. It is not dependent on third-party (open source) software that was not designed for this level of scalability. Many advanced features for handling scalability and complexity are included: Management Daemon with Low CPU Overhead The cluster management daemon (CMDaemon) runs on every node in the cluster, yet has a very low CPU load – it will not cause any noticeable slow-down of the applications running on your cluster. Because the CMDaemon provides all cluster management func- tionality, the only additional daemon required is the workload manager (or queuing system) daemon. For example, no daemons The cluster installer takes you through the installation process and offers advanced options such as “Express” and “Remote”. Advanced Cluster Management Made Easy Easy-to-use, complete and scalable
  • 41. 41 for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster. Multiple Load-Balancing Provisioning Nodes Node provisioning – the distribution of software from the head node to the regular nodes – can be a significant bottleneck in large clusters. With Bright Cluster Manager, provisioning capabil- ity can be off-loaded to regular nodes, ensuring scalability to a virtually unlimited number of nodes. When a node requests its software image from the head node, the head node checks which of the nodes with provisioning ca- pability has the lowest load and instructs the node to download or update its image from that provisioning node. Synchronized Cluster Management Daemons The CMDaemons are synchronized in time to execute tasks in exact unison. This minimizes the effect of operating system jit- ter (OS jitter) on parallel applications. OS jitter is the “noise” or “jitter” caused by daemon processes and asynchronous events such as interrupts. OS jitter cannot be totally prevented, but for parallel applications it is important that any OS jitter occurs at the same moment in time on every compute node. Built-In Redundancy Bright Cluster Manager supports redundant head nodes and re- dundant provisioning nodes that can take over from each other in case of failure. Both automatic and manual failover configura- tions are supported. Other redundancy features include support for hardware and software RAID on all types of nodes in the cluster. Diskless Nodes Bright Cluster Manager supports diskless nodes, which can be useful to increase overall Mean Time Between Failure (MTBF), particularly on large clusters. Nodes with only InfiniBand and no Ethernet connection are possible too. Bright Cluster Manager 7 for HPC For organizations that need to easily deploy and manage HPC clusters Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It provides single-pane-of-glass management for the hardware, the operating system, the HPC software, and users. With Bright Cluster Manager for HPC, system administrators can quickly get clusters up and running and keep them running relia- bly throughout their life cycle – all with the ease and elegance of a fully featured, enterprise-grade cluster manager. Features Simple Deployment Process – Just answer a few questions about your cluster and Bright takes care of the rest. It installs all of the software you need and creates the necessary config- uration files for you, so you can relax and get your new cluster up and running right – first time, every time.
tech BOX
Head Nodes and Regular Nodes
A cluster can have different types of nodes, but it will always have at least one head node and one or more "regular" nodes. A regular node is a node that is controlled by the head node and receives its software image from the head node or from a dedicated provisioning node. Regular nodes come in different flavors. Some examples include:

- Failover Node – a node that can take over all functions of the head node when the head node becomes unavailable.
- Compute Node – a node that is primarily used for computations.
- Login Node – a node that provides login access to users of the cluster; it is often also used for compilation and job submission.
- I/O Node – a node that provides access to disk storage.
- Provisioning Node – a node from which other regular nodes can download their software image.
- Workload Management Node – a node that runs the central workload manager, also known as the queuing system.
- Subnet Management Node – a node that runs an InfiniBand subnet manager.

More node types can easily be defined and configured with Bright Cluster Manager. In simple clusters there is only one type of regular node – the compute node – and the head node fulfills all of the cluster management roles described above.
Elements of the Architecture
All types of nodes (head nodes and regular nodes) run the same CMDaemon. However, depending on the roles that have been assigned to a node, the CMDaemon fulfills different tasks. (Screenshot: a node with a 'Login Role', 'Storage Role', and 'PBS Client Role'.)

The architecture of Bright Cluster Manager is implemented by the following key elements:
- the Cluster Management Daemon (CMDaemon)
- the Cluster Management Shell (CMSH)
- the Cluster Management GUI (CMGUI)

Connection Diagram
A Bright cluster is managed by the CMDaemon on the head node, which communicates with the CMDaemons on the other nodes over encrypted connections. The CMDaemon on the head node exposes an API that can also be accessed from outside the cluster. This API is used by the cluster management shell and the cluster management GUI, but it can also be used by other applications that are programmed to access it. The API documentation is available on request. The diagram below shows a cluster with one head node, three regular nodes, and the CMDaemon running on each node.
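As a purely hypothetical illustration of what such "other applications" could look like, the sketch below polls a management endpoint over an authenticated HTTPS session. The endpoint path, port, certificate locations, and JSON fields are all invented for this illustration; the actual CMDaemon API is documented by Bright Computing on request.

```python
import requests

HEAD_NODE = "https://head-node.example.org:8081"      # hypothetical address and port

session = requests.Session()
session.cert = ("/path/to/admin.pem", "/path/to/admin.key")   # assumed client-certificate auth
session.verify = "/path/to/cluster-ca.pem"

# Hypothetical endpoint: list nodes and their current status.
resp = session.get(f"{HEAD_NODE}/api/nodes", timeout=10)
resp.raise_for_status()
for node in resp.json():
    print(node.get("hostname"), node.get("status"))
```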
Can Install on Bare Metal – With Bright Cluster Manager there is nothing to pre-install. You can start with bare metal servers, and we will install and configure everything you need. Go from pallet to production in less time than ever.

Comprehensive Metrics and Monitoring – Bright Cluster Manager really shines when it comes to showing you what is going on in your cluster. Its graphical user interface lets you monitor resources and services across the entire cluster in real time.

Powerful User Interfaces – With Bright Cluster Manager you don't just get one great user interface, you get two. You can choose to provision, monitor, and manage your HPC clusters with a traditional command line interface or with our intuitive graphical user interface.

Optimize the Use of IT Resources by HPC Applications – Bright ensures that HPC applications get the resources they require according to the policies of your organization. Your cluster will prioritize workloads according to your business priorities.

HPC Tools and Libraries Included – Every copy of Bright Cluster Manager comes with a complete set of HPC tools and libraries, so you will be ready to develop, debug, and deploy your HPC code right away.

"The building blocks for transtec HPC solutions must be chosen according to our goals of ease of management and ease of use. With Bright Cluster Manager, we are happy to have the technology leader at hand, meeting these requirements, and our customers value that."
Jörg Walter | HPC Solution Engineer

(Screenshot: monitor your entire cluster at a glance with Bright Cluster Manager)
Quickly build and deploy an HPC cluster
When you need to get an HPC cluster up and running, don't waste time cobbling together a solution from various open source tools. Bright's solution comes with everything you need to set up a complete cluster from bare metal and manage it with a beautiful, powerful user interface. Whether your cluster is on-premises or in the cloud, Bright is a virtual one-stop shop for your HPC project.

Build a cluster in the cloud
Bright's dynamic cloud provisioning capability lets you build an entire cluster in the cloud or expand your physical cluster into the cloud for extra capacity. Bright allocates the compute resources in Amazon Web Services automatically and on demand.

Build a hybrid HPC/Hadoop cluster
Does your organization need HPC clusters for technical computing and Hadoop clusters for Big Data? The Bright Cluster Manager for Apache Hadoop add-on enables you to easily build and manage both types of clusters from a single pane of glass. Your system administrators can monitor all cluster operations, manage users, and repurpose servers with ease.

Driven by research
Bright has been building enterprise-grade cluster management software for more than a decade, and our solutions are deployed in thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization?

Clusters in the cloud
Bright Cluster Manager can provision and manage clusters running on virtual servers inside the AWS cloud as if they were local machines. Use this feature to build an entire cluster in AWS from scratch, or extend a physical cluster into the cloud when you need extra capacity.

Use native services for storage in AWS – Bright Cluster Manager lets you take advantage of inexpensive, secure, durable, flexible, and simple storage services for data use, archiving, and backup in the AWS cloud.

Make smart use of AWS for HPC – Bright Cluster Manager can save you money by instantiating compute resources in AWS only when they are needed. It uses built-in intelligence to create instances only after the data is ready for processing and the backlog in on-site workloads requires it.

(Screenshot: you decide which metrics to monitor – just drag a component into the display area and Bright Cluster Manager instantly creates a graph of the data you need.)
  • 46. BOX 46 High performance meets efficiency Initially, massively parallel systems constitute a challenge to both administrators and users. They are complex beasts. Anyone building HPC clusters will need to tame the beast, master the complexity and present users and administrators with an easy- to-use, easy-to-manage system landscape. Leading HPC solution providers such as transtec achieve this goal. They hide the complexity of HPC under the hood and match high performance with efficiency and ease-of-use for both users and administrators. The “P” in “HPC” gains a double meaning: “Performance” plus “Productivity”. Cluster and workload management software like Moab HPC Suite, Bright Cluster Manager or Intel Fabric Suite provide the means to master and hide the inherent complexity of HPC sys- tems. For administrators and users, HPC clusters are presented as single, large machines, with many different tuning parame- ters. The software also provides a unified view of existing clus- ters whenever unified management is added as a requirement by the customer at any point in time after the first installation. Thus, daily routine tasks such as job management, user management, queue partitioning and management, can be performed easily with either graphical or web-based tools, without any advanced scripting skills or technical expertise required from the adminis- trator or user. Advanced Cluster Management Made Easy Easy-to-use, complete and scalable
  • 47. S EW SE SW tech BOX 47 Bright Cluster Manager unleashes and manages the unlimited power of the cloud Create a complete cluster in Amazon EC2, or easily extend your onsite cluster into the cloud. Bright Cluster Manager provides a “single pane of glass” to both your on-premise and cloud resourc- es, enabling you to dynamically add capacity and manage cloud nodes as part of your onsite cluster. Cloud utilization can be achieved in just a few mouse clicks, with- out the need for expert knowledge of Linux or cloud computing. With Bright Cluster Manager, every cluster is cloud-ready, at no extra cost. The same powerful cluster provisioning, monitoring, scheduling and management capabilities that Bright Cluster Manager provides to onsite clusters extend into the cloud. The Bright advantage for cloud utilization Ease of use: Intuitive GUI virtually eliminates user learning curve; no need to understand Linux or EC2 to manage system. Alternatively, cluster management shell provides powerful scripting capabilities to automate tasks. Complete management solution: Installation/initialization, provisioning, monitoring, scheduling and management in one integrated environment. Integrated workload management: Wide selection of work- load managers included and automatically configured with local, cloud and mixed queues. Single pane of glass; complete visibility and control: Cloud computenodesmanagedaselementsoftheon-sitecluster;vi- sible from a single console with drill-downs and historic data. Efficient data management via data-aware scheduling: Auto- matically ensures data is in position at start of computation; delivers results back when complete. Secure, automatic gateway: Mirrored LDAP and DNS services inside Amazon VPC, connecting local and cloud-based nodes for secure communication over VPN. Cost savings: More efficient use of cloud resources; support for spot instances, minimal user intervention. Two cloud scenarios Bright Cluster Manager supports two cloud utilization scenarios: “Cluster-on-Demand” and “Cluster Extension”. Scenario 1: Cluster-on-Demand The Cluster-on-Demand scenario is ideal if you do not have a clus- ter onsite, or need to set up a totally separate cluster. With just a few mouse clicks you can instantly create a complete cluster in the public cloud, for any duration of time.
  • 48. 48 Scenario 2: Cluster Extension The Cluster Extension scenario is ideal if you have a cluster onsite but you need more compute power, including GPUs. With Bright Cluster Manager, you can instantly add EC2-based resources to your onsite cluster, for any duration of time. Extending into the cloud is as easy as adding nodes to an onsite cluster – there are only a few additional, one-time steps after providing the public cloud account information to Bright Cluster Manager. The Bright approach to managing and monitoring a cluster in the cloud provides complete uniformity, as cloud nodes are managed and monitored the same way as local nodes: Load-balanced provisioning Software image management Integrated workload management Interfaces – GUI, shell, user portal Monitoring and health checking Compilers, debuggers, MPI libraries, mathematical libraries and environment modules Advanced Cluster Management Made Easy Cloud Bursting With Bright
  • 49. Bright Computing 49 Bright Cluster Manager also provides additional features that are unique to the cloud. Amazon spot instance support – Bright Cluster Manager enables users to take advantage of the cost savings offered by Amazon’s spot instances. Users can specify the use of spot instances, and Bright will automatically schedule as available, reducing the cost to compute without the need to monitor spot prices and sched- ule manually. Hardware Virtual Machine (HVM) – Bright Cluster Manager auto- matically initializes all Amazon instance types, including Cluster Compute and Cluster GPU instances that rely on HVM virtualization. Amazon VPC – Bright supports Amazon VPC setups which allows compute nodes in EC2 to be placed in an isolated network, there- by separating them from the outside world. It is even possible to route part of a local corporate IP network to a VPC subnet in EC2, so that local nodes and nodes in EC2 can communicate easily. Data-aware scheduling – Data-aware scheduling ensures that input data is automatically transferred to the cloud and made accessible just prior to the job starting, and that the output data is automatically transferred back. There is no need to wait for the data to load prior to submitting jobs (delaying entry into the job queue), nor any risk of starting the job before the data transfer is complete (crashing the job). Users submit their jobs, and Bright’s data-aware scheduling does the rest. Bright Cluster Manager provides the choice: “To Cloud, or Not to Cloud” Not all workloads are suitable for the cloud. The ideal situation for most organizations is to have the ability to choose between onsite clusters for jobs that require low latency communication, complex I/O or sensitive data; and cloud clusters for many other types of jobs. Bright Cluster Manager delivers the best of both worlds: a pow- erful management solution for local and cloud clusters, with the ability to easily extend local clusters into the cloud without compromising provisioning, monitoring or managing the cloud resources. One or more cloud nodes can be configured under the Cloud Settings tab.
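To put the spot-instance support into context: requesting spot capacity by hand means talking to the EC2 API, watching prices, and cleaning up afterwards. The hedged sketch below shows what such a manual request looks like with the AWS SDK for Python (boto3); the AMI ID, key pair, region, and price ceiling are placeholders. Bright Cluster Manager issues and monitors requests of this kind automatically once spot usage is enabled.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")     # example region

# Ask for two spot instances with a price ceiling; in a Bright-managed
# cluster this request, plus the subsequent monitoring, is automated.
response = ec2.request_spot_instances(
    SpotPrice="0.25",                                 # placeholder bid in USD/hour
    InstanceCount=2,
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",           # placeholder AMI
        "InstanceType": "c5.4xlarge",
        "KeyName": "cluster-key",                     # placeholder key pair
    },
)
for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```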
Intelligent HPC Workload Management
While all HPC systems face challenges in workload demand, workflow constraints, resource complexity, and system scale, enterprise HPC systems face more stringent challenges and expectations. Enterprise HPC systems must meet mission-critical and priority HPC workload demands for commercial businesses, business-oriented research, and academic organizations.
High Throughput Computing | CAD | Big Data Analytics | Simulation | Aerospace | Automotive
  • 52. 52 Solutions to Enterprise High-Performance Computing Challenges The Enterprise has requirements for dynamic scheduling, provi- sioning and management of multi-step/multi-application servic- es across HPC, cloud, and big data environments. Their workloads directly impact revenue, product delivery, and organizational ob- jectives. Enterprise HPC systems must process intensive simulation and data analysis more rapidly, accurately and cost-effectively to accelerate insights. By improving workflow and eliminating job delays and failures, the business can more efficiently leverage data to make data-driven decisions and achieve a competitive advantage. The Enterprise is also seeking to improve resource utilization and manage efficiency across multiple heterogene- ous systems. User productivity must increase to speed the time to discovery, making it easier to access and use HPC resources and expand to other resources as workloads demand. Moab HPC Suite Moab HPC Suite accelerates insights by unifying data center re- sources, optimizing the analysis process, and guaranteeing ser- vices to the business. Moab 8.1 for HPC systems and HPC cloud continually meets enterprise priorities through increased pro- ductivity, automated workload uptime, and consistent SLAs. It uses the battle-tested and patented Moab intelligence engine to automatically balance the complex, mission-critical workload priorities of enterprise HPC systems. Enterprise customers ben- efit from a single integrated product that brings together key Enterprise HPC capabilities, implementation services, and 24/7 support. It is imperative to automate workload workflows to speed time to discovery. The following new use cases play a key role in im- proving overall system performance: Moab is the “brain” of an HPC system, intelligently optimizing workload throughput while balancing service levels and priorities. Intelligent HPC Workload Management Moab HPC Suite
  • 53. 53 Elastic Computing – unifies data center resources by assisting the business management resource expansion through burst- ing to private/public clouds and other data center resources utilizing OpenStack as a common platform Performance and Scale – optimizes the analysis process by dramatically increasing TORQUE and Moab throughput and scalability across the board Tighter cooperation between Moab and Torque – harmoniz- ing these key structures to reduce overhead and improve communication between the HPC scheduler and the re- source manager(s) More Parallelization – increasing the decoupling of Moab and TORQUE’s network communication so Moab is less dependent on TORQUE’S responsiveness, resulting in a 2x speed improvement and shortening the duration of the av- erage scheduling iteration in half Accounting Improvements – allowing toggling between multiple modes of accounting with varying levels of en- forcement, providing greater flexibility beyond strict allo- cation options. Additionally, new up-to-date accounting balances provide real-time insights into usage tracking. Viewpoint Admin Portal – guarantees services to the busi- ness by allowing admins to simplify administrative reporting, workload status tracking, and job resource viewing Accelerate Insights Moab HPC Suite accelerates insights by increasing overall system, user and administrator productivity, achieving more accurate results that are delivered faster and at a lower cost from HPC resources. Moab provides the scalability, 90-99 per- cent utilization, and simple job submission required to maximize productivity. This ultimately speeds the time to discovery, aiding the enterprise in achieving a competitive advantage due to its HPC system. Enterprise use cases and capabilities include the following: OpenStack integration to offer virtual and physical resource provisioning for IaaS and PaaS Performance boost to achieve 3x improvement in overall opti- mization performance Advanced workflow data staging to enable improved cluster utilization, multiple transfer methods, and new transfer types
  • 54. 54 “With Moab HPC Suite, we can meet very demanding customers’ requirements as regards unified management of heterogeneous cluster environments, grid management, and provide them with flexible and powerful configuration and reporting options. Our customers value that highly.” Sven Grützmacher | HPC Solution Engineer Advanced power management with clock frequency control and additional power state options, reducing energy costs by 15-30 percent Workload-optimized allocation policies and provisioning to get more results out of existing heterogeneous resources and reduce costs, including topology-based allocation Unify workload management across heterogeneous clusters by managing them as one cluster, maximizing resource availa- bility and administration efficiency Optimized, intelligent scheduling packs workloads and back- fills around priority jobs and reservations while balancing SLAs to efficiently use all available resources Optimized scheduling and management of accelerators, both Intel Xeon Phi and GPGPUs, for jobs to maximize their utiliza- tion and effectiveness Simplified job submission and management with advanced job arrays and templates Showback or chargeback for pay-for-use so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster Multi-cluster grid capabilities manage and share workload across multiple remote clusters to meet growing workload demand or surges Guarantee Services to the Business Job and resource failures in enterprise HPC systems lead to de- layed results, missed organizational opportunities, and missed objectives. Moab HPC Suite intelligently automates workload and resource uptime in the HPC system to ensure that workloads complete successfully and reliably, avoid failures, and guarantee services are delivered to the business. Intelligent HPC Workload Management Moab HPC Suite
The Enterprise benefits from these features:
- Intelligent resource placement prevents job failures with granular resource modeling, meeting workload requirements and avoiding at-risk resources
- Auto-response to failures and events, with configurable actions for pre-failure conditions, amber alerts, or other metrics and monitors
- Workload-aware future maintenance scheduling that helps maintain a stable HPC system without disrupting workload productivity
- Real-world expertise for fast time-to-value and system uptime, with included implementation, training, and 24/7 remote support services

Auto SLA Enforcement
Moab HPC Suite uses the powerful Moab intelligence engine to optimally schedule and dynamically adjust workloads to consistently meet service level agreements (SLAs), guarantees, or business priorities. This automatically ensures that the right workloads are completed at the optimal times, taking into account the many departments, priorities, and SLAs that have to be balanced. Moab provides the following benefits:
- Usage accounting and budget enforcement that schedules resources and reports on usage in line with resource-sharing agreements and precise budgets (including usage limits, usage reports, automatic budget management, and dynamic fair-share policies)
- SLA and priority policies that make sure the highest-priority workloads are processed first (including Quality of Service and hierarchical priority weighting)
- Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workloads change (for example, future reservations and pre-emption)

BOX
transtec HPC solutions are designed for maximum flexibility and ease of management. We not only offer our customers the most powerful and flexible cluster management solution out there, but also provide them with customized setup and site-specific configuration. Whether a customer needs a dynamic Linux/Windows dual-boot solution, unified management of different clusters at different sites, or fine-tuning of the Moab scheduler to implement fine-grained policy configuration – transtec not only puts the framework in your hands, but also helps you adapt the system to your special needs. Needless to say, when customers are in need of special trainings, transtec will be there to provide customers, administrators, or users with specially adjusted Educational Services.

Many years of experience in High Performance Computing have enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven third-party tools, which we assemble into a complete solution and adapt and configure according to the customer's requirements. We manage everything from the complete installation and configuration of the operating system, through necessary middleware components such as job and cluster management systems, up to the customer's applications.