HIGH PERFORMANCE COMPUTING & BIG DATA
Technology Compass 2016/17
More Than 35 Years of Experience
in Scientific Computing
1980 marked the beginning of a decade where numerous startups
were created, some of which later transformed into big players in
the IT market. Technical innovations brought dramatic changes
to the nascent computer market. In Tübingen, close to one of
Germany’s prime and oldest universities, transtec was founded.
In the early days, transtec focused on reselling DEC computers
and peripherals, delivering high-performance workstations to
university institutes and research facilities. In 1987, SUN/Sparc
and storage solutions broadened the portfolio, enhanced by IBM/
RS6000 products in 1991. These were the typical workstations
and server systems for high performance computing then, used
by the majority of researchers worldwide. Today, transtec
is the largest Germany-based provider of comprehensive HPC
solutions across Europe. Many of our HPC clusters have entered
the TOP500 list of the world's fastest computing systems.
Thus, given this background and history, it is fair to say that transtec
looks back on more than 35 years of experience in scientific
computing; our track record shows more than 1,000 HPC installations.
With this experience, we know exactly what customers'
demands are and how to meet them. High performance and ease
of management – this is what customers require today. HPC systems
are certainly required to deliver peak performance, as their name
indicates, but that is not enough: they must also be easy to handle.
Unwieldy design and operational complexity must be avoided or
at least hidden from administrators and particularly users of HPC
computer systems.
High Performance Computing has changed over the course of the
last 10 years. Whereas traditional topics like simulation, large-
SMP systems or MPI parallel computing are still important cornerstones
of HPC, new areas of applicability like Business Intelligence/Analytics,
new technologies like in-memory or distributed
databases, and new algorithms like MapReduce have entered the
arena, which is now commonly called HPC & Big Data.
Familiarity with Hadoop or R nowadays belongs to the basic
toolkit of any serious HPC player.
Dynamic deployment scenarios are becoming more and more
important for realizing private (or even public) HPC clouds.
The OpenStack ecosystem, object storage concepts like
Ceph, application containers with Docker – all of these were once
bleeding-edge technologies, but have meanwhile become
state of the art.
Therefore, this brochure is now called “Technology Compass
HPC & Big Data”. It covers all of the above-mentioned currently
discussed core technologies. Besides those, more familiar tradi-
tional updates like the Intel Omni-Path architecture are covered
as well, of course. No matter which ingredients an HPC solution
requires, transtec masterfully combines excellent, well-chosen,
readily available components into a fine-tuned, customer-specific,
and thoroughly designed HPC solution.
Your decision for a transtec HPC solution means you opt for the most
intensive customer care and the best service in HPC. Our experts will
be glad to bring in their expertise and support to assist you at
any stage, from HPC design to daily cluster operations, to HPC
Cloud Services.
Last but not least, transtec HPC Cloud Services provide custom-
ers with the possibility to have their jobs run on dynamically pro-
vided nodes in a dedicated datacenter, professionally managed
and individually customizable. Numerous standard applications
like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like
Gromacs, NAMD, VMD, and others are pre-installed, integrated
into an enterprise-ready cloud management environment, and
ready to run.
Have fun reading the
transtec HPC Compass 2016/17!
High Performance Computing
for Everyone
Interview with Dr. Oliver Tennert
Director Technology Management
and HPC Solutions
Dr. Tennert, what’s the news in HPC?
Besides becoming more and more of a commodity, the HPC
market has seen two major changes within the last couple of
years. On the one hand, HPC has reached into adjacent
areas that would not have been recognized as part of HPC five
years ago. Meanwhile, however, every serious HPC company
tries to find its positioning within the area of "big data",
often vaguely understood, but sometimes put more precisely
as "big data analytics".
Superficially, it's all about analyzing large amounts of data, which
has been part of HPC ever since. But the areas of application
are multiplying: "big data analytics" is a vital application
within business intelligence/business analytics. Along
with this, new methods, new concepts, and new technologies
emerge. The other change has to do with the way HPC environments
are deployed and managed. Dynamic provisioning
of resources for private-cloud scenarios is becoming
more and more important, to which OpenStack has contributed
a lot. In the near future, dynamic deployment will be just as much
of a commodity as efficient but static deployment is today.
What’s the deal with “dynamic deployment” anyway?
Dynamic deployment is the core provisioning technology in
cloud environments, public or private ones. Virtual machines
with different operating systems, web servers, compute nodes, or any kind
of resources are only deployed when they are needed and
requested.
Dynamic deployment makes sure that the available hardware
capacity will be prepared and deployed for exactly the pur-
pose for which it is needed. The advantage: less idle capacity
and better utilization than in static environments
where, for example, inactive Windows servers block hardware
capacity while Linux-based compute nodes are
needed at that very moment.
Everyone is discussing OpenStack at the moment. What do you
think of the hype and the development of this platform? And
what is its importance for transtec?
OpenStack has indeed been discussed at a time when it still
had little business relevance for an HPC solutions provider.
But this is the case with all innovations, be they hardware- or
software-related.
But the fact that OpenStack is a joint development effort of
several big IT players has added to the demand for private
clouds in the market on the one hand, and to its relevance
as the software solution of choice for them on the other.
We at transtec have several customers who either ask us to
deploy a completely new OpenStack environment, or would
like us to migrate their existing one to OpenStack.
OpenStack has the reputation of being a very complex beast,
given the many components. Implementing it can take a while.
Is there any way to simplify this?
Indeed, life can be rough if you do it wrong. But there is no need
for that: hardly anyone installs Linux from scratch nowadays,
except for playing around and learning; instead, everyone
uses professional distributions with commercial support.
There are special OpenStack distributions too, with commer-
cial support: especially within HPC, Bright Cluster Manager
is often the distribution of choice, but also long-term players
like Red Hat offer simple deployment frameworks.
Our customers can rest assured that they get an individually
optimized OpenStack environment, which is not only de-
signed by us, but also installed and maintained throughout.
It has always been our job to hide any complexity underneath
a layer of “ease of use, ease of management”.
Who do you want to address with this topic?
Everyone who is interested in a highly flexible, but also effec-
tive utilization of their existing hardware capacity. We address
customers from academia as well as SMB customers who,
just like big enterprises, have to turn towards new concepts
and new technologies.
As said before: in our opinion, this way of resource provision-
ing will become mainstream soon.
Let’s turn to the other big topic: What is the connection between
HPC and Big Data Analytics, and what does the market have to
say about this?
As already mentioned, the HPC market is reaching out towards
big data analytics. “HPC & Big Data” is named in one breath at
various key events like ISC or Supercomputing.
This is not without reason, of course: as different as the technologies
in use or the goals to reach may be, there is one common
denominator, and that is a high demand for computational
capacity. An HPC solution provider like transtec, which is capable
of designing and deploying a one-petaflop HPC cluster,
naturally also manages the provisioning of, say, a ten-petabyte
Hadoop cluster for big data analytics for a customer.
But the target group may be a different one: classical HPC
mostly takes place within the value chain of the customer. The
users are developers. Important areas for big data analytics,
however, can be found in business controlling, i.e. BI/BA, but
also in companies’ operations departments, when operation-
al decisions have to be made depending on certain vital state
data, in real time. This is what “internet of things” is all about.
Where are the technical differences between HPC and Big Data
Analytics, and what do both have in common?
Both areas share the “scale-out” approach. It is not the single
server that gets more and more powerful, but rather accelera-
tion is reached via parallelization and massive load balancing
on several servers.
In HPC, parallel computational jobs that run using the MPI
programming interface are state of the art. Simulations run on
many compute nodes at once, thus cutting down wall-clock
time. Within the context of big data analytics, distributed and
above all in-memory databases are becoming more and more
important. Database requests can thus be handled by several
servers, and with a low latency.
How do companies handle the backup and the archiving of
these huge amounts of data?
Given these enormous amounts of data, standard backup strategies
or archiving concepts indeed no longer apply. Instead,
for big data analytics, data is stored redundantly
as a matter of principle, to be independent of hardware failures.
Storage concepts like Hadoop or object storage like Ceph
offer redundancy in data storage as well.
Moreover, when it comes to big data analytics, the basic premise
is that it really does not matter much if a tiny fraction of
these huge amounts of data goes missing for some reason.
The goal of big data analytics is not to get a precise analytical
result out of these often unstructured data, but rather
the quickest possible extraction of relevant information and
a sensible interpretation, for which statistical methods are of
great importance.
For which companies is it important to do this very quick extrac-
tion of information?
Holiday booking portals that use booking systems have to get
an immediate overview of the hotel rooms that are available
in any given town, sorted by price, in order to provide customers
with this information. Energy or finance companies
that trade their assets on exchanges have to know their
current portfolio's value within a fraction of a second to set the
right price.
Search engine providers – there are more than just Google –
investigative authorities, credit card clearing houses that
have to react quickly to attempted fraud: the list of potential
companies that could make use of big data analytics grows
along with technical progress.
High Performance Computing.............................................. 10
Performance Turns Into Productivity..........................................................12
Flexible Deployment With xCAT......................................................................14
Service and Customer Care From A to Z ....................................................16
OpenStack for HPC Cloud Environments....................... 20
What is OpenStack?................................................................................................22
Object Storage with Ceph...................................................................................24
Application Containers with Docker............................................................30
Advanced Cluster Management Made Easy................. 38
Easy-to-use, Complete and Scalable.............................................................40
Cloud Bursting With Bright................................................................................48
Intelligent HPC Workload Management......................... 50
Moab HPC Suite.........................................................................................................52
IBM Platform LSF......................................................................................................56
Slurm................................................................................................................................58
IBM Platform LSF 8...................................................................................................61
Intel Cluster Ready....................................................................... 64
A Quality Standard for HPC Clusters............................................................66
Intel Cluster Ready Builds HPC Momentum............................................70
The transtec Benchmarking Center..............................................................74
Remote Visualization and
Workflow Optimization............................................................. 76
NICE EnginFrame: A Technical Computing Portal................................78
Remote Visualization.............................................................................................82
Desktop Cloud Visualization.............................................................................84
Cloud Computing With transtec and NICE...............................................86
NVIDIA Grid Boards..................................................................................................88
Virtual GPU Technology........................................................................................90
Big Data Analytics with Hadoop.......................................... 92
Apache Hadoop.........................................................................................................94
HDFS Architecture...................................................................................................98
NVIDIA GPU Computing –
The CUDA Architecture............................................................108
What is GPU Computing?................................................................................. 110
Kepler GK110/210 GPU Computing Architecture............................... 112
A Quick Refresher on CUDA............................................................................. 122
Intel Xeon Phi Coprocessor...................................................124
The Architecture.................................................................................................... 126
An Outlook on Knights Landing and Omni Path................................ 128
Vector Supercomputing..........................................................138
The NEC SX Architecture................................................................................... 140
History of the SX Series..................................................................................... 146
Parallel File Systems –
The New Standard for HPC Storage.................................154
Parallel NFS............................................................................................................... 156
What’s New in NFS 4.1?...................................................................................... 158
Panasas HPC Storage.......................................................................................... 160
BeeGFS......................................................................................................................... 172
General Parallel File System (GPFS)............................................................ 176
What’s new in GPFS Version 3.5.................................................................... 190
Lustre............................................................................................................................ 192
Intel Omni-Path Interconnect..............................................196
The Architecture.................................................................................................... 198
Host Fabric Interface (HFI)............................................................................... 200
Architecture Switch Fabric Optimizations............................................ 202
Fabric Software Components........................................................................ 204
Numascale.......................................................................................206
NumaConnect Background............................................................................. 208
Technology................................................................................................................ 210
Redefining Scalable OpenMP and MPI..................................................... 216
Big Data....................................................................................................................... 220
Cache Coherence................................................................................................... 222
NUMA and ccNUMA.............................................................................................. 224
Allinea Forge for parallel
Software Development...........................................................226
Application Performance and Development Tools.......................... 228
What’s in Allinea Forge?.................................................................................... 230
Glossary.............................................................................................242
High Performance
Computing – Performance
Turns Into Productivity
High Performance Computing (HPC) has been with us from the very beginning of the
computer era. High-performance computers were built to solve numerous problems
which the “human computers” could not handle.
The term HPC just hadn’t been coined yet. More important, some of the early principles
have changed fundamentally.
Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
Performance Turns Into Productivity
HPC systems in the early days were much different from those
we see today. First, we saw enormous mainframes from large
computer manufacturers, including a proprietary operating
system and job management system. Second, at universities
and research institutes, workstations made inroads and scien-
tists carried out calculations on their dedicated Unix or VMS
workstations. In either case, if you needed more computing
power, you scaled up, i.e. you bought a bigger machine. Today
the term High-Performance Computing has gained a fundamen-
tally new meaning. HPC is now perceived as a way to tackle
complex mathematical, scientific or engineering problems. The
integration of industry-standard, "off-the-shelf" server hardware
into HPC clusters facilitates the construction of computer networks
of a power that no single system could ever achieve.
The new paradigm for parallelization is scaling out.
Computer-supported simulation of realistic processes (so-called
Computer-Aided Engineering – CAE) has established itself as a
third key pillar in the field of science and research alongside theory
and experimentation. It is nowadays inconceivable that an air-
craft manufacturer or a Formula One racing team would operate
without using simulation software. And scientific calculations,
such as in the fields of astrophysics, medicine, pharmaceuticals
and bio-informatics, will to a large extent be dependent on super-
computers in the future. Software manufacturers long ago rec-
ognized the benefit of high-performance computers based on
powerful standard servers and ported their programs to them
accordingly.
The main advantage of scale-out supercomputers is just that:
they are infinitely scalable, at least in principle. Since they are
based on standard hardware components, such a supercomputer
can be given more computing power whenever the computational
capacity of the system is no longer sufficient, simply by adding
additional nodes of the same kind. A cumbersome switch to a
different technology can be avoided in most cases.
“transtec HPC solutions are meant to provide customers with
unparalleled ease-of-management and ease-of-use. Apart from
that, deciding for a transtec HPC solution means deciding for the
most intensive customer care and the best service imaginable.”
Dr. Oliver Tennert | Director Technology Management & HPC Solutions
Variations on the theme: MPP and SMP
Parallel computations exist in two major variants today. Ap-
plications running in parallel on multiple compute nodes are
frequently so-called Massively Parallel Processing (MPP) appli-
cations. MPP indicates that the individual processes can each
utilize exclusive memory areas. This means that such jobs are
predestined to be computed in parallel, distributed across the
nodes in a cluster. The individual processes can thus utilize the
separate units of the respective node – especially the RAM, the
CPU power and the disk I/O.
Communication between the individual processes is implement-
ed in a standardized way through the MPI software interface
(Message Passing Interface), which abstracts the underlying
network connections between the nodes from the processes.
However, the MPI standard (current version 2.0) merely requires
source code compatibility, not binary compatibility, so an off-the-
shelf application usually needs specific versions of MPI libraries
in order to run. Examples of MPI implementations are OpenMPI,
MPICH2, MVAPICH2, Intel MPI or – for Windows clusters – MS-MPI.
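As an illustration only (the brochure itself contains no code), a minimal point-to-point MPI sketch in Python using the mpi4py bindings might look as follows; the file name and the simple master/worker pattern are assumptions chosen purely for demonstration.

    # hello_mpi.py -- start with e.g.: mpirun -np 4 python hello_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD        # communicator spanning all started processes
    rank = comm.Get_rank()       # unique ID of this process
    size = comm.Get_size()       # total number of processes

    if rank == 0:
        # rank 0 hands a small piece of work to every other rank
        for dest in range(1, size):
            comm.send({"task": dest}, dest=dest, tag=0)
    else:
        task = comm.recv(source=0, tag=0)   # blocking receive from rank 0
        print(f"rank {rank}/{size} received {task}")

Each rank runs in its own address space; all data exchange goes through MPI messages over the cluster interconnect, which is exactly why interconnect latency matters for communication-heavy jobs.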
If the individual processes engage in a large amount of commu-
nication, the response time of the network (latency) becomes im-
portant. Latency in a Gigabit Ethernet or a 10GE network is typi-
cally around 10 µs. High-speed interconnects such as InfiniBand,
reduce latency by a factor of 10 down to as low as 1 µs. Therefore,
high-speed interconnects can greatly speed up total processing.
The other frequently used variant is called SMP applications.
SMP, in this HPC context, stands for Shared Memory Processing.
It involves the use of shared memory areas, the specific imple-
mentation of which is dependent on the choice of the underlying
operating system. Consequently, SMP jobs generally only run on
a single node, where they can in turn be multi-threaded and thus
be parallelized across the number of CPUs per node. For many
HPC applications, both the MPP and SMP variant can be chosen.
Many applications are not inherently suitable for parallel execu-
tion. In such a case, there is no communication between the in-
dividual compute nodes, and therefore no need for a high-speed
network between them; nevertheless, multiple computing jobs
can be run simultaneously and sequentially on each individual
node, depending on the number of CPUs.
In order to ensure optimum computing performance for these
applications, it must be examined how many CPUs and cores de-
liver the optimum performance.
We find applications of this sequential type of work typically in
the fields of data analysis or Monte-Carlo simulations.
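To make the sequential, communication-free case concrete, here is a small illustrative sketch (not taken from the brochure) that estimates pi by Monte-Carlo sampling in Python; each worker runs completely independently, so such jobs can be spread over cluster nodes without any high-speed network between them.

    # monte_carlo_pi.py -- embarrassingly parallel: workers never communicate
    import random
    from multiprocessing import Pool

    def sample(n):
        """Count how many of n random points fall inside the unit quarter circle."""
        hits = 0
        for _ in range(n):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:
                hits += 1
        return hits

    if __name__ == "__main__":
        n_workers, n_per_worker = 8, 1_000_000
        with Pool(n_workers) as pool:
            hits = sum(pool.map(sample, [n_per_worker] * n_workers))
        print("pi is approximately", 4.0 * hits / (n_workers * n_per_worker))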
xCAT as a Powerful and Flexible Deployment Tool
xCAT (Extreme Cluster Administration Tool) is an open source
toolkit for the deployment and low-level administration of HPC
cluster environments, small as well as large ones.
xCAT provides simple commands for hardware control, node
discovery, the collection of MAC addresses, and node deployment
with local disk installation (diskful) or without (diskless). The
cluster configuration is stored in a relational database. Node
groups for different operating system images can be defined.
Also, user-specific scripts can be executed automatically at in-
stallation time.
xCAT Provides the Following Low-Level Administrative
Features
 Remote console support
 Parallel remote shell and remote copy commands
 Plugins for various monitoring tools like Ganglia or Nagios
 Hardware control commands for node discovery, collecting
MAC addresses, remote power switching and resetting of
nodes
 Automatic configuration of syslog, remote shell, DNS, DHCP,
and ntp within the cluster
 Extensive documentation and man pages
For cluster monitoring, we install and configure the open source
tool Ganglia or the even more powerful open source solution Na-
gios, according to the customer’s preferences and requirements.
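As a hedged illustration of the kind of low-level control xCAT exposes, the following Python sketch simply wraps two of the xCAT commands mentioned above (nodels to expand a node group, rpower to query power state); the group name "compute" is an assumption and stands for whatever node group is defined in the xCAT database.

    # xcat_power.py -- thin wrapper around the xCAT CLI (nodels, rpower)
    # Assumes xCAT is installed and a node group named "compute" exists.
    import subprocess

    def xcat(*args):
        """Run an xCAT command and return its standard output as text."""
        return subprocess.run(args, capture_output=True, text=True, check=True).stdout

    nodes = xcat("nodels", "compute").split()      # expand the node group to node names
    power = xcat("rpower", "compute", "stat")      # query the power state of all nodes
    print(f"{len(nodes)} nodes in group 'compute'")
    print(power)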
Local Installation or Diskless Installation
We offer a diskful or a diskless installation of the cluster nodes.
A diskless installation means the operating system is hosted partially
within main memory; larger parts may be provided
via NFS or other means. This approach allows for deploying
large numbers of nodes very efficiently, and the cluster
is up and running within a very small timescale. Also, updating
the cluster can be done in a very efficient way. For this, only the
boot image has to be updated, and the nodes have to be reboot-
ed. After this, the nodes run either a new kernel or even a new
operating system. Moreover, with this approach, partitioning the
cluster can also be very efficiently done, either for testing pur-
poses, or for allocating different cluster partitions for different
users or applications.
Development Tools, Middleware, and Applications
According to the application, optimization strategy, or underly-
ing architecture, different compilers lead to code results of very
different performance. Moreover, different, mainly commercial,
applications, require different MPI implementations. And even
when the code is self-developed, developers often prefer one MPI
implementation over another.
According to the customer's wishes, we install various compilers,
MPI middleware, as well as job management systems like ParaStation,
Grid Engine, Torque/Maui, or the very powerful Moab
HPC Suite for high-level cluster management.
transtec HPC as a Service
You will get a range of applications like LS-Dyna, ANSYS, Gromacs,
NAMD, etc. from all kinds of areas pre-installed, integrated into an
enterprise-ready cloud and workload management system, and
ready to run. Is your application missing?
 Ask us: HPC@transtec.de
transtec Platform as a Service
You will be provided with dynamically provisioned compute nodes
for running your individual code. The operating system will be
pre-installed according to your requirements. Common Linux dis-
tributions like RedHat, CentOS, or SLES are the standard. Do you
need another distribution?
 Ask us: HPC@transtec.de
transtec Hosting as a Service
You will be provided with hosting space inside a professionally
managed and secured datacenter where you can have your ma-
chines hosted, managed, maintained, according to your require-
ments. Thus, you can build up your own private cloud. What
range of hosting and maintenance services do you need?
 Tell us: HPC@transtec.de
HPC @ transtec:
Services and Customer Care from A to Z
transtec AG has over 30 years of experience in scientific computing
and is one of the earliest manufacturers of HPC clusters. For nearly a
decade, transtec has delivered highly customized High Performance
clusters based on standard components to academic and industry
customers across Europe with all the high quality standards and the
customer-centric approach that transtec is well known for. Every
transtec HPC solution is more than just a rack full of hardware – it is
a comprehensive solution with everything the HPC user, owner, and
operator need. In the early stages of any customer's HPC project,
transtec experts provide extensive and detailed consulting to the
customer – they benefit from expertise and experience. Consulting
is followed by benchmarking of different systems with either specifically
crafted customer code or generally accepted benchmarking
routines; this aids customers in sizing and devising the optimal and
detailed HPC configuration. Each and every piece of HPC hardware
that leaves our factory undergoes a burn-in procedure of 24 hours or
more if necessary. We make sure that any hardware shipped meets
our and our customers' quality requirements.
[Figure: The transtec HPC services cycle from A to Z – individual presales consulting; application-, customer-, and site-specific sizing of the HPC solution; benchmarking of different systems; burn-in tests of systems; software and OS installation; application installation; onsite hardware assembly; integration into the customer's environment; customer training; maintenance, support and managed services; continual improvement.]
transtec HPC solutions are turnkey solutions. By default, a transtec
HPC cluster has everything installed and configured – from hardware
and operating system to important middleware components like
cluster management or developer tools and the customer's production
applications.
Onsite delivery means onsite integration into the customer's production
environment, be it establishing network connectivity to the
corporate network, or setting up software and configuration parts.
transtec HPC clusters are ready-to-run systems – we deliver, you
turn the key, the system delivers high performance. Every HPC project
entails transfer to production: IT operation processes and policies
apply to the new HPC system. Effectively, IT personnel is trained
hands-on, introduced to hardware components and software, with
all operational aspects of configuration management.
transtec services do not stop when the implementation project
ends. Beyond transfer to production, transtec takes care. transtec
offers a variety of support and service options, tailored to the customer's
needs. When you are in need of a new installation, a major
reconfiguration or an update of your solution – transtec is able to
support your staff and, if you lack the resources for maintaining the
cluster yourself, maintain the HPC solution for you.
From Professional Services to Managed Services for daily operations
and required service levels, transtec will be your complete HPC
service and solution provider. transtec's high standards of performance,
reliability and dependability assure your productivity and
complete satisfaction. transtec's offerings of HPC Managed Services
offer customers the possibility of having the complete management
and administration of the HPC cluster handled by transtec service
specialists, in an ITIL-compliant way. Moreover, transtec's HPC on Demand
services help provide access to HPC resources whenever customers
need them, for example because they do not have the possibility
of owning and running an HPC cluster themselves, due to lacking
infrastructure, know-how, or admin staff.
transtec HPC Cloud Services
Last but not least, transtec's services portfolio evolves as customers'
demands change. Starting this year, transtec is able to provide
HPC Cloud Services. transtec uses a dedicated datacenter to provide
computing power to customers who are in need of more capacity
than they own, which is why this workflow model is sometimes
called computing-on-demand. With these dynamically provided
resources, customers have the possibility to run their jobs on
HPC nodes in a dedicated datacenter, professionally managed and
secured, and individually customizable. Numerous standard applications
like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes
like Gromacs, NAMD, VMD, and others are pre-installed, integrated
into an enterprise-ready cloud and workload management environment,
and ready to run.
Alternatively, whenever customers are in need of space for hosting
their own HPC equipment because they do not have the space
capacity or cooling and power infrastructure themselves, transtec
is also able to provide Hosting Services to those customers who would
like to have their equipment professionally hosted, maintained, and
managed. Customers can thus build up their own private cloud!
Are you interested in any of transtec’s broad range of HPC related
services? Write us an email to HPC@transtec.de. We’ll be happy to
hear from you!
Are you an end user?
Then doing your work as productively as possible is of utmost importance
to you! You are interested in an IT environment running
smoothly, with a fully functional infrastructure, in competent and
professional support by internal administrators and external ser-
vice providers, and a service desk with guaranteed availability at
working times. HPC cloud services ensure you are able to run your
simulation jobs even if computational requirements exceed the lo-
cal capacities.
 support services
 professional services
 managed services
 cloud services
Are you an administrator?
Then you are responsible for providing a well-performing IT infra-
structure and for supporting the end users when problems occur. In
situations where problem management fully exploits your capacity,
or in cases where you just don’t have time or maybe lack experience
in performing certain tasks, you might be glad to be assisted by spe-
cialized IT experts.
 support services
 professional services
Are you an R&D manager?
Then your job is to ensure that developers, researchers, and other
end users are able to do their jobs. To have a partner at hand that provides
you with a full range of operations support, either on a completely
flexible, on-demand basis, or in a fully managed way based
on ITIL best practices, is heartily welcome.
 professional services
 managed services
[Figure: Services Continuum across PLAN, BUILD and RUN – Consulting Services (paid to advise), Professional Services (paid to do), Support Services (paid to support), Managed Services (paid to operate), and Cloud Services (paid to use), covering performance analysis, technology assessment, infrastructure design, installation, integration, migration, upgrades, configuration changes, service desk, event monitoring and notification, release management, capacity management, hosting, Infrastructure (IaaS), HPC (HPCaaS), and Platform (PaaS).]
Are you a CIO?
Then your main concern is a smoothly running IT operations environment
with a maximum of efficiency and effectiveness. Managed
Services to ensure business continuity are a main part of your considerations
regarding business objectives. HPC cloud services are
a vital part of your capacity planning to ensure the availability of
computational resources to your internal clients, the R&D departments.
 managed services
 cloud services
Are you a CEO?
Profitability, time-to-result, business continuity, and ensuring future
viability are topics that you continuously have to think about. You
are happy to have a partner at hand who gives you insight into future
developments, provides you with technology consulting, and is
able to help you ensure business continuity and fall-back scenarios,
as well as scalability readiness, by means of IT capacity outsourcing
solutions.
 cloud services
 consulting services
Life Sciences | CAE | High Performance Computing | Big Data Analytics | Simulation | CAD
OpenStack for HPC
Cloud Environments
OpenStack is a set of software tools for building and managing cloud computing
platforms for public and private clouds. Backed by some of the biggest companies in
software development and hosting, as well as thousands of individual community
members, many think that OpenStack is the future of cloud computing.
OpenStack is managed by the OpenStack Foundation, a non-profit organization which
oversees both development and community-building around the project.
What is OpenStack?
Introduction to OpenStack
OpenStack lets users deploy virtual machines and other instances
that handle different tasks for managing a cloud environment
on the fly. It makes horizontal scaling easy, which means that
tasks that benefit from running concurrently can easily serve
more or fewer users on the fly by just spinning up more instances.
For example, a mobile application which needs to communicate
with a remote server might be able to divide the work of commu-
nicating with each user across many different instances, all
communicating with one another but scaling quickly and easily
as the application gains more users.
And most importantly, OpenStack is open source software, which
means that anyone who chooses to can access the source code,
make any changes or modifications they need, and freely share
these changes back out to the community at large. It also means
that OpenStack has the benefit of thousands of developers all
over the world working in tandem to develop the strongest, most
robust, and most secure product that they can.
How is OpenStack used in a cloud environment?
The cloud is all about providing computing for end users in a
remote environment, where the actual software runs as a service
on reliable and scalable servers rather than on each end user's
computer. Cloud computing can refer to a lot of different things,
but typically the industry talks about running different items “as
a service” – software, platforms, and infrastructure. OpenStack
falls into the latter category and is considered Infrastructure as
a Service (IaaS). Providing infrastructure means that OpenStack
makes it easy for users to quickly add new instances, upon which
other cloud components can run. Typically, the infrastructure
then runs a “platform” upon which a developer can create soft-
ware applications which are delivered to the end users.
What are the components of OpenStack?
OpenStack is made up of many different moving parts. Because
of its open nature, anyone can add additional components to
OpenStack to help it to meet their needs. But the OpenStack
community has collaboratively identified nine key components
that are a part of the “core” of OpenStack, which are distributed
as a part of any OpenStack system and officially maintained by
the OpenStack community.
 Nova is the primary computing engine behind OpenStack. It
is used for deploying and managing large numbers of virtual
machines and other instances to handle computing tasks.
 Swift is a storage system for objects and files. Rather than
the traditional idea of referring to files by their location on
a disk drive, developers can instead use a unique identifier
for the file or piece of information and let OpenStack
decide where to store this information. This makes
scaling easy, as developers don't have to worry about the
capacity of a single system behind the software. It also
allows the system, rather than the developer, to worry about
how best to make sure that data is backed up in case of the
failure of a machine or network connection.
 Cinder is a block storage component, which is more analo-
gous to the traditional notion of a computer being able to
access specific locations on a disk drive. This more tradi-
tional way of accessing files might be important in scenarios
in which data access speed is the most important consider-
ation.
 Neutron provides the networking capability for OpenStack.
It helps to ensure that each of the components of an Open-
Stack deployment can communicate with one another
quickly and efficiently.
 Horizon is the dashboard behind OpenStack. It is the only
graphical interface to OpenStack, so for users wanting to
give OpenStack a try, this may be the first component they
actually “see.” Developers can access all of the components
of OpenStack individually through an application program-
ming interface (API), but the dashboard gives system
administrators a look at what is going on in the cloud and lets them
manage it as needed.
 Keystone provides identity services for OpenStack. It is
essentially a central list of all of the users of the OpenStack
cloud, mapped against all of the services provided by the
cloud which they have permission to use. It provides multiple
means of access, meaning developers can easily map their
existing user access methods against Keystone.
 Glance provides image services to OpenStack. In this case,
“images” refers to images (or virtual copies) of hard disks.
Glance allows these images to be used as templates when
deploying new virtual machine instances.
 Ceilometer provides telemetry services, which allow the
cloud to provide billing services to individual users of the
cloud. It also keeps a verifiable count of each user’s system
usage of each of the various components of an OpenStack
cloud. Think metering and usage reporting.
 Heat is the orchestration component of OpenStack, which
allows developers to store the requirements of a cloud appli-
cation in a file that defines what resources are necessary for
that application. In this way, it helps to manage the infra-
structure needed for a cloud service to run.
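In practice, these components are consumed through their APIs. As a rough sketch only – assuming the openstacksdk Python library and a cloud profile named "mycloud" in clouds.yaml, plus placeholder image, flavor, and network names, none of which appear in the text above – booting a Nova instance from a Glance image on a Neutron network could look like this:

    # boot_instance.py -- minimal openstacksdk sketch; all names are placeholders
    import openstack

    conn = openstack.connect(cloud="mycloud")       # credentials come from clouds.yaml

    image = conn.compute.find_image("centos-7")     # Glance image
    flavor = conn.compute.find_flavor("m1.small")   # Nova flavor
    network = conn.network.find_network("private")  # Neutron network

    server = conn.compute.create_server(
        name="hpc-node-01",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)   # block until the instance is ACTIVE
    print(server.name, server.status)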
Object Storage with Ceph
Why “Ceph”?
“Ceph” is an odd name for a file system and breaks the typical
acronym trend that most follow. The name is a reference to the
mascot at UCSC (Ceph’s origin), which happens to be “Sammy,”
the banana slug, a shell-less mollusk in the cephalopods class.
Cephalopods, with their multiple tentacles, provide a great meta-
phor for a distributed file system.
Ceph goals
Developing a distributed file system is a complex endeavor, but
it’s immensely valuable if the right problems are solved. Ceph’s
goals can be simply defined as:
 Easy scalability to multi-petabyte capacity
 High performance over varying workloads (input/output
operations per second [IOPS] and bandwidth)
 Strong reliability
Unfortunately, these goals can compete with one another (for
example, scalability can reduce or inhibit performance or impact
reliability). Ceph has developed some very interesting concepts
(such as dynamic metadata partitioning and data distribution
and replication), which this article explores shortly. Ceph’s design
also incorporates fault-tolerance features to protect against
single points of failure, with the assumption that storage failures
on a large scale (petabytes of storage) will be the norm rather
than the exception. Finally, its design does not assume particular
workloads but includes the ability to adapt to changing distrib-
uted workloads to provide the best performance. It does all of
this with the goal of POSIX compatibility, allowing it to be trans-
parently deployed for existing applications that rely on POSIX
semantics (through Ceph-proposed enhancements). Finally, Ceph
is open source distributed storage and part of the mainline Linux
kernel (2.6.34).
[Figure: OpenStack Overview – users and administrators reach the platform through the API (REST), the CLI (python client), and the UI (Horizon); advanced services consuming IaaS include Data Processing (Sahara), Key Management (Barbican), Database Management (Trove), Message Queue (Zaqar), DNS Management (Designate), Deployment (TripleO), Telemetry (Ceilometer), and Orchestration (Heat); the Infrastructure-as-a-Service layer comprises Compute (Nova, Ironic), Network (Neutron), Image Management (Glance), and Storage (Swift, Cinder, Manila); Identity (Keystone), the common library (Oslo), and shared message queues/databases underpin the whole stack.]
Ceph architecture
Now, let’s explore the Ceph architecture and its core elements at
a high level. I then dig down another level to identify some of the
key aspects of Ceph to provide a more detailed exploration.
The Ceph ecosystem can be broadly divided into four segments
(see Figure 1): clients (users of the data), metadata servers (which
cache and synchronize the distributed metadata), an object
storage cluster (which stores both data and metadata as objects
andimplementsotherkeyresponsibilities),andfinallythecluster
monitors (which implement the monitoring functions).
As Figure 1 shows, clients perform metadata operations (to iden-
tify the location of data) using the metadata servers. The meta-
data servers manage the location of data and also where to store
new data. Note that metadata is stored in the storage cluster (as
indicated by “Metadata I/O”).
[Figure 1: Conceptual architecture of the Ceph ecosystem – clients issue metadata operations to the metadata server cluster and file (object) I/O to the object storage cluster; metadata I/O flows between the metadata servers and the object storage cluster, while cluster monitors oversee the whole system.]
Actual file I/O occurs between the client and object storage
cluster. In this way, higher-level POSIX functions (such as open,
close, and rename) are managed through the metadata servers,
whereas POSIX functions (such as read and write) are managed
directly through the object storage cluster.
Another perspective of the architecture is provided in Figure 2. A
set of servers access the Ceph ecosystem through a client inter-
face, which understands the relationship between metadata
servers and object-level storage. The distributed storage system
can be viewed in a few layers, including a format for the storage
devices (the Extent and B-tree-based Object File System [EBOFS]
or an alternative) and an overriding management layer designed
to manage data replication, failure detection, and recovery and
subsequent data migration called Reliable Autonomic Distrib-
uted Object Storage (RADOS). Finally, monitors are used to iden-
tify component failures, including subsequent notification.
[Figure 2: Simplified layered view of the Ceph ecosystem – servers access the storage through the Ceph client interface and metadata servers; object storage daemons sit on top of an on-disk format such as BTRFS or EBOFS.]
Ceph components
With the conceptual architecture of Ceph under your belt, you
can dig down another level to see the major components imple-
mented within the Ceph ecosystem. One of the key differences
between Ceph and traditional file systems is that rather than
focusing the intelligence in the file system itself, the intelligence
is distributed around the ecosystem.
Figure 3 shows a simple Ceph ecosystem. The Ceph Client is the
user of the Ceph file system. The Ceph Metadata Daemon provides
the metadata services, while the Ceph Object Storage Daemon
provides the actual storage (for both data and metadata). Finally,
the Ceph Monitor provides cluster management. Note that
there can be many Ceph clients, many object storage endpoints,
numerous metadata servers (depending on the capacity of the
file system), and at least a redundant pair of monitors. So, how is
this file system distributed?
[Figure 3: Simple Ceph ecosystem – applications and users on the Ceph client go through (enhanced) POSIX and the VFS in the Linux kernel; the Ceph client talks to cmds (the Ceph metadata daemon with its DRAM cache), cosd (the Ceph object storage daemon on BTRFS/EBOFS over the disks), and cmon (the Ceph monitor).]
Kernel or user space
Early versions of Ceph utilized Filesystem in Userspace (FUSE),
which pushes the file system into user space and can greatly
simplify its development. But today, Ceph has been integrated
into the mainline kernel, making it faster, because user space
context switches are no longer necessary for file system I/O.
Ceph client
As Linux presents a common interface to the file systems
(through the virtual file system switch [VFS]), the user’s perspec-
tive of Ceph is transparent. The administrator’s perspective will
certainly differ, given the potential for many servers encompassing
the storage system. From the users' point of view,
they have access to a large storage system and are not aware of
the underlying metadata servers, monitors, and individual object
storage devices that aggregate into a massive storage pool.
Users simply see a mount point, from which standard file I/O can
be performed.
The Ceph file system–or at least the client interface–is imple-
mented in the Linux kernel. Note that in the vast majority of
file systems, all of the control and intelligence is implemented
within the kernel’s file system source itself. But with Ceph, the
file system’s intelligence is distributed across the nodes, which
simplifies the client interface but also provides Ceph with the
ability to massively scale (even dynamically).
Rather than rely on allocation lists (metadata to map blocks on
a disk to a given file), Ceph uses an interesting alternative. A file
from the Linux perspective is assigned an inode number (INO)
from the metadata server, which is a unique identifier for the
file. The file is then carved into some number of objects (based
on the size of the file). Using the INO and the object number
(ONO), each object is assigned an object ID (OID). Using a simple
hash over the OID, each object is assigned to a placement group.
The placement group (identified as a PGID) is a conceptual
container for objects. Finally, the mapping of the placement
group to object storage devices is a pseudo-random mapping
using an algorithm called Controlled Replication Under Scalable
Hashing (CRUSH). In this way, mapping of placement groups (and
replicas) to storage devices does not rely on any metadata but
instead on a pseudo-random mapping function. This behavior is
ideal, because it minimizes the overhead of storage and simpli-
fies the distribution and lookup of data.
The final component for allocation is the cluster map. The cluster
map is an efficient representation of the devices representing
the storage cluster. With a PGID and the cluster map, you can
locate any object.
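A deliberately simplified Python sketch of this chain – inode number to object IDs, object IDs hashed onto placement groups, placement groups mapped onto devices – is given below. The ordinary hash and modulo mapping used here only stand in for CRUSH; the real algorithm is considerably more sophisticated, and all numbers are made up for illustration.

    # placement_sketch.py -- illustrative only; this is NOT the real CRUSH algorithm
    import hashlib

    NUM_PGS = 128                                   # placement groups in the pool (assumed)
    OSDS = ["osd.0", "osd.1", "osd.2", "osd.3"]     # toy cluster map

    def object_id(ino, ono):
        """Combine the inode number (INO) and object number (ONO) into an object ID (OID)."""
        return f"{ino:x}.{ono:08x}"

    def placement_group(oid):
        """Hash the object ID onto a placement group (PGID)."""
        return int(hashlib.md5(oid.encode()).hexdigest(), 16) % NUM_PGS

    def device_for(pgid):
        """Deterministic PG-to-device mapping -- a stand-in for CRUSH."""
        return OSDS[pgid % len(OSDS)]

    ino, file_size, object_size = 0x1234, 10 * 2**20, 4 * 2**20
    for ono in range((file_size + object_size - 1) // object_size):
        oid = object_id(ino, ono)
        pgid = placement_group(oid)
        print(oid, "-> pg", pgid, "->", device_for(pgid))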
The Ceph metadata server
The job of the metadata server (cmds) is to manage the file
system’s namespace. Although both metadata and data are
stored in the object storage cluster, they are managed separately
to support scalability. In fact, metadata is further split among
a cluster of metadata servers that can adaptively replicate and
distribute the namespace to avoid hot spots. As shown in Figure
4, the metadata servers manage portions of the namespace and
can overlap (for redundancy and also for performance). The
mapping of metadata servers to namespace is performed in
Ceph using dynamic subtree partitioning, which allows Ceph to
adapt to changing workloads (migrating namespaces between
metadata servers) while preserving locality for performance.
But because each metadata server simply manages the
namespace for the population of clients, its primary applica-
tion is an intelligent metadata cache (because actual metadata
is eventually stored within the object storage cluster). Metadata
to write is cached in a short-term journal, which eventually is
pushed to physical storage. This behavior allows the meta-
data server to serve recent metadata back to clients (which is
common in metadata operations). The journal is also useful for
failure recovery: if the metadata server fails, its journal can be
replayed to ensure that metadata is safely stored on disk.
Metadata servers manage the inode space, converting file names
to metadata. The metadata server transforms the file name into
an inode, file size, and striping data (layout) that the Ceph client
uses for file I/O.
Ceph monitors
Ceph includes monitors that implement management of the
cluster map, but some elements of fault management are imple-
mented in the object store itself. When object storage devices fail
or new devices are added, monitors detect and maintain a valid
cluster map. This function is performed in a distributed fashion
where map updates are communicated with existing traffic.
Ceph uses Paxos, which is a family of algorithms for distributed
consensus.
Ceph object storage
Similar to traditional object storage, Ceph storage nodes include
not only storage but also intelligence. Traditional drives are
simple targets that only respond to commands from initiators.
But object storage devices are intelligent devices that act as both
targets and initiators to support communication and collabora-
tion with other object storage devices.
[Figure 4: Partitioning of the Ceph namespace across metadata servers MDS₀ through MDS₄.]
From a storage perspective, Ceph object storage devices perform
the mapping of objects to blocks (a task traditionally done at
the file system layer in the client). This behavior allows the local
entity to best decide how to store an object. Early versions of
Ceph implemented a custom low-level file system on the local
storage called EBOFS. This system implemented a nonstandard
interface to the underlying storage tuned for object seman-
tics and other features (such as asynchronous notification of
commits to disk). Today, the B-tree file system (BTRFS) can be
used at the storage nodes, which already implements some of
the necessary features (such as embedded integrity).
Because the Ceph clients implement CRUSH and do not have
knowledge of the block mapping of files on the disks, the under-
lying storage devices can safely manage the mapping of objects
to blocks. This allows the storage nodes to replicate data (when
a device is found to have failed). Distributing the failure recovery
also allows the storage system to scale, because failure detec-
tion and recovery are distributed across the ecosystem. Ceph
calls this RADOS (see Figure 3).
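For completeness, here is a hedged sketch of talking to RADOS directly from Python via the librados bindings (the python-rados package); the configuration path and the pool name "mypool" are assumptions about the local cluster.

    # rados_putget.py -- store and read back one object via librados (python-rados)
    # Assumes a running Ceph cluster, /etc/ceph/ceph.conf, and an existing pool "mypool".
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("mypool")            # I/O context for one pool
        ioctx.write_full("hello_object", b"hello from RADOS")
        print(ioctx.read("hello_object"))               # -> b'hello from RADOS'
        ioctx.close()
    finally:
        cluster.shutdown()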
Other features of interest
As if the dynamic and adaptive nature of the file system weren’t
enough, Ceph also implements some interesting features visible
to the user. Users can create snapshots, for example, in Ceph on
any subdirectory (including all of the contents). It’s also possible
to perform file and capacity accounting at the subdirectory level,
which reports the storage size and number of files for a given
subdirectory (and all of its nested contents).
Ceph status and future
Although Ceph is now integrated into the mainline Linux kernel,
it’s properly noted there as experimental. File systems in this state
are useful to evaluate but are not yet ready for production envi-
ronments. But given Ceph’s adoption into the Linux kernel and
the motivation by its originators to continue its development,
it should be available soon to solve your massive storage needs.
Application Containers with Docker
Transforming Business Through Software
Gone are the days of private datacenters running off-the-shelf
software and giant monolithic code bases that you updated
once a year. Everything has changed. Whether it is moving to the
cloud, migrating between clouds, modernizing legacy applications or building
new apps and data structures, the desired result is always the
same – speed. How fast you can move defines your success as a
company.
Software is the critical IP that defines your company, even
if the actual product you are selling is a t-shirt, a car,
or compounding interest. Software is how you engage your
customers, reach new users, understand their data, promote
your product or service and process their order.
To do this well, today’s software is going bespoke. Small pieces
of software that are designed for a very specific job are called
microservices. The design goal of microservices is to have each
service built with all of the necessary components to “run” a
specific job with just the right type of underlying infrastructure
resources. Then, these services are loosely coupled together so
they can be changed at any time, without having to worry about
the services that come before or after them.
This methodology, while great for continuous improvement,
poses many challenges in reaching the end state. First, it creates
a new, ever-expanding matrix of services, dependencies and
infrastructure that is difficult to manage. Additionally, it does
not account for the vast amount of existing legacy applications
in the landscape, the heterogeneity up and down the application
stack, and the processes required to ensure this works in practice.
The Docker Journey and the Power of AND
In 2013, Docker entered the landscape with application
containers to build, ship and run applications anywhere. Docker
was able to take software and its dependencies and package them
up into a lightweight container. Similar to how shipping containers
work today, software containers are simply a standard unit of software
that looks the same on the outside regardless of what
code and dependencies are included on the inside. This enabled
developers and sysadmins to transport them across infrastruc-
tures and various environments without requiring any modi-
fications and regardless of the varying configurations in the
different environments. The Docker journey begins here.
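A small sketch with the Docker SDK for Python (the docker package) shows this "package once, run anywhere" idea from the API side; the image and the command are arbitrary examples, not something prescribed by the text.

    # run_container.py -- pull an image and run a throwaway container (Docker SDK for Python)
    import docker

    client = docker.from_env()                          # talks to the local Docker daemon
    client.images.pull("python", tag="3.11-slim")       # the same image runs identically anywhere

    output = client.containers.run(
        "python:3.11-slim",
        ["python", "-c", "print('hello from a container')"],
        remove=True,                                    # clean up the container afterwards
    )
    print(output.decode())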
Agility: The speed and simplicity of Docker was an
instant hit with developers and is what led to the
meteoric rise of the open source project. Developers
are now able to very simply package up any software and its
dependencies into a container. Developers are able to use any
language, version and tooling because they are all packaged in
a container, which “standardizes” all that heterogeneity without
sacrifice.
Portability: Just by the nature of the Docker tech-
nology, these very same developers have realized
their application containers are now portable – in
ways not previously possible. They can ship their applications from development to test and production, and the code will work as designed every time. Differences in the environment do not affect what is inside the container, nor do developers need to change their application to work in production. This is also a boon for IT operations teams, as they can now move applications across datacenters and clouds to avoid vendor lock-in.
Control: As these applications move along the life-
cycle to production, new questions around security,
manageability and scale need to be answered. Docker
“standardizes” your environment while maintaining the hetero-
geneity your business requires. Docker provides the ability to set
the appropriate level of control and flexibility to maintain your
service levels, performance and regulatory compliance. IT oper-
ations teams can provision, secure, monitor and scale the infra-
structure and applications to maintain peak service levels. No
two applications or businesses are alike, and Docker allows you to decide how to control your application environment.
At the core of the Docker journey is the power of AND. Docker
is the only solution to provide agility, portability and control for
developers and IT operations teams across all stages of the application lifecycle. From these core tenets, Containers as a Service
(CaaS) emerges as the construct by which these new applications
are built better and faster.
Docker Containers as a Service (CaaS)
What is Containers as a Service (CaaS)? It is an IT-managed and secured application environment of infrastructure and content where developers can, in a self-service manner, build and deploy applications.
Diagram: CaaS Workflow – Developers build (development environments), content is shipped through secure content & collaboration, and IT Operations deploy, manage and scale the running applications.
In the CaaS diagram above, development and IT operations teams collaborate through the registry. This is a service in which
a library of secure and signed images can be maintained. From
the registry, developers on the left are able to pull and build soft-
ware at their pace and then push the content back to the registry
once it passes integration testing to save the latest version.
Depending on the internal processes, the deployment step is
either automated with tools or can be manually deployed.
The IT operations team on the right in the diagram above manages the different vendor contracts for the production infrastructure, such as compute, networking and storage. These teams are responsible for provisioning the compute resources needed for
the application and use the Docker Universal Control Plane to
monitor the clusters and applications over time. Then, they can
move the apps from one cloud to another or scale up or down a
service to maintain peak performance.
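A minimal sketch of this build/ship/run loop, assuming the Docker SDK for Python and a local Docker daemon; the registry address, repository name and build path are illustrative:

    import docker

    client = docker.from_env()  # connect to the local Docker daemon

    # Build: the developer builds the application image from a local Dockerfile
    image, build_log = client.images.build(
        path="./myapp",                                  # hypothetical build context
        tag="registry.example.com/team/myapp:1.0",
    )

    # Ship: push the tested image back to the (private) registry
    for line in client.images.push(
        "registry.example.com/team/myapp", tag="1.0", stream=True, decode=True
    ):
        print(line)

    # Run: IT operations later pull the same image onto production hosts
    client.images.pull("registry.example.com/team/myapp", tag="1.0")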
Key Characteristics and Considerations
The Docker CaaS provides a framework for organizations to unify
the variety of systems, languages and tools in their environment
and apply the level of control, security or freedom required for
their business. As a Docker-native solution with full support of the Docker API, Docker CaaS can seamlessly take an application from local development to production without changing the code, streamlining the deployment cycle along the way.
The following characteristics form the minimum requirements
for any organization’s application environment. In this paradigm,
development and IT operations teams are empowered to use the
best tools for their respective jobs without worrying about breaking systems, disrupting each other's workflows, or lock-in.
The needs of developers and operations.
Many tools specifically address the functional needs of only one team; however, this breaks the cycle of continuous improvement. To truly gain acceleration in the development
to production timeline, you need to address both users along
a continuum. Docker provides unique capabilities for each
team as well as a consistent API across the entire platform
for a seamless transition from one team to the next.
All stages in the application lifecycle.
From continuous integration to delivery and devops, these
practices are about eliminating the waterfall development
methodology and the lagging innovation cycles with it. By
providing tools for both the developer and IT operations,
Docker is able to seamlessly support an application from
build, test, stage to production.
Any language.
Developer agility means the freedom to build with whatever
language, version and tooling required for the features they
are building at that time. Also, the ability to run multiple
versions of a language at the same time provides a greater
level of flexibility. Docker allows your team to focus on
building the app instead of thinking of how to build an app
that works in Docker.
Any operating system.
The vast majority of organizations have more than one oper-
ating system. Some tools just work better on Linux, while others work better on Windows. Application platforms need to account for and support this diversity; otherwise they are solving only
part of the problem. Originally created for the Linux commu-
nity, Docker and Microsoft are bringing forward Windows
Server support to address the millions of enterprise applica-
tions in existence today and future applications.
Any infrastructure.
When it comes to infrastructure, organizations want
choice, backup and leverage. Whether that means you have
multiple private data centers, a hybrid cloud or multiple
cloud providers, the critical component is the ability to
move workloads from one environment to another, without
causing application issues. The Docker technology architecture abstracts the infrastructure away from the application, allowing application containers to run anywhere and remain portable across any infrastructure.
Open APIs, pluggable architecture and ecosystem.
A platform isn’t really a platform if it is an island to itself.
Implementing new technologies is often not possible if you
need to re-tool your existing environment first. A funda-
mental guiding principle of Docker is a platform that is
open. Being open means APIs and plugins to make it easy for
you to leverage your existing investments and to fit Docker
into your environment and processes. This openness invites
a rich ecosystem to flourish and provide you with more
flexibility and choice in adding specialized capabilities to
your CaaS.
Although numerous, these characteristics are critical, as the new bespoke application paradigms only invite greater heterogeneity into your technical architecture. The Docker CaaS platform is fundamentally designed to support that diversity, while providing the appropriate controls to manage at any scale.
Docker CaaS
The Docker CaaS platform is made possible by a suite of inte-
grated software solutions with a flexible deployment model to
meet the needs of your business.
 On-Premises/VPC: For organizations who need to keep their
IP within their network, Docker Trusted Registry and Docker
Universal Control Plane can be deployed on-premises or
in a VPC and connected to your existing infrastructure and
systems like storage, Active Directory/LDAP, monitoring and
logging solutions. Trusted Registry provides the ability to
store and manage images on your storage infrastructure
while also managing role based access control to the images.
Universal Control Plane provides visibility across your Docker
environment including Swarm clusters, Trusted Registry
repositories, containers and multi container applications.
 In the Cloud: For organizations who readily use SaaS solu-
tions, Docker Hub and Tutum by Docker provide a registry
service and control plane that is hosted and managed by
Docker. Hub is a cloud registry service to store and manage
your images and user permissions. Tutum by Docker provi-
sions and manages the deployment clusters as well as moni-
tors and manages the deployed applications. Connect to the
cloud infrastructure of your choice or bring your own phys-
ical node to deploy your application.
Your Docker CaaS can be designed to provide a centralized point
of control and management or allow for decentralized manage-
ment to empower individual application teams. The flexibility
allows you to create a model that is right for your business,
just like how you choose your infrastructure and implement
processes. CaaS is an extension of that to build, ship and run
applications.
Many IT initiatives are enabled, and in fact accelerated, by CaaS due to its unifying nature across the environment. Each organization has its own take on the terms for these initiatives, but they range from containerization, which may involve the “lift and shift” of existing apps or the adoption of microservices, to continuous integration, delivery and devops, and to various flavors of the cloud, including adoption, migration, hybrid and multiple clouds.
In each scenario, Docker CaaS brings the agility, portability and
control to enable the adoption of those use cases across the
organization.
Diagram: Docker CaaS tooling across the workflow – Build (Docker Toolbox), Ship (Docker Hub, Docker Trusted Registry), Run (Tutum by Docker, Docker Universal Control Plane).
Architecture of Docker
Docker is an open-source project that automates the deploy-
ment of applications inside software containers, by providing
an additional layer of abstraction and automation of operat-
ing-system-level virtualization on Linux. Docker uses the re-
source isolation features of the Linux kernel such as cgroups
and kernel namespaces, and a union-capable filesystem such as
AUFS and others to allow independent “containers” to run with-
in a single Linux instance, avoiding the overhead of starting and
maintaining virtual machines.
The Linux kernel’s support for namespaces mostly isolates an
application’s view of the operating environment, including pro-
cess trees, network, user IDs and mounted file systems, while
the kernel’s cgroups provide resource limiting, including the
CPU, memory, block I/O and network. Since version 0.9, Docker
includes the libcontainer library as its own way to directly use
virtualization facilities provided by the Linux kernel, in addi-
tion to using abstracted virtualization interfaces via libvirt, LXC
(Linux Containers) and systemd-nspawn.
As actions are done to a Docker base image, union filesystem lay-
ers are created and documented, such that each layer fully de-
scribes how to recreate an action. This strategy enables Docker’s
lightweight images, as only layer updates need to be propagated
(compared to full VMs, for example).
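A minimal sketch of inspecting these layers, assuming the Docker SDK for Python, a local Docker daemon and access to a registry; the image name is illustrative:

    import docker

    client = docker.from_env()
    image = client.images.pull("alpine", tag="latest")

    # Each history entry corresponds to one recorded layer/build step;
    # only changed layers need to be transferred when an image is updated.
    for entry in image.history():
        print(entry.get("Size", 0), "bytes:", str(entry.get("CreatedBy", ""))[:60])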
Overview
Docker implements a high-level API to provide lightweight
containers that run processes in isolation.
Building on top of facilities provided by the Linux kernel (primar-
ily cgroups and namespaces), a Docker container, unlike a vir-
tual machine, does not require or include a separate operating
system. Instead, it relies on the kernel’s functionality and uses
resource isolation (CPU, memory, block I/O, network, etc.) and
separate namespaces to isolate the application’s view of the
operating system. Docker accesses the Linux kernel’s virtualiza-
tion features either directly using the libcontainer library, which
is available as of Docker 0.9, or indirectly via libvirt, LXC (Linux
Containers) or systemd-nspawn.
By using containers, resources can be isolated, services restrict-
ed, and processes provisioned to have an almost completely
private view of the operating system with their own process ID
space, file system structure, and network interfaces. Multiple
containers share the same kernel, but each container can be con-
strained to only use a defined amount of resources such as CPU,
memory and I/O.
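A minimal sketch of such constraints, assuming the Docker SDK for Python and a local Docker daemon; the image and limit values are illustrative:

    import docker

    client = docker.from_env()

    # Run a short-lived container with cgroup-based CPU and memory limits;
    # it shares the host kernel but sees its own namespaces.
    output = client.containers.run(
        "alpine",
        ["sh", "-c", "echo hello from an isolated container"],
        mem_limit="256m",        # cgroup memory limit
        cpu_period=100000,       # CFS period in microseconds
        cpu_quota=50000,         # at most half a CPU core
        remove=True,             # clean up after the container exits
    )
    print(output.decode())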
Using Docker to create and manage containers may simplify
the creation of highly distributed systems by allowing multiple
applications, worker tasks and other processes to run autono-
mously on a single physical machine or across multiple virtual
machines. This allows the deployment of nodes to be performed
as the resources become available or when more nodes are need-
ed, allowing a platform as a service (PaaS)-style of deployment
and scaling for systems like Apache Cassandra, MongoDB or Riak.
Docker also simplifies the creation and operation of task or work-
load queues and other distributed systems.
Integration
Docker can be integrated into various infrastructure tools, in-
cluding Amazon Web Services, Ansible, CFEngine, Chef, Google
Cloud Platform, IBM Bluemix, Jelastic, Jenkins, Microsoft Azure,
OpenStack Nova, OpenSVC, HPE Helion Stackato, Puppet, Salt,
Vagrant, and VMware vSphere Integrated Containers.
The Cloud Foundry Diego project integrates Docker into the
Cloud Foundry PaaS.
The GearD project aims to integrate Docker into Red Hat's OpenShift Origin PaaS.
Diagram: Docker can use different interfaces – libcontainer, libvirt, LXC and systemd-nspawn – to access virtualization features of the Linux kernel such as cgroups, namespaces, SELinux, AppArmor, Netfilter, Netlink and capabilities.
Advanced Cluster
Management Made Easy
High Performance Computing (HPC) clusters serve a crucial role at research-intensive
organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil
and gas exploration.
The thing they have in common is a requirement for large-scale computations. Bright’s
solution for HPC puts the power of supercomputing within the reach of just about any
organization.
Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
The Bright Advantage
Bright Cluster Manager for HPC makes it easy to build and oper-
ate HPC clusters using the server and networking equipment of
your choice, tying them together into a comprehensive, easy to
manage solution.
Bright Cluster Manager for HPC lets customers deploy complete
clusters over bare metal and manage them effectively. It pro-
vides single-pane-of-glass management for the hardware, the
operating system, HPC software, and users. With Bright Cluster
Manager for HPC, system administrators can get clusters up and
running quickly and keep them running reliably throughout their
life cycle – all with the ease and elegance of a fully featured, en-
terprise-grade cluster manager.
Bright Cluster Manager was designed by a group of highly expe-
rienced cluster specialists who perceived the need for a funda-
mental approach to the technical challenges posed by cluster
management. The result is a scalable and flexible architecture,
forming the basis for a cluster management solution that is ex-
tremely easy to install and use, yet is suitable for the largest and
most complex clusters.
Scale clusters to thousands of nodes
Bright Cluster Manager was designed to scale to thousands of
nodes. It is not dependent on third-party (open source) software
that was not designed for this level of scalability. Many advanced
features for handling scalability and complexity are included:
Management Daemon with Low CPU Overhead
The cluster management daemon (CMDaemon) runs on every
node in the cluster, yet has a very low CPU load – it will not cause
any noticeable slow-down of the applications running on your
cluster.
Because the CMDaemon provides all cluster management func-
tionality, the only additional daemon required is the workload
manager (or queuing system) daemon. For example, no daemons
for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster.
The cluster installer takes you through the installation process and offers advanced options such as “Express” and “Remote”.
Multiple Load-Balancing Provisioning Nodes
Node provisioning – the distribution of software from the head
node to the regular nodes – can be a significant bottleneck in
large clusters. With Bright Cluster Manager, provisioning capabil-
ity can be off-loaded to regular nodes, ensuring scalability to a
virtually unlimited number of nodes.
When a node requests its software image from the head node,
the head node checks which of the nodes with provisioning ca-
pability has the lowest load and instructs the node to download
or update its image from that provisioning node.
Synchronized Cluster Management Daemons
The CMDaemons are synchronized in time to execute tasks in
exact unison. This minimizes the effect of operating system jit-
ter (OS jitter) on parallel applications. OS jitter is the “noise” or
“jitter” caused by daemon processes and asynchronous events
such as interrupts. OS jitter cannot be totally prevented, but for
parallel applications it is important that any OS jitter occurs at
the same moment in time on every compute node.
Built-In Redundancy
Bright Cluster Manager supports redundant head nodes and re-
dundant provisioning nodes that can take over from each other
in case of failure. Both automatic and manual failover configura-
tions are supported.
Other redundancy features include support for hardware and
software RAID on all types of nodes in the cluster.
Diskless Nodes
Bright Cluster Manager supports diskless nodes, which can be
useful to increase overall Mean Time Between Failure (MTBF),
particularly on large clusters. Nodes with only InfiniBand and no
Ethernet connection are possible too.
Bright Cluster Manager 7 for HPC
For organizations that need to easily deploy and manage HPC
clusters, Bright Cluster Manager for HPC lets customers deploy
complete clusters over bare metal and manage them effectively.
It provides single-pane-of-glass management for the hardware,
the operating system, the HPC software, and users.
With Bright Cluster Manager for HPC, system administrators can
quickly get clusters up and running and keep them running relia-
bly throughout their life cycle – all with the ease and elegance of
a fully featured, enterprise-grade cluster manager.
Features
 Simple Deployment Process – Just answer a few questions
about your cluster and Bright takes care of the rest. It installs
all of the software you need and creates the necessary config-
uration files for you, so you can relax and get your new cluster
up and running right – first time, every time.
Head Nodes & Regular Nodes
A cluster can have different types of nodes, but it will always
have at least one head node and one or more “regular” nodes.
A regular node is a node which is controlled by the head node
and which receives its software image from the head node or
from a dedicated provisioning node.
Regular nodes come in different flavors.
Some examples include:
 Failover Node – A node that can take over all functionalities
of the head node when the head node becomes unavailable.
 Compute Node – A node which is primarily used for compu-
tations.
 Login Node – A node which provides login access to users of
the cluster. It is often also used for compilation and job sub-
mission.
 I/O Node – A node which provides access to disk storage.
 Provisioning Node – A node from which other regular nodes
can download their software image.
 Workload Management Node – A node that runs the central
workload manager, also known as queuing system.
 Subnet Management Node – A node that runs an InfiniBand
subnet manager.
 More types of nodes can easily be defined and configured
with Bright Cluster Manager. In simple clusters there is only
one type of regular node – the compute node – and the head
node fulfills all cluster management roles that are described
in the regular node types mentioned above.
All types of nodes (head nodes and regular nodes) run the same CMDaemon. However, depending on the role that has been assigned to the node, the CMDaemon fulfills different tasks. A single node may, for example, combine a ‘Login Role’, ‘Storage Role’ and ‘PBS Client Role’.
Elements of the Architecture
The architecture of Bright Cluster Manager is implemented by
the following key elements:
 The Cluster Management Daemon (CMDaemon)
 The Cluster Management Shell (CMSH)
 The Cluster Management GUI (CMGUI)
Connection Diagram
A Bright cluster is managed by the CMDaemon on the head node,
which communicates with the CMDaemons on the other nodes
over encrypted connections.
The CMDaemon on the head node exposes an API which can also
be accessed from outside the cluster. This is used by the cluster
management Shell and the cluster management GUI, but it can
also be used by other applications if they are programmed to ac-
cess the API. The API documentation is available on request.
The diagram below shows a cluster with one head node, three
regular nodes and the CMDaemon running on each node.
 Can Install on Bare Metal – With Bright Cluster Manager there
is nothing to pre-install. You can start with bare metal serv-
ers, and we will install and configure everything you need. Go
from pallet to production in less time than ever.
 Comprehensive Metrics and Monitoring – Bright Cluster Man-
ager really shines when it comes to showing you what’s going
on in your cluster. Its beautiful graphical user interface lets
you monitor resources and services across the entire cluster
in real time.
 Powerful User Interfaces – With Bright Cluster Manager you
don’t just get one great user interface, we give you two. That
way, you can choose to provision, monitor, and manage your
HPC clusters with a traditional command line interface or our
beautiful, intuitive graphical user interface.
 Optimize the Use of IT Resources by HPC Applications – Bright
ensures HPC applications get the resources they require
according to the policies of your organization. Your cluster
will prioritize workloads according to your business priorities.
 HPC Tools and Libraries Included – Every copy of Bright Cluster
Manager comes with a complete set of HPC tools and libraries
so you will be ready to develop, debug, and deploy your HPC
code right away.
“The building blocks for transtec HPC solutions must be chosen
according to our goals ease-of-management and ease-of-use.
With Bright Cluster Manager, we are happy to have the technology
leader at hand, meeting these requirements, and our customers
value that.”
Jörg Walter | HPC Solution Engineer
Monitor your entire cluster at a glance with Bright Cluster Manager
Quickly build and deploy an HPC cluster
When you need to get an HPC cluster up and running, don’t waste
time cobbling together a solution from various open source
tools. Bright’s solution comes with everything you need to set up
a complete cluster from bare metal and manage it with a beauti-
ful, powerful user interface.
Whether your cluster is on-premise or in the cloud, Bright is a
virtual one-stop-shop for your HPC project.
Build a cluster in the cloud
Bright’s dynamic cloud provisioning capability lets you build an
entire cluster in the cloud or expand your physical cluster into
the cloud for extra capacity. Bright allocates the compute re-
sources in Amazon Web Services automatically, and on demand.
Build a hybrid HPC/Hadoop cluster
Does your organization need HPC clusters for technical comput-
ing and Hadoop clusters for Big Data? The Bright Cluster Man-
ager for Apache Hadoop add-on enables you to easily build and
manage both types of clusters from a single pane of glass. Your
system administrators can monitor all cluster operations, man-
age users, and repurpose servers with ease.
Driven by research
Bright has been building enterprise-grade cluster management
software for more than a decade. Our solutions are deployed in
thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization?
Clusters in the cloud
Bright Cluster Manager can provision and manage clusters that
are running on virtual servers inside the AWS cloud as if they
were local machines. Use this feature to build an entire cluster
in AWS from scratch or extend a physical cluster into the cloud
when you need extra capacity.
Use native services for storage in AWS – Bright Cluster Manager
lets you take advantage of inexpensive, secure, durable, flexible
and simple storage services for data use, archiving, and backup
in the AWS cloud.
Make smart use of AWS for HPC – Bright Cluster Manager can
save you money by instantiating compute resources in AWS only
when they are needed. It uses built-in intelligence to create in-
stances only after the data is ready for processing and the back-
log in on-site workloads requires it.
You decide which metrics to monitor. Just drag a component into the display
area and Bright Cluster Manager instantly creates a graph of the data you need.
High performance meets efficiency
Initially, massively parallel systems constitute a challenge to
both administrators and users. They are complex beasts. Anyone
building HPC clusters will need to tame the beast, master the
complexity and present users and administrators with an easy-
to-use, easy-to-manage system landscape.
Leading HPC solution providers such as transtec achieve this
goal. They hide the complexity of HPC under the hood and match
high performance with efficiency and ease-of-use for both users
and administrators. The “P” in “HPC” gains a double meaning:
“Performance” plus “Productivity”.
Cluster and workload management software like Moab HPC
Suite, Bright Cluster Manager or Intel Fabric Suite provide the
means to master and hide the inherent complexity of HPC sys-
tems. For administrators and users, HPC clusters are presented
as single, large machines, with many different tuning parame-
ters. The software also provides a unified view of existing clus-
ters whenever unified management is added as a requirement by
the customer at any point in time after the first installation. Thus,
daily routine tasks such as job management, user management,
queue partitioning and management, can be performed easily
with either graphical or web-based tools, without any advanced
scripting skills or technical expertise required from the adminis-
trator or user.
Bright Cluster Manager unleashes and manages
the unlimited power of the cloud
Create a complete cluster in Amazon EC2, or easily extend your
onsite cluster into the cloud. Bright Cluster Manager provides a
“single pane of glass” to both your on-premise and cloud resourc-
es, enabling you to dynamically add capacity and manage cloud
nodes as part of your onsite cluster.
Cloud utilization can be achieved in just a few mouse clicks, with-
out the need for expert knowledge of Linux or cloud computing.
With Bright Cluster Manager, every cluster is cloud-ready, at no
extra cost. The same powerful cluster provisioning, monitoring,
scheduling and management capabilities that Bright Cluster
Manager provides to onsite clusters extend into the cloud.
The Bright advantage for cloud utilization
 Ease of use: Intuitive GUI virtually eliminates the user learning curve; no need to understand Linux or EC2 to manage the system.
Alternatively, cluster management shell provides powerful
scripting capabilities to automate tasks.
 Complete management solution: Installation/initialization,
provisioning, monitoring, scheduling and management in
one integrated environment.
 Integrated workload management: Wide selection of work-
load managers included and automatically configured with
local, cloud and mixed queues.
 Single pane of glass; complete visibility and control: Cloud compute nodes managed as elements of the on-site cluster; visible from a single console with drill-downs and historic data.
 Efficient data management via data-aware scheduling: Auto-
matically ensures data is in position at start of computation;
delivers results back when complete.
 Secure, automatic gateway: Mirrored LDAP and DNS services
inside Amazon VPC, connecting local and cloud-based nodes
for secure communication over VPN.
 Cost savings: More efficient use of cloud resources; support
for spot instances, minimal user intervention.
Two cloud scenarios
Bright Cluster Manager supports two cloud utilization scenarios:
“Cluster-on-Demand” and “Cluster Extension”.
Scenario 1: Cluster-on-Demand
The Cluster-on-Demand scenario is ideal if you do not have a clus-
ter onsite, or need to set up a totally separate cluster. With just a
few mouse clicks you can instantly create a complete cluster in
the public cloud, for any duration of time.
Scenario 2: Cluster Extension
The Cluster Extension scenario is ideal if you have a cluster onsite
but you need more compute power, including GPUs. With Bright
Cluster Manager, you can instantly add EC2-based resources to
your onsite cluster, for any duration of time. Extending into the
cloud is as easy as adding nodes to an onsite cluster – there are
only a few additional, one-time steps after providing the public
cloud account information to Bright Cluster Manager.
The Bright approach to managing and monitoring a cluster in
the cloud provides complete uniformity, as cloud nodes are
managed and monitored the same way as local nodes:
 Load-balanced provisioning
 Software image management
 Integrated workload management
 Interfaces – GUI, shell, user portal
 Monitoring and health checking
 Compilers, debuggers, MPI libraries, mathematical libraries
and environment modules
Bright Cluster Manager also provides additional features that are
unique to the cloud.
Amazon spot instance support – Bright Cluster Manager enables
users to take advantage of the cost savings offered by Amazon’s
spot instances. Users can specify the use of spot instances, and
Bright will automatically schedule as available, reducing the cost
to compute without the need to monitor spot prices and sched-
ule manually.
Hardware Virtual Machine (HVM) – Bright Cluster Manager auto-
matically initializes all Amazon instance types, including Cluster
Compute and Cluster GPU instances that rely on HVM virtualization.
Amazon VPC – Bright supports Amazon VPC setups, which allow compute nodes in EC2 to be placed in an isolated network, thereby separating them from the outside world. It is even possible to
route part of a local corporate IP network to a VPC subnet in EC2,
so that local nodes and nodes in EC2 can communicate easily.
Data-aware scheduling – Data-aware scheduling ensures that
input data is automatically transferred to the cloud and made
accessible just prior to the job starting, and that the output data
is automatically transferred back. There is no need to wait for the
data to load prior to submitting jobs (delaying entry into the job
queue), nor any risk of starting the job before the data transfer is
complete (crashing the job). Users submit their jobs, and Bright’s
data-aware scheduling does the rest.
Bright Cluster Manager provides the choice:
“To Cloud, or Not to Cloud”
Not all workloads are suitable for the cloud. The ideal situation
for most organizations is to have the ability to choose between
onsite clusters for jobs that require low latency communication,
complex I/O or sensitive data; and cloud clusters for many other
types of jobs.
Bright Cluster Manager delivers the best of both worlds: a pow-
erful management solution for local and cloud clusters, with
the ability to easily extend local clusters into the cloud without
compromising provisioning, monitoring or managing the cloud
resources.
One or more cloud nodes can be configured under the Cloud Settings tab.
Intelligent HPC
Workload Management
While all HPC systems face challenges in workload demand, workflow constraints,
resource complexity, and system scale, enterprise HPC systems face more stringent
challenges and expectations.
Enterprise HPC systems must meet mission-critical and priority HPC workload demands
for commercial businesses, business-oriented research, and academic organizations.
High Throughput Computing | CAD | Big Data Analytics | Simulation | Aerospace | Automotive
Solutions to Enterprise
High-Performance Computing Challenges
The Enterprise has requirements for dynamic scheduling, provi-
sioning and management of multi-step/multi-application servic-
es across HPC, cloud, and big data environments. Their workloads
directly impact revenue, product delivery, and organizational ob-
jectives.
Enterprise HPC systems must process intensive simulation and
data analysis more rapidly, accurately and cost-effectively to
accelerate insights. By improving workflow and eliminating job
delays and failures, the business can more efficiently leverage
data to make data-driven decisions and achieve a competitive
advantage. The Enterprise is also seeking to improve resource
utilization and manage efficiency across multiple heterogene-
ous systems. User productivity must increase to speed the time
to discovery, making it easier to access and use HPC resources
and expand to other resources as workloads demand.
Moab HPC Suite
Moab HPC Suite accelerates insights by unifying data center re-
sources, optimizing the analysis process, and guaranteeing ser-
vices to the business. Moab 8.1 for HPC systems and HPC cloud
continually meets enterprise priorities through increased pro-
ductivity, automated workload uptime, and consistent SLAs. It
uses the battle-tested and patented Moab intelligence engine
to automatically balance the complex, mission-critical workload
priorities of enterprise HPC systems. Enterprise customers ben-
efit from a single integrated product that brings together key
Enterprise HPC capabilities, implementation services, and 24/7
support.
It is imperative to automate workload workflows to speed time
to discovery. The following new use cases play a key role in im-
proving overall system performance:
Moab is the “brain” of an HPC system, intelligently optimizing workload
throughput while balancing service levels and priorities.
 Elastic Computing – unifies data center resources by assisting the business in managing resource expansion through bursting to private/public clouds and other data center resources, utilizing OpenStack as a common platform
 Performance and Scale – optimizes the analysis process by
dramatically increasing TORQUE and Moab throughput and
scalability across the board
 Tighter cooperation between Moab and Torque – harmoniz-
ing these key structures to reduce overhead and improve
communication between the HPC scheduler and the re-
source manager(s)
 More Parallelization – increasing the decoupling of Moab and TORQUE's network communication so Moab is less dependent on TORQUE's responsiveness, resulting in a 2x speed improvement and cutting the duration of the average scheduling iteration in half
 Accounting Improvements – allowing toggling between
multiple modes of accounting with varying levels of en-
forcement, providing greater flexibility beyond strict allo-
cation options. Additionally, new up-to-date accounting
balances provide real-time insights into usage tracking.
 Viewpoint Admin Portal – guarantees services to the busi-
ness by allowing admins to simplify administrative reporting,
workload status tracking, and job resource viewing
Accelerate Insights
Moab HPC Suite accelerates insights by increasing overall
system, user and administrator productivity, achieving more
accurate results that are delivered faster and at a lower cost
from HPC resources. Moab provides the scalability, 90-99 per-
cent utilization, and simple job submission required to maximize
productivity. This ultimately speeds the time to discovery, aiding
the enterprise in achieving a competitive advantage due to its
HPC system. Enterprise use cases and capabilities include the
following:
 OpenStack integration to offer virtual and physical resource
provisioning for IaaS and PaaS
 Performance boost to achieve 3x improvement in overall opti-
mization performance
 Advanced workflow data staging to enable improved cluster
utilization, multiple transfer methods, and new transfer types
“With Moab HPC Suite, we can meet very demanding customers’
requirements as regards unified management of heterogeneous
cluster environments, grid management, and provide them with
flexible and powerful configuration and reporting options. Our
customers value that highly.”
Sven Grützmacher | HPC Solution Engineer
 Advanced power management with clock frequency control
and additional power state options, reducing energy costs by
15-30 percent
 Workload-optimized allocation policies and provisioning to
get more results out of existing heterogeneous resources and
reduce costs, including topology-based allocation
 Unify workload management across heterogeneous clusters
by managing them as one cluster, maximizing resource availa-
bility and administration efficiency
 Optimized, intelligent scheduling packs workloads and back-
fills around priority jobs and reservations while balancing
SLAs to efficiently use all available resources
 Optimized scheduling and management of accelerators, both
Intel Xeon Phi and GPGPUs, for jobs to maximize their utiliza-
tion and effectiveness
 Simplified job submission and management with advanced
job arrays and templates
 Showback or chargeback for pay-for-use so actual resource
usage is tracked with flexible chargeback rates and reporting
by user, department, cost center, or cluster
 Multi-cluster grid capabilities manage and share workload
across multiple remote clusters to meet growing workload
demand or surges
Guarantee Services to the Business
Job and resource failures in enterprise HPC systems lead to de-
layed results, missed organizational opportunities, and missed
objectives. Moab HPC Suite intelligently automates workload
and resource uptime in the HPC system to ensure that workloads
complete successfully and reliably, avoid failures, and guarantee
services are delivered to the business.
The Enterprise benefits from these features:
 Intelligent resource placement prevents job failures with
granular resource modeling, meeting workload requirements
and avoiding at-risk resources
 Auto-response to failures and events with configurable
actions to pre-failure conditions, amber alerts, or other
metrics and monitors
 Workload-aware future maintenance scheduling that helps
maintain a stable HPC system without disrupting workload
productivity
 Real-world expertise for fast time-to-value and system uptime
with included implementation, training and 24/7 support
remote services
Auto SLA Enforcement
Moab HPC Suite uses the powerful Moab intelligence engine to
optimally schedule and dynamically adjust workload to consist-
ently meet service level agreements (SLAs), guarantees, or busi-
ness priorities. This automatically ensures that the right work-
loads are completed at the optimal times, taking into account
the complex number of using departments, priorities, and SLAs
to be balanced. Moab provides the following benefits:
 Usage accounting and budget enforcement that schedules
resources and reports on usage in line with resource sharing
agreements and precise budgets (includes usage limits, usage
reports, auto budget management, and dynamic fair share
policies)
 SLA and priority policies that make sure the highest priority
workloads are processed first (includes Quality of Service and
hierarchical priority weighting)
 Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workloads change (e.g. future reservations, pre-emption)
transtec HPC solutions are designed for maximum flexibility
and ease of management. We not only offer our customers
the most powerful and flexible cluster management solu-
tion out there, but also provide them with customized setup
and site-specific configuration. Whether a customer needs
a dynamical Linux-Windows dual-boot solution, unified
management of different clusters at different sites, or the
fine-tuning of the Moab scheduler for implementing fine-
grained policy configuration – transtec not only gives you
the framework at hand, but also helps you adapt the system
according to your special needs. Needless to say, when cus-
tomers are in need of special trainings, transtec will be there
to provide customers, administrators, or users with specially
adjusted Educational Services.
Many years of experience in High Performance Computing have enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven 3rd-party tools, which we assemble into a total solution and adapt and configure according to the customer's requirements.
We manage everything from the complete installation and configuration of the operating system and the necessary middleware components, like job and cluster management systems, up to the customer's applications.
 
ManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_ExtManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_ExtArup Ray
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
inside-BigData.com
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Hortonworks
 

Similar to HPC Compass 2016_17 (20)

Hpc kompass 2015
Hpc kompass 2015Hpc kompass 2015
Hpc kompass 2015
 
HPC Technology Compass 2014/15
HPC Technology Compass 2014/15HPC Technology Compass 2014/15
HPC Technology Compass 2014/15
 
Hpc compass 2013-final_web
Hpc compass 2013-final_webHpc compass 2013-final_web
Hpc compass 2013-final_web
 
HPC compass 2013/2014
HPC compass 2013/2014HPC compass 2013/2014
HPC compass 2013/2014
 
HPC kompass ibm_special_2013/2014
HPC kompass ibm_special_2013/2014HPC kompass ibm_special_2013/2014
HPC kompass ibm_special_2013/2014
 
HPC Compass IBM Special 2013/14
HPC Compass IBM Special 2013/14HPC Compass IBM Special 2013/14
HPC Compass IBM Special 2013/14
 
Ibm special hpc
Ibm special hpcIbm special hpc
Ibm special hpc
 
OpenPOWER's ISC 2016 Recap
OpenPOWER's ISC 2016 RecapOpenPOWER's ISC 2016 Recap
OpenPOWER's ISC 2016 Recap
 
En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...
En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...
En rh - cito - research-why-you-should-put-red-hat-under-your-sap-systems whi...
 
HPC Compass 2014/2015 IBM Special
HPC Compass 2014/2015 IBM SpecialHPC Compass 2014/2015 IBM Special
HPC Compass 2014/2015 IBM Special
 
Hpc kompass ibm_special_2014
Hpc kompass ibm_special_2014Hpc kompass ibm_special_2014
Hpc kompass ibm_special_2014
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing ICEOTOPE & OCF: Performance for Manufacturing
ICEOTOPE & OCF: Performance for Manufacturing
 
HIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha OehlHIPAS UCP HSP Openstack Sascha Oehl
HIPAS UCP HSP Openstack Sascha Oehl
 
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
 
ISC 2016 Day 2 Recap
ISC 2016 Day 2 RecapISC 2016 Day 2 Recap
ISC 2016 Day 2 Recap
 
ManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_ExtManMachine&Mathematics_Arup_Ray_Ext
ManMachine&Mathematics_Arup_Ray_Ext
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
 

More from Marco van der Hart

Chip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streamingChip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streaming
Marco van der Hart
 
Chip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureChip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochure
Marco van der Hart
 
ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale
Marco van der Hart
 
ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012
Marco van der Hart
 
8-way-server
8-way-server8-way-server
8-way-server
Marco van der Hart
 
ttec vSphere 5
ttec vSphere 5ttec vSphere 5
ttec vSphere 5
Marco van der Hart
 
ttec - ParStream
ttec - ParStreamttec - ParStream
ttec - ParStream
Marco van der Hart
 
Backup for dummies
Backup for dummiesBackup for dummies
Backup for dummies
Marco van der Hart
 

More from Marco van der Hart (8)

Chip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streamingChip ict informs lantronix slc 8000 application cable and video streaming
Chip ict informs lantronix slc 8000 application cable and video streaming
 
Chip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochureChip ICT | Hgst storage brochure
Chip ICT | Hgst storage brochure
 
ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale ttec / transtec | IBM NeXtScale
ttec / transtec | IBM NeXtScale
 
ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012ttec | Microsoft Windows Server 2012
ttec | Microsoft Windows Server 2012
 
8-way-server
8-way-server8-way-server
8-way-server
 
ttec vSphere 5
ttec vSphere 5ttec vSphere 5
ttec vSphere 5
 
ttec - ParStream
ttec - ParStreamttec - ParStream
ttec - ParStream
 
Backup for dummies
Backup for dummiesBackup for dummies
Backup for dummies
 

HPC Compass 2016_17

  • 2. More Than 35 Years of Experience in Scientific Computing
  • 3. 1980markedthebeginningofadecadewherenumerousstartups were created, some of which later transformed into big players in the IT market. Technical innovations brought dramatic changes to the nascent computer market. In Tübingen, close to one of Germany’s prime and oldest universities, transtec was founded. In the early days, transtec focused on reselling DEC computers and peripherals, delivering high-performance workstations to university institutes and research facilities. In 1987, SUN/Sparc and storage solutions broadened the portfolio, enhanced by IBM/ RS6000 products in 1991. These were the typical workstations and server systems for high performance computing then, used by the majority of researchers worldwide. Meanwhile, transtec is the biggest European-wide and Germany-based provider of comprehensive HPC solutions. Many of our HPC clusters entered the TOP 500 list of the world’s fastest computing systems. Thus, given this background and history, it is fair to say that tran- stec looks back upon a more than 35 years’ experience in scientif- ic computing; our track record shows more than 1,000 HPC instal- lations. With this experience, we know exactly what customers’ demands are and how to meet them. High performance and ease of management – this is what customers require today. HPC sys- tems are for sure required to peak-perform, as their name indi- cates, but that is not enough: they must also be easy to handle. Unwieldy design and operational complexity must be avoided or at least hidden from administrators and particularly users of HPC computer systems. High Performance Computing has changed in the course of the last 10 years. Whereas traditional topics like simulation, large- SMP systems or MPI parallel computing still are important cor- nerstones of HPC, new areas of applicability like Business Intelli- gence/Analytics, new technologies like in-memory or distributed databases or new algorithms like MapReduce have entered the arena which is now commonly called HPC & Big Data. Knowledge about Hadoop or R nowadays belongs to the basic knowledge of any serious HPC player. Dynamical deployment scenarios are becoming more and more important for realizing private (or even public) HPC cloud sce- narios. The OpenStack ecosystem, object storage concepts like Ceph, application containers with Docker – all these have been bleeding-edge technologies once, but have become state-of-the- art meanwhile. Therefore, this brochure is now called “Technology Compass HPC & Big Data”. It covers all of the above-mentioned currently discussed core technologies. Besides those, more familiar tradi- tional updates like the Intel Omni-Path architecture are covered as well, of course. No matter which ingredient an HPC solution requires – transtec masterly combines excellent and well-cho- sen components that are already there to a fine-tuned, custom- er-specific, and thoroughly designed HPC solution. Your decision for a transtec HPC solution means you opt for most intensive customer care and best service in HPC. Our experts will be glad to bring in their expertise and support to assist you at any stage, from HPC design to daily cluster operations, to HPC Cloud Services. Last but not least, transtec HPC Cloud Services provide custom- ers with the possibility to have their jobs run on dynamically pro- vided nodes in a dedicated datacenter, professionally managed and individually customizable. 
Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud management environment, and ready to run. Have fun reading the transtec HPC Compass 2016/17!
  • 4. High Performance Computing for Everyone Interview with Dr. Oliver Tennert Director Technology Management and HPC Solutions
  • 5. Dr. Tennert, what’s the news in HPC? Besides becoming more and more commodity, the HPC market has seen two major changes within the last couple of years: on the one hand side HPC has reached over to adjacent areas that would not have been recognized as part of HPC 5 years ago. Meanwhile however every serious HPC company tries to find their positioning within the area of “big data”, often vaguely understood, but sometimes put more precisely as “big data analytics”. Superficially, it’s all about analyzing large amounts of data as has been part of HPC like ever before. But the areas of appli- cation are becoming more: “big data analytics” is a vital ap- plication of business intelligence/business analytics. Along with this new methods, new concepts, and new technologies emerge. The other change has to do with the way HPC environ- ments are going to be deployed and managed. Dynamic provi- sioning of resources for private-cloud scenarios are becoming more and more important, to which OpenStack has contrib- uted a lot. In the near future, dynamic deployment will be as commodity as efficient, but static deployment today. What’s the deal with “dynamic deployment” anyway? Dynamic deployment is the core provisioning technology in cloud environments, public or private ones. Virtual machines with different OSses, web servers, compute nodes, or any kind of resources are only deployed when they are needed and requested. Dynamic deployment makes sure that the available hardware capacity will be prepared and deployed for exactly the pur- pose for which it is needed. The advantage is: less idle capacity and better utilization than could happen with static environ- ments where e.g. inactive Windows servers block hardware capacity whereas Linux-based compute nodes would be needed at a certain point in time. Everyone is discussing OpenStack at the moment. What do you think of the hype and the development of this platform? And what is its importance for transtec? OpenStack has indeed been discussed at a time when it still had little business relevance for an HPC solutions provider. But this is the case with all innovations, be they hardware- or software-related. But the fact that OpenStack is a joint development effort of several big IT players, had added to the demand for private clouds in the market on the one hand side, and to its relevance as the software solution for that on the other hand side. We at transtec have several customers who either ask us to deploy a completely new OpenStack environment, or would like us to migrate their existing one to OpenStack. OpenStack has the reputation of being a very complex beast, given the many components. Implementing it can take a while. Is there any way to simplify this? Indeed life can be rough if you do it wrong. But there is no need to: no one goes ahead and installs Linux from scratch nowa- days, apart from playing around and learning: instead every- one uses professional distributions with commercial support. There are special OpenStack distributions too, with commer- cial support: especially within HPC, Bright Cluster Manager is often the distribution of choice, but also long-term players like Red Hat offer simple deployment frameworks. Our customers can rest assured that they get an individually optimized OpenStack environment, which is not only de- signed by us, but also installed and maintained throughout. It has always been our job to hide any complexity underneath a layer of “ease of use, ease of management”.
  • 6. 6 Who do you want to address with this topic? Everyone who is interested in a highly flexible, but also effective utilization of their existing hardware capacity. We address customers from academia as well as SMB customers who, like big enterprises, have to turn towards new concepts and new technologies. As said before: in our opinion, this way of resource provisioning will become mainstream soon. Let's turn to the other big topic: What is the connection between HPC and Big Data Analytics, and what does the market have to say about this? As already mentioned, the HPC market is reaching out towards big data analytics. "HPC & Big Data" are named in one breath at key events like ISC or Supercomputing. This is not without reason, of course: as different as the technologies in use or the goals to be reached may be, there is one common denominator, and that is a high demand for computational capacity. An HPC solution provider like transtec, which is capable of designing and deploying a one-petaflop HPC cluster, can naturally also provision, say, a ten-petabyte Hadoop cluster for big data analytics for a customer. But the target group may be a different one: classical HPC mostly takes place within the value chain of the customer, and the users are developers. Important areas for big data analytics, however, can be found in business controlling, i.e. BI/BA, but also in companies' operations departments, where operational decisions have to be made in real time, depending on certain vital state data. This is what the "Internet of Things" is all about.
  • 7. 7 Where are the technical differences between HPC and Big Data Analytics, and what do both have in common? Both areas share the “scale-out” approach. It is not the single server that gets more and more powerful, but rather accelera- tion is reached via parallelization and massive load balancing on several servers. In HPC, parallel computational jobs that run using the MPI programming interface are state of the art. Simulations run on many compute nodes at once, thus cutting down wall-clock time. Within the context of big data analytics, distributed and above all in-memory databases are becoming more and more important. Database requests can thus be handled by several servers, and with a low latency. How do companies handle the backup and the archiving of these huge amounts of data? Given this enormous amount of data, standard backup strat- egies or archiving concepts do indeed not apply anymore. In- stead, for big data analytics, data are stored in a redundant way in principle, to become independent from hardware fail- ures. Storage concepts like Hadoop or object storage like Ceph offer redundancy in data storage as well. Moreover, when it comes to big data analytics, the basic prem- ise is that it really does not matter much when a tiny fraction out of these huge amounts of data go missing for some rea- son. The goal of big data analytics is not to get a precise ana- lytical result out of these often unstructured data, but rather the quickest possible extraction of relevant information and a sensible interpretation, whereby statistical methods are of great importance. For which companies is it important to do this very quick extrac- tion of information? Holiday booking portals that use booking systems have to get an immediate overview of the hotel rooms that are available in some random town, sorted by price, in order to provide cus- tomers with this information. Energy or finance companies who do exchange tradings of their assets, have to know their current portfolio’s value in the fraction of a second to do the right pricing. Search engine providers – there are more than just Google – investigative authorities, credit card clearance houses who have to react quickly to attempted frauds: the list of potential companies who could make use of big data analytics grows alongside with the technical progress.
  • 8. Technology Compass – Table of Contents and Introduction
High Performance Computing (10): Performance Turns Into Productivity (12), Flexible Deployment With xCAT (14), Service and Customer Care From A to Z (16)
OpenStack for HPC Cloud Environments (20): What is OpenStack? (22), Object Storage with Ceph (24), Application Containers with Docker (30)
Advanced Cluster Management Made Easy (38): Easy-to-use, Complete and Scalable (40), Cloud Bursting With Bright (48)
Intelligent HPC Workload Management (50): Moab HPC Suite (52), IBM Platform LSF (56), Slurm (58), IBM Platform LSF 8 (61)
Intel Cluster Ready (64): A Quality Standard for HPC Clusters (66), Intel Cluster Ready Builds HPC Momentum (70), The transtec Benchmarking Center (74)
Remote Visualization and Workflow Optimization (76): NICE EnginFrame: A Technical Computing Portal (78), Remote Visualization (82), Desktop Cloud Visualization (84), Cloud Computing With transtec and NICE (86), NVIDIA Grid Boards (88), Virtual GPU Technology (90)
  • 9. Big Data Analytics with Hadoop (92): Apache Hadoop (94), HDFS Architecture (98)
NVIDIA GPU Computing – The CUDA Architecture (108): What is GPU Computing? (110), Kepler GK110/210 GPU Computing Architecture (112), A Quick Refresher on CUDA (122)
Intel Xeon Phi Coprocessor (124): The Architecture (126), An Outlook on Knights Landing and Omni Path (128)
Vector Supercomputing (138): The NEC SX Architecture (140), History of the SX Series (146)
Parallel File Systems – The New Standard for HPC Storage (154): Parallel NFS (156), What's New in NFS 4.1? (158), Panasas HPC Storage (160), BeeGFS (172), General Parallel File System (GPFS) (176), What's new in GPFS Version 3.5 (190), Lustre (192)
Intel Omni-Path Interconnect (196): The Architecture (198), Host Fabric Interface (HFI) (200), Architecture Switch Fabric Optimizations (202), Fabric Software Components (204)
Numascale (206): NumaConnect Background (208), Technology (210), Redefining Scalable OpenMP and MPI (216), Big Data (220), Cache Coherence (222), NUMA and ccNUMA (224)
Allinea Forge for parallel Software Development (226): Application Performance and Development Tools (228), What's in Allinea Forge? (230)
Glossary (242)
  • 10.
  • 11. 11 High Performance Computing – Performance Turns Into Productivity High Performance Computing (HPC) has been with us from the very beginning of the computer era. High-performance computers were built to solve numerous problems which the “human computers” could not handle. The term HPC just hadn’t been coined yet. More important, some of the early principles have changed fundamentally. Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
  • 12. 12 Performance Turns Into Productivity HPC systems in the early days were much different from those we see today. First, we saw enormous mainframes from large computer manufacturers, including a proprietary operating system and job management system. Second, at universities and research institutes, workstations made inroads, and scientists carried out calculations on their dedicated Unix or VMS workstations. In either case, if you needed more computing power, you scaled up, i.e. you bought a bigger machine. Today the term High Performance Computing has gained a fundamentally new meaning: HPC is now perceived as a way to tackle complex mathematical, scientific or engineering problems. The integration of industry-standard, "off-the-shelf" server hardware into HPC clusters facilitates the construction of computer networks of such power that one single system could never achieve. The new paradigm for parallelization is scaling out. Computer-supported simulation of realistic processes (so-called Computer Aided Engineering – CAE) has established itself as a third key pillar in the field of science and research, alongside theory and experimentation. It is nowadays inconceivable that an aircraft manufacturer or a Formula One racing team would operate without using simulation software. And scientific calculations, such as in the fields of astrophysics, medicine, pharmaceuticals and bio-informatics, will to a large extent depend on supercomputers in the future. Software manufacturers long ago recognized the benefit of high-performance computers based on powerful standard servers and ported their programs to them accordingly. The main advantage of scale-out supercomputers is just that: they are infinitely scalable, at least in principle. Since they are based on standard hardware components, such a supercomputer can be given more power whenever the computational capacity of the system is no longer sufficient, simply by adding additional nodes of the same kind. A cumbersome switch to a different technology can be avoided in most cases. "transtec HPC solutions are meant to provide customers with unparalleled ease-of-management and ease-of-use. Apart from that, deciding for a transtec HPC solution means deciding for the most intensive customer care and the best service imaginable." – Dr. Oliver Tennert, Director Technology Management & HPC Solutions
  • 13. S EW SE SW tech BOX 13 Variations on the theme: MPP and SMP Parallel computations exist in two major variants today. Ap- plications running in parallel on multiple compute nodes are frequently so-called Massively Parallel Processing (MPP) appli- cations. MPP indicates that the individual processes can each utilize exclusive memory areas. This means that such jobs are predestined to be computed in parallel, distributed across the nodes in a cluster. The individual processes can thus utilize the separate units of the respective node – especially the RAM, the CPU power and the disk I/O. Communication between the individual processes is implement- ed in a standardized way through the MPI software interface (Message Passing Interface), which abstracts the underlying network connections between the nodes from the processes. However, the MPI standard (current version 2.0) merely requires source code compatibility, not binary compatibility, so an off-the- shelf application usually needs specific versions of MPI libraries in order to run. Examples of MPI implementations are OpenMPI, MPICH2, MVAPICH2, Intel MPI or – for Windows clusters – MS-MPI. If the individual processes engage in a large amount of commu- nication, the response time of the network (latency) becomes im- portant. Latency in a Gigabit Ethernet or a 10GE network is typi- cally around 10 µs. High-speed interconnects such as InfiniBand, reduce latency by a factor of 10 down to as low as 1 µs. Therefore, high-speed interconnects can greatly speed up total processing. The other frequently used variant is called SMP applications. SMP, in this HPC context, stands for Shared Memory Processing. It involves the use of shared memory areas, the specific imple- mentation of which is dependent on the choice of the underlying operating system. Consequently, SMP jobs generally only run on a single node, where they can in turn be multi-threaded and thus be parallelized across the number of CPUs per node. For many HPC applications, both the MPP and SMP variant can be chosen. Many applications are not inherently suitable for parallel execu- tion. In such a case, there is no communication between the in- dividual compute nodes, and therefore no need for a high-speed network between them; nevertheless, multiple computing jobs can be run simultaneously and sequentially on each individual node, depending on the number of CPUs. In order to ensure optimum computing performance for these applications, it must be examined how many CPUs and cores de- liver the optimum performance. We find applications of this sequential type of work typically in the fields of data analysis or Monte-Carlo simulations.
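The following is a small illustration of the MPP model described in this box – not part of the original brochure, but a minimal sketch using the mpi4py bindings, assuming an MPI implementation such as OpenMPI and the mpi4py package are available on the cluster. Each rank works on a private slice of a numerical integration, and a single collective reduction combines the results; an SMP-style variant would instead use threads sharing memory within one node.

# Minimal MPP-style sketch with mpi4py (assumes an MPI implementation such as
# OpenMPI plus the mpi4py package; launch with e.g. "mpirun -np 64 python pi_mpi.py").
from mpi4py import MPI

def partial_pi(rank, size, n=10_000_000):
    """Each rank integrates its own slice of 4/(1+x^2) over [0, 1]."""
    h = 1.0 / n
    s = 0.0
    for i in range(rank, n, size):           # round-robin distribution of intervals
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x * x)
    return s * h

comm = MPI.COMM_WORLD                         # all processes started by mpirun
rank = comm.Get_rank()                        # this process' ID
size = comm.Get_size()                        # total number of processes

local = partial_pi(rank, size)                # independent work in private memory (MPP)
pi = comm.reduce(local, op=MPI.SUM, root=0)   # one collective communication step

if rank == 0:
    print(f"pi ~= {pi:.10f} computed on {size} ranks")

On a high-speed interconnect such as InfiniBand, the cost of the final reduction is negligible; for communication-heavy codes, the latency figures quoted above dominate scaling behaviour.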
  • 14. BOX 14 xCAT as a Powerful and Flexible Deployment Tool xCAT (Extreme Cluster Administration Tool) is an open source toolkit for the deployment and low-level administration of HPC cluster environments, small as well as large ones. xCAT provides simple commands for hardware control, node discovery, the collection of MAC addresses, and the node deploy- ment with (diskful) or without local (diskless) installation. The cluster configuration is stored in a relational database. Node groups for different operating system images can be defined. Also, user-specific scripts can be executed automatically at in- stallation time. xCAT Provides the Following Low-Level Administrative Features Remote console support Parallel remote shell and remote copy commands Plugins for various monitoring tools like Ganglia or Nagios Hardware control commands for node discovery, collecting MAC addresses, remote power switching and resetting of nodes Automatic configuration of syslog, remote shell, DNS, DHCP, and ntp within the cluster Extensive documentation and man pages For cluster monitoring, we install and configure the open source tool Ganglia or the even more powerful open source solution Na- gios, according to the customer’s preferences and requirements. High Performance Computing Flexible Deployment With xCAT
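As a hedged illustration of how these low-level commands are typically driven from scripts, the following sketch wraps a few of the xCAT tools mentioned above (nodels, rpower, xdsh) with Python's subprocess module. The node group name "compute" and the output parsing are assumptions made for the example; the real output format depends on the xCAT version and the hardware control method in use.

# Hypothetical convenience wrapper around a few xCAT CLI tools (nodels, rpower, xdsh).
# Assumes xCAT is installed on the management node and the caller has admin rights.
import subprocess

def xcat(*args):
    """Run an xCAT command and return its stdout as a list of lines."""
    out = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return out.stdout.splitlines()

def power_state(noderange="compute"):
    # rpower typically reports one "node: state" pair per line
    return dict(line.split(": ", 1) for line in xcat("rpower", noderange, "stat"))

def run_everywhere(command, noderange="compute"):
    # xdsh is xCAT's parallel remote shell
    return xcat("xdsh", noderange, command)

if __name__ == "__main__":
    print(xcat("nodels"))                        # all defined nodes
    print(power_state("compute"))                # power status per node
    print(run_everywhere("uptime", "compute"))   # load on every compute node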
  • 15. 15 Local Installation or Diskless Installation We offer a diskful or a diskless installation of the cluster nodes. A diskless installation means the operating system is hosted par- tially within the main memory, larger parts may or may not be included via NFS or other means. This approach allows for de- ploying large amounts of nodes very efficiently, and the cluster is up and running within a very small timescale. Also, updating the cluster can be done in a very efficient way. For this, only the boot image has to be updated, and the nodes have to be reboot- ed. After this, the nodes run either a new kernel or even a new operating system. Moreover, with this approach, partitioning the cluster can also be very efficiently done, either for testing pur- poses, or for allocating different cluster partitions for different users or applications. Development Tools, Middleware, and Applications According to the application, optimization strategy, or underly- ing architecture, different compilers lead to code results of very different performance. Moreover, different, mainly commercial, applications, require different MPI implementations. And even when the code is self-developed, developers often prefer one MPI implementation over another. According to the customer’s wishes, we install various compilers, MPI middleware, as well as job management systems like Para- station, Grid Engine, Torque/Maui, or the very powerful Moab HPC Suite for the high-level cluster management.
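To make the job-management side concrete, here is an illustrative sketch (not from the brochure) that generates a Torque/Maui batch script and submits it with qsub. The resource request, walltime, and application path are made-up examples; an equivalent script for Grid Engine, Slurm, or a Moab-managed cluster would differ only in the directives.

# Illustrative only: generate and submit a Torque/Maui batch job for an MPI application.
import subprocess
import textwrap

def submit_mpi_job(name, nodes, ppn, walltime, binary):
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -N {name}
        #PBS -l nodes={nodes}:ppn={ppn}
        #PBS -l walltime={walltime}
        #PBS -j oe
        cd $PBS_O_WORKDIR
        mpirun -np {nodes * ppn} {binary}
        """)
    # qsub reads the job script from stdin and prints the assigned job ID
    result = subprocess.run(["qsub"], input=script, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    # hypothetical job: 4 nodes with 16 cores each, 2 hours, example launcher script
    job_id = submit_mpi_job("cfd_run", nodes=4, ppn=16,
                            walltime="02:00:00", binary="./run_case.sh")
    print("submitted as", job_id)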
  • 16. 16 transtec HPC as a Service: You will get a range of applications like LS-Dyna, ANSYS, Gromacs, NAMD etc. from all kinds of areas pre-installed, integrated into an enterprise-ready cloud and workload management system, and ready to run. Do you miss your application? Ask us: HPC@transtec.de. transtec Platform as a Service: You will be provided with dynamically provisioned compute nodes for running your individual code. The operating system will be pre-installed according to your requirements; common Linux distributions like RedHat, CentOS, or SLES are the standard. Do you need another distribution? Ask us: HPC@transtec.de. transtec Hosting as a Service: You will be provided with hosting space inside a professionally managed and secured datacenter where you can have your machines hosted, managed, and maintained according to your requirements. Thus, you can build up your own private cloud. What range of hosting and maintenance services do you need? Tell us: HPC@transtec.de. HPC @ transtec: Services and Customer Care from A to Z [Diagram: the transtec HPC service lifecycle – individual presales consulting; application-, customer-, and site-specific sizing of the HPC solution; benchmarking of different systems; burn-in tests of systems; software/OS installation; application installation; onsite hardware assembly; integration into the customer's environment; maintenance, support, managed services; customer training; continual improvement] transtec AG has over 30 years of experience in scientific computing and is one of the earliest manufacturers of HPC clusters. For nearly a decade, transtec has delivered highly customized High Performance clusters based on standard components to academic and industry customers across Europe, with all the high quality standards and the customer-centric approach that transtec is well known for. Every transtec HPC solution is more than just a rack full of hardware – it is a comprehensive solution with everything the HPC user, owner, and operator need. In the early stages of any customer's HPC project, transtec experts provide extensive and detailed consulting to the customer – they benefit from expertise and experience. Consulting is followed by benchmarking of different systems with either specifically crafted customer code or generally accepted benchmarking routines; this aids customers in sizing and devising the optimal and detailed HPC configuration. Each and every piece of HPC hardware that leaves our factory undergoes a burn-in procedure of 24 hours or
  • 17. 17 more if necessary. We make sure that any hardware shipped meets ourandourcustomers’qualityrequirements.transtecHPCsolutions are turnkey solutions. By default, a transtec HPC cluster has every- thing installed and configured – from hardware and operating sys- temtoimportantmiddlewarecomponentslikeclustermanagement or developer tools and the customer’s production applications. Onsite delivery means onsite integration into the customer’s pro- ductionenvironment,beitestablishingnetworkconnectivitytothe corporate network, or setting up software and configuration parts. transtec HPC clusters are ready-to-run systems – we deliver, you turn the key, the system delivers high performance. Every HPC proj- ect entails transfer to production: IT operation processes and poli- ciesapplytothenewHPCsystem.Effectively,ITpersonnelistrained hands-on, introduced to hardware components and software, with all operational aspects of configuration management. transtec services do not stop when the implementation projects ends. Beyond transfer to production, transtec takes care. transtec offers a variety of support and service options, tailored to the cus- tomer’s needs. When you are in need of a new installation, a major reconfiguration or an update of your solution – transtec is able to support your staff and, if you lack the resources for maintaining the cluster yourself, maintain the HPC solution for you. From Professional Services to Managed Services for daily opera- tionsandrequiredservicelevels,transtecwillbeyourcompleteHPC service and solution provider. transtec’s high standards of perfor- mance, reliability and dependability assure your productivity and completesatisfaction.transtec’sofferingsofHPCManagedServices offercustomersthepossibilityofhavingthecompletemanagement and administration of the HPC cluster managed by transtec service specialists,inanITILcompliantway.Moreover,transtec’sHPConDe- mand services help provide access to HPC resources whenever they need them, for example, because they do not have the possibility of owning and running an HPC cluster themselves, due to lacking infrastructure, know-how, or admin staff. transtec HPC Cloud Services Last but not least transtec’s services portfolio evolves as custom- ers‘ demands change. Starting this year, transtec is able to provide HPCCloudServices.transtecusesadedicateddatacentertoprovide computing power to customers who are in need of more capacity than they own, which is why this workflow model is sometimes called computing-on-demand. With these dynamically provided resources, customers with the possibility to have their jobs run on HPC nodes in a dedicated datacenter, professionally managed and secured, and individually customizable. Numerous standard ap- plications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud and workload management environ- ment, and ready to run. Alternatively, whenever customers are in need of space for host- ing their own HPC equipment because they do not have the space capacity or cooling and power infrastructure themselves, transtec is also able to provide Hosting Services to those customers who’d liketohavetheirequipmentprofessionallyhosted,maintained,and managed. Customers can thus build up their own private cloud! Are you interested in any of transtec’s broad range of HPC related services? Write us an email to HPC@transtec.de. We’ll be happy to hear from you!
  • 18. 18 High Performance Computing Services and Customer Care from A to Z Are you an end user? Then doing your work as productive as possible is of utmost im- portance to you! You are interested in an IT environment running smoothly, with a fully functional infrastructure, in competent and professional support by internal administrators and external ser- vice providers, and a service desk with guaranteed availability at working times. HPC cloud services ensure you are able to run your simulation jobs even if computational requirements exceed the lo- cal capacities. support services professional services managed services cloud services Are you an administrator? Then you are responsible for providing a well-performing IT infra- structure, for supporting the end users when problems occur. In situations where problem management fully exploits your capacity, or in cases where you just don’t have time or maybe lack experience in performing certain tasks, you might be glad to be assisted by spe- cialized IT experts. support services professional services Are you an R D manager? Then your job is to ensure that developers, researchers, and other endusersareabletodotheirjob.Tohaveapartnerathandthatpro- vides you with a full range of operations support, either in a com- pletely flexible, on-demand base, or in a fully-managed way based on ITIL best practices, is heartily welcome for you. professional services managed services
  • 19. 19 Services Continuum Support Services Paid to Support Installation Integration Migration Upgrades Configuration Changes Professional Services Paid to Do Service Desk Event Monitoring and Notification Release Management Capacity Management Managed Services Paid to Operate Hosting Infrastructure (IaaS) HPC (HPCaaS) Platform (PaaS) Cloud Services Paid to Use Performance Analysis Technology Assessment Infrastructure Design PLAN BUILD RUN Consulting Services Paid to Advise Are you a CIO? Then your main concern is a smoothly-running IT operations envi- ronment with a maximum of efficiency and effectivity. Managed Services to ensure business continuity are a main part of your con- siderations regarding business objectives. HPC cloud services are a vital part of your capacity plannings to ensure the availability of computational resources to your internal clients, the R D depart- ments. managed services cloud services Are you a CEO? Profitability,time-to-result,businesscontinuity,andensuringfuture viability are topics that you continuously have to think about. You are happy to have a partner at hand, who gives you insight into fu- turedevelopments,providesyouwithtechnologyconsulting,andis able to help you ensure business continuity and fall-back scenarios, as well as scalability readiness, by means of IT capacity outsourcing solutioons. cloud services consulting services
  • 20.
  • 21. Life Sciences | CAE | High Performance Computing | Big Data Analytics | Simulation | CAD OpenStack for HPC Cloud Environments OpenStack is a set of software tools for building and managing cloud computing platforms for public and private clouds. Backed by some of the biggest companies in software development and hosting, as well as thousands of individual community members, many think that OpenStack is the future of cloud computing. OpenStack is managed by the OpenStack Foundation, a non-profit organization which oversees both development and community-building around the project.
  • 22. 22 What is OpenStack? Introduction to OpenStack OpenStack lets users deploy virtual machines and other instances which handle different tasks for managing a cloud environment on the fly. It makes horizontal scaling easy, which means that tasks which benefit from running concurrently can easily serve more or fewer users on the fly by just spinning up more instances. For example, a mobile application which needs to communicate with a remote server might be able to divide the work of communicating with each user across many different instances, all communicating with one another but scaling quickly and easily as the application gains more users. And most importantly, OpenStack is open source software, which means that anyone who chooses to can access the source code, make any changes or modifications they need, and freely share these changes back out to the community at large. It also means that OpenStack has the benefit of thousands of developers all over the world working in tandem to develop the strongest, most robust, and most secure product that they can. How is OpenStack used in a cloud environment? The cloud is all about providing computing for end users in a remote environment, where the actual software runs as a service on reliable and scalable servers rather than on each end user's computer. Cloud computing can refer to a lot of different things, but typically the industry talks about running different items "as a service" – software, platforms, and infrastructure. OpenStack falls into the latter category and is considered Infrastructure as a Service (IaaS). Providing infrastructure means that OpenStack makes it easy for users to quickly add new instances, upon which other cloud components can run. Typically, the infrastructure then runs a "platform" upon which a developer can create software applications which are delivered to the end users.
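To make the IaaS idea tangible, the following minimal sketch uses the official openstacksdk Python client to boot an instance on demand and delete it again when the work is done. The cloud name "hpc-cloud" and the image, flavor, and network names are placeholders for whatever is defined in the site's clouds.yaml and service catalogs.

# Minimal sketch of the IaaS idea with the openstacksdk client.
import openstack

conn = openstack.connect(cloud="hpc-cloud")          # credentials taken from clouds.yaml

image = conn.compute.find_image("centos-7-hpc")      # placeholder image name
flavor = conn.compute.find_flavor("m1.xlarge")       # placeholder flavor name
network = conn.network.find_network("cluster-net")   # placeholder network name

# Spin up an instance only when it is actually needed ...
server = conn.compute.create_server(
    name="compute-on-demand-001",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print("booted", server.name, server.status)

# ... and give the capacity back as soon as the workload is finished.
conn.compute.delete_server(server.id)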
  • 23. 23 What are the components of OpenStack? OpenStack is made up of many different moving parts. Because of its open nature, anyone can add additional components to OpenStack to help it to meet their needs. But the OpenStack community has collaboratively identified nine key components that are a part of the “core” of OpenStack, which are distributed as a part of any OpenStack system and officially maintained by the OpenStack community. Nova is the primary computing engine behind OpenStack. It is used for deploying and managing large numbers of virtual machines and other instances to handle computing tasks. Swift is a storage system for objects and files. Rather than the traditional idea of a referring to files by their location on a disk drive, developers can instead refer to a unique identi- fier referring to the file or piece of information and let Open- Stack decide where to store this information. This makes scaling easy, as developers don’t have the worry about the capacity on a single system behind the software. It also allows the system, rather than the developer, to worry about how best to make sure that data is backed up in case of the failure of a machine or network connection. Cinder is a block storage component, which is more analo- gous to the traditional notion of a computer being able to access specific locations on a disk drive. This more tradi- tional way of accessing files might be important in scenarios in which data access speed is the most important consider- ation. Neutron provides the networking capability for OpenStack. It helps to ensure that each of the components of an Open- Stack deployment can communicate with one another quickly and efficiently. Horizon is the dashboard behind OpenStack. It is the only graphical interface to OpenStack, so for users wanting to give OpenStack a try, this may be the first component they actually “see.” Developers can access all of the components of OpenStack individually through an application program- ming interface (API), but the dashboard provides system administrators a look at what is going on in the cloud, and to manage it as needed. Keystone provides identity services for OpenStack. It is essentially a central list of all of the users of the OpenStack cloud, mapped against all of the services provided by the cloudwhichtheyhavepermissiontouse.Itprovidesmultiple means of access, meaning developers can easily map their existing user access methods against Keystone. Glance provides image services to OpenStack. In this case, “images” refers to images (or virtual copies) of hard disks. Glance allows these images to be used as templates when deploying new virtual machine instances. Ceilometer provides telemetry services, which allow the cloud to provide billing services to individual users of the cloud. It also keeps a verifiable count of each user’s system usage of each of the various components of an OpenStack cloud. Think metering and usage reporting. Heat is the orchestration component of OpenStack, which allows developers to store the requirements of a cloud appli- cation in a file that defines what resources are necessary for that application. In this way, it helps to manage the infra- structure needed for a cloud service to run.
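The nine services above are all reachable through one authenticated connection. As a hedged sketch (again with a placeholder cloud entry in clouds.yaml), the snippet below lists resources from Nova, Glance, Neutron, and Cinder, with Keystone handling authentication behind the scenes; depending on the SDK release, the block-storage proxy may be exposed as conn.volume instead of conn.block_storage.

# A quick tour of several core services through one openstacksdk connection.
import openstack

conn = openstack.connect(cloud="hpc-cloud")       # Keystone auth happens here

print("Nova (compute) instances:")
for server in conn.compute.servers():
    print(" ", server.name, server.status)

print("Glance (image) catalog:")
for image in conn.image.images():
    print(" ", image.name)

print("Neutron (network) networks:")
for net in conn.network.networks():
    print(" ", net.name)

print("Cinder (block storage) volumes:")        # older SDKs: conn.volume.volumes()
for vol in conn.block_storage.volumes():
    print(" ", vol.name, vol.size, "GB")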
  • 24. 24 OpenStack for HPC Cloud Environments Object Storage with Ceph Object Storage with Ceph Why “Ceph”? “Ceph” is an odd name for a file system and breaks the typical acronym trend that most follow. The name is a reference to the mascot at UCSC (Ceph’s origin), which happens to be “Sammy,” the banana slug, a shell-less mollusk in the cephalopods class. Cephalopods, with their multiple tentacles, provide a great meta- phor for a distributed file system. Ceph goals Developing a distributed file system is a complex endeavor, but it’s immensely valuable if the right problems are solved. Ceph’s goals can be simply defined as: Easy scalability to multi-petabyte capacity High performance over varying workloads (input/output operations per second [IOPS] and bandwidth) Strong reliability Unfortunately, these goals can compete with one another (for example, scalability can reduce or inhibit performance or impact reliability). Ceph has developed some very interesting concepts (such as dynamic metadata partitioning and data distribution and replication), which this article explores shortly. Ceph’s design also incorporates fault-tolerance features to protect against single points of failure, with the assumption that storage failures on a large scale (petabytes of storage) will be the norm rather than the exception. Finally, its design does not assume particular workloads but includes the ability to adapt to changing distrib- uted workloads to provide the best performance. It does all of this with the goal of POSIX compatibility, allowing it to be trans- parently deployed for existing applications that rely on POSIX semantics (through Ceph-proposed enhancements). Finally, Ceph is open source distributed storage and part of the mainline Linux kernel (2.6.34).
  • 25. 25 [Diagram: OpenStack Overview – users and admins reach the Infrastructure-as-a-Service layer through the REST API, the python-client CLI, and the Horizon UI; core and advanced services include Compute (Nova, Ironic), Network (Neutron), Image Management (Glance), Storage (Swift, Cinder, Manila), Data Processing (Sahara), Key Management (Barbican), Database Management (Trove), Message Queue (Zaqar), DNS Management (Designate), Deployment (TripleO), Telemetry (Ceilometer), Orchestration (Heat), Identity (Keystone), and the common Oslo library, all on top of shared message queues and databases.]
  • 26. 26 Ceph architecture Now, let's explore the Ceph architecture and its core elements at a high level. I then dig down another level to identify some of the key aspects of Ceph to provide a more detailed exploration. The Ceph ecosystem can be broadly divided into four segments (see Figure 1): clients (users of the data), metadata servers (which cache and synchronize the distributed metadata), an object storage cluster (which stores both data and metadata as objects and implements other key responsibilities), and finally the cluster monitors (which implement the monitoring functions). As Figure 1 shows, clients perform metadata operations (to identify the location of data) using the metadata servers. The metadata servers manage the location of data and also where to store new data. Note that metadata is stored in the storage cluster (as indicated by "Metadata I/O"). [Figure 1: Conceptual architecture of the Ceph ecosystem – clients issue metadata ops to the metadata server cluster and file (object) I/O directly to the object storage cluster, which also handles the metadata I/O; cluster monitors oversee the whole ecosystem.]
  • 27. 27 Actual file I/O occurs between the client and object storage cluster. In this way, higher-level POSIX functions (such as open, close, and rename) are managed through the metadata servers, whereas POSIX functions (such as read and write) are managed directly through the object storage cluster. Another perspective of the architecture is provided in Figure 2. A set of servers access the Ceph ecosystem through a client interface, which understands the relationship between metadata servers and object-level storage. The distributed storage system can be viewed in a few layers, including a format for the storage devices (the Extent and B-tree-based Object File System [EBOFS] or an alternative) and an overriding management layer designed to manage data replication, failure detection, and recovery and subsequent data migration called Reliable Autonomic Distributed Object Storage (RADOS). Finally, monitors are used to identify component failures, including subsequent notification. [Figure 2: Simplified layered view of the Ceph ecosystem – servers use the Ceph client interface to reach the metadata servers and object storage daemons, which sit on top of BTRFS/EBOFS-formatted storage.] Ceph components With the conceptual architecture of Ceph under your belts, you can dig down another level to see the major components implemented within the Ceph ecosystem. One of the key differences between Ceph and traditional file systems is that rather than focusing the intelligence in the file system itself, the intelligence is distributed around the ecosystem. Figure 3 shows a simple Ceph ecosystem. The Ceph Client is the user of the Ceph file system. The Ceph Metadata Daemon provides the metadata services, while the Ceph Object Storage Daemon provides the actual storage (for both data and metadata). Finally, the Ceph Monitor provides cluster management. Note that there can be many Ceph clients, many object storage endpoints, numerous metadata servers (depending on the capacity of the file system), and at least a redundant pair of monitors. So, how is this file system distributed? [Figure 3: Simple Ceph ecosystem – applications and users access a Ceph client (enhanced POSIX through the VFS in the Linux kernel, with a DRAM cache), which talks to the metadata daemon (cmds), the object storage daemons (cosd, on BTRFS/EBOFS-backed disks), and the cluster monitor (cmon).]
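As a hedged, minimal sketch of how an application talks to these daemons, the snippet below uses Ceph's Python rados bindings: the connection goes through the monitors, the cluster statistics reflect the monitors' view of the object storage daemons, and the object write/read goes straight to the OSDs. The pool name "hpc-data" and the configuration path are assumptions for the example.

# Minimal librados sketch (Python "rados" bindings shipped with Ceph).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")    # usual client config path
cluster.connect()                                        # contacts the monitors (cmon)

stats = cluster.get_cluster_stats()                      # dict: kb, kb_used, kb_avail, num_objects
print("raw capacity: %.1f TiB, used: %.1f TiB, objects: %d"
      % (stats["kb"] / 2.0**30, stats["kb_used"] / 2.0**30, stats["num_objects"]))

ioctx = cluster.open_ioctx("hpc-data")                   # I/O context for one pool (assumed name)
try:
    ioctx.write_full("result-0001", b"energy=-1.2345e+03\n")   # stored by the cosd daemons
    print(ioctx.read("result-0001"))                           # read back directly from the OSDs
finally:
    ioctx.close()
    cluster.shutdown()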
  • 28. 28 Kernel or user space Early versions of Ceph utilized Filesystems in User SpacE (FUSE), which pushes the file system into user space and can greatly simplify its development. But today, Ceph has been integrated into the mainline kernel, making it faster, because user space context switches are no longer necessary for file system I/O. Ceph client As Linux presents a common interface to the file systems (through the virtual file system switch [VFS]), the user’s perspec- tive of Ceph is transparent. The administrator’s perspective will certainly differ, given the potential for many servers encom- passing the storage system (see the Resources section for infor- mation on creating a Ceph cluster). From the users’ point of view, they have access to a large storage system and are not aware of the underlying metadata servers, monitors, and individual object storage devices that aggregate into a massive storage pool. Users simply see a mount point, from which standard file I/O can be performed. The Ceph file system–or at least the client interface–is imple- mented in the Linux kernel. Note that in the vast majority of file systems, all of the control and intelligence is implemented within the kernel’s file system source itself. But with Ceph, the file system’s intelligence is distributed across the nodes, which simplifies the client interface but also provides Ceph with the ability to massively scale (even dynamically). Rather than rely on allocation lists (metadata to map blocks on a disk to a given file), Ceph uses an interesting alternative. A file from the Linux perspective is assigned an inode number (INO) from the metadata server, which is a unique identifier for the file. The file is then carved into some number of objects (based on the size of the file). Using the INO and the object number (ONO), each object is assigned an object ID (OID). Using a simple hash over the OID, each object is assigned to a placement group. OpenStack for HPC Cloud Environments Object Storage with Ceph
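The following toy sketch (not Ceph code) simply restates the mapping described above in executable form: a file identified by its INO is carved into objects, each object gets an OID from the INO and ONO, and a hash over the OID selects the placement group. The object size, hash function, and placement-group count are arbitrary stand-ins rather than Ceph's actual parameters.

# Toy illustration: file -> striped objects (INO + ONO -> OID) -> placement group.
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024        # assume 4 MiB stripes for the example
PG_COUNT = 128                       # assumed number of placement groups in the pool

def objects_for_file(ino, file_size):
    """Carve a file (identified by its INO) into object IDs."""
    n_objects = max(1, -(-file_size // OBJECT_SIZE))            # ceiling division
    return [f"{ino:x}.{ono:08x}" for ono in range(n_objects)]   # OID = INO.ONO

def placement_group(oid):
    """A stable hash of the OID picks the placement group (PGID)."""
    digest = hashlib.sha1(oid.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_COUNT

for oid in objects_for_file(ino=0x10000003456, file_size=10 * 1024 * 1024):
    print(oid, "-> pg", placement_group(oid))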
The placement group (identified by a PGID) is a conceptual container for objects. The mapping of a placement group to object storage devices is a pseudo-random mapping produced by an algorithm called Controlled Replication Under Scalable Hashing (CRUSH). In this way, the mapping of placement groups (and their replicas) to storage devices does not rely on any metadata, but only on a pseudo-random mapping function. This behavior is ideal, because it minimizes storage overhead and simplifies the distribution and lookup of data. The final component needed for allocation is the cluster map, an efficient representation of the devices that make up the storage cluster. With a PGID and the cluster map, you can locate any object. (A simplified sketch of this pseudo-random placement follows at the end of this section.)

The Ceph metadata server
The job of the metadata server (cmds) is to manage the file system's namespace. Although both metadata and data are stored in the object storage cluster, they are managed separately to support scalability. In fact, metadata is further split among a cluster of metadata servers that can adaptively replicate and distribute the namespace to avoid hot spots. As shown in Figure 4, the metadata servers manage portions of the namespace and can overlap (for redundancy and also for performance). The mapping of metadata servers to the namespace is performed in Ceph using dynamic subtree partitioning, which allows Ceph to adapt to changing workloads (migrating namespaces between metadata servers) while preserving locality for performance.

Figure 4: Partitioning of the Ceph namespace across metadata servers (MDS0 through MDS4)

Because each metadata server simply manages the namespace for its population of clients, its primary role is that of an intelligent metadata cache (the actual metadata is eventually stored within the object storage cluster). Metadata to be written is cached in a short-term journal, which is eventually pushed to physical storage. This allows the metadata server to serve recent metadata back to clients (which is common in metadata operations). The journal is also useful for failure recovery: if a metadata server fails, its journal can be replayed to ensure that the metadata is safely stored on disk.

Metadata servers manage the inode space, converting file names into metadata. A metadata server transforms a file name into an inode, a file size, and the striping data (layout) that the Ceph client uses for file I/O.

Ceph monitors
Ceph includes monitors that manage the cluster map, but some elements of fault management are implemented in the object store itself. When object storage devices fail or new devices are added, the monitors detect this and maintain a valid cluster map. This function is performed in a distributed fashion in which map updates are communicated along with existing traffic. Ceph uses Paxos, a family of algorithms for distributed consensus.

Ceph object storage
Similar to traditional object storage, Ceph storage nodes include not only storage but also intelligence. Traditional drives are simple targets that only respond to commands from initiators. Object storage devices, in contrast, are intelligent devices that act as both targets and initiators, which allows them to communicate and collaborate with other object storage devices.
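Before looking at the storage devices in more detail, here is the promised sketch of pseudo-random placement. It is not the real CRUSH algorithm – CRUSH additionally honors device weights and failure domains encoded in the cluster map – but it shows the key property: any client holding the cluster map can compute, without asking a central service, which devices hold a given placement group.

```python
import hashlib

def pg_to_osds(pgid: int, cluster_map: list, replicas: int = 3) -> list:
    """Simplified stand-in for CRUSH: deterministically pick `replicas`
    object storage devices for a placement group using highest-random-weight
    (rendezvous) hashing over the cluster map."""
    def score(osd: str) -> int:
        h = hashlib.sha1(f"{pgid}:{osd}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return sorted(cluster_map, key=score, reverse=True)[:replicas]

cluster_map = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
print(pg_to_osds(pgid=42, cluster_map=cluster_map))   # e.g. ['osd.3', 'osd.0', 'osd.4']
```

Adding or removing a device changes only the placement groups whose highest-scoring devices change, which is why data migration after a cluster change stays limited.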
From a storage perspective, Ceph object storage devices perform the mapping of objects to blocks (a task traditionally done at the file system layer in the client). This behavior allows the local entity to decide how best to store an object. Early versions of Ceph implemented a custom low-level file system on the local storage called EBOFS. It provided a nonstandard interface to the underlying storage, tuned for object semantics and other features (such as asynchronous notification of commits to disk). Today, the B-tree file system (BTRFS) can be used at the storage nodes; it already implements some of the necessary features (such as embedded integrity).

Because the Ceph clients implement CRUSH and have no knowledge of the block mapping of files on the disks, the underlying storage devices can safely manage the mapping of objects to blocks. This allows the storage nodes to replicate data (for example, when a device is found to have failed). Distributing failure recovery also allows the storage system to scale, because failure detection and recovery are spread across the ecosystem. Ceph calls this layer RADOS (see Figure 3).

Other features of interest
As if the dynamic and adaptive nature of the file system weren't enough, Ceph also implements some interesting features that are visible to the user. Users can create snapshots, for example, on any subdirectory (including all of its contents). It is also possible to perform file and capacity accounting at the subdirectory level, reporting the storage size and number of files for a given subdirectory (and all of its nested contents).

Ceph status and future
Although Ceph is now integrated into the mainline Linux kernel, it is properly flagged there as experimental. File systems in this state are useful to evaluate but are not yet ready for production environments. Still, given Ceph's adoption into the Linux kernel and the motivation of its originators to continue its development, it should soon be ready to serve your massive storage needs.
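RADOS can also be used directly, without the POSIX file system layer on top. As a minimal sketch, the following stores and reads back a single object through the Python bindings for librados; the configuration file path and the pool name ("data") are assumptions and must match an existing cluster and pool.

```python
import rados

# Connect using a cluster configuration file and the default keyring.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("data")   # name of an existing pool (assumed)
    try:
        # The object is hashed to a placement group and replicated by RADOS;
        # the client never deals with block addresses or server names.
        ioctx.write_full("hello-object", b"stored and replicated by RADOS")
        print(ioctx.read("hello-object"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```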
Application Containers with Docker
Transforming Business Through Software
Gone are the days of private datacenters running off-the-shelf software and giant monolithic code bases that were updated once a year. Everything has changed. Whether it is moving to the cloud, migrating between clouds, modernizing legacy applications, or building new apps and data structures, the desired result is always the same: speed. How fast you can move defines your success as a company. Software is the critical IP that defines your company, even if the actual product you are selling is a t-shirt, a car, or compounding interest. Software is how you engage your customers, reach new users, understand their data, promote your product or service, and process their orders.

To do this well, today's software is going bespoke. Small pieces of software designed for one very specific job are called microservices. The design goal of microservices is to build each service with all of the components necessary to run a specific job, on just the right type of underlying infrastructure resources. These services are then loosely coupled, so that each one can be changed at any time without having to worry about the services that come before or after it. This methodology, while great for continuous improvement, poses many challenges on the way to that end state. First, it creates a new, ever-expanding matrix of services, dependencies, and infrastructure that is difficult to manage. It also does not account for the vast number of existing legacy applications in the landscape, the heterogeneity up and down the application stack, and the processes required to make all of this work in practice.

The Docker Journey and the Power of AND
In 2013, Docker entered the landscape with application containers to build, ship, and run applications anywhere. Docker takes software and its dependencies and packages them into a lightweight container. Much like physical shipping containers, software containers are simply a standard unit of software that looks the same on the outside regardless of what code and dependencies are included on the inside. This enables developers and sysadmins to transport them across infrastructures and environments without requiring any modifications, regardless of the varying configurations in those environments. The Docker journey begins here.

Agility: The speed and simplicity of Docker was an instant hit with developers and is what led to the meteoric rise of the open source project. Developers can now very simply package up any software and its dependencies into a container. They can use any language, version, and tooling, because everything is packaged in a container, which "standardizes" all of that heterogeneity without sacrifice.

Portability: By the very nature of the Docker technology, these same developers realized that their application containers are now portable in ways that were not previously possible. They can ship applications from development to test and production, and the code will work as designed every time. Differences in the environment do not affect what is inside the container, nor does the application need to be changed to work in production. This is also a boon for IT operations teams, because they can now move applications across datacenters or clouds to avoid vendor lock-in.
Control: As these applications move along the lifecycle to production, new questions around security, manageability, and scale need to be answered. Docker "standardizes" your environment while maintaining the heterogeneity your business requires, and it provides the ability to set the appropriate level of control and flexibility to maintain your service levels, performance, and regulatory compliance. IT operations teams can provision, secure, monitor, and scale the infrastructure and applications to maintain peak service levels. No two applications or businesses are alike, and Docker allows you to decide how to control your application environment.

At the core of the Docker journey is the power of AND. Docker is the only solution that provides agility, portability, and control for developers and IT operations teams across all stages of the application lifecycle. From these core tenets, Containers as a Service (CaaS) emerges as the construct by which these new applications are built better and faster.

Docker Containers as a Service (CaaS)
What is Containers as a Service (CaaS)? It is an IT-managed and secured application environment of infrastructure and content in which developers can build and deploy applications in a self-service manner.

Diagram: CaaS workflow – developers build in development environments, ship secure content through a shared registry, and IT operations deploy, manage, and scale the running applications.
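As a minimal illustration of the "run" side of this workflow, the Docker Engine's Python SDK (docker-py) can start the same image, unchanged, on a laptop, a test VM, or a production host; the image tag used here is just an example.

```python
import docker

# Talk to the local Docker engine (default socket or DOCKER_HOST).
client = docker.from_env()

# The container brings its own userland and dependencies, so the command
# behaves identically wherever the engine runs.
output = client.containers.run(
    "python:3.11-slim",                                   # example image tag
    ["python", "-c", "print('hello from a container')"],
    remove=True,                                          # clean up afterwards
)
print(output.decode())
```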
In the CaaS diagram above, the development and IT operations teams collaborate through the registry, a service that maintains a library of secure, signed images. From the registry, the developers on the left pull and build software at their own pace, then push the content back to the registry once it passes integration testing, saving the latest version. Depending on the internal processes, the deployment step is either automated with tools or performed manually.

The IT operations team on the right of the diagram manages the various vendor contracts for the production infrastructure, such as compute, networking, and storage. These teams are responsible for provisioning the compute resources needed for the application and use Docker Universal Control Plane to monitor the clusters and applications over time. They can then move the apps from one cloud to another, or scale a service up or down to maintain peak performance.

Key Characteristics and Considerations
Docker CaaS provides a framework for organizations to unify the variety of systems, languages, and tools in their environment and to apply the level of control, security, or freedom required for their business. As a Docker-native solution with full support for the Docker API, Docker CaaS can seamlessly take an application from local development to production without changing the code, streamlining the deployment cycle. The following characteristics form the minimum requirements for any organization's application environment. In this paradigm, development and IT operations teams are empowered to use the best tools for their respective jobs without worrying about breaking systems, each other's workflows, or lock-in.

1. The needs of developers and operations. Many tools address the functional needs of only one team, which breaks the cycle of continuous improvement. To truly accelerate the development-to-production timeline, you need to address both groups of users along a continuum. Docker provides unique capabilities for each team as well as a consistent API across the entire platform, for a seamless hand-over from one team to the next.

2. All stages in the application lifecycle. Continuous integration, continuous delivery, and DevOps are about eliminating the waterfall development methodology and the lagging innovation cycles that come with it. By providing tools for both developers and IT operations, Docker seamlessly supports an application from build and test through staging to production.

3. Any language. Developer agility means the freedom to build with whatever language, version, and tooling the features under development require. The ability to run multiple versions of a language at the same time adds further flexibility. Docker lets your team focus on building the app instead of figuring out how to build an app that works in Docker.

4. Any operating system. The vast majority of organizations run more than one operating system. Some tools simply work better on Linux, others on Windows. Application platforms need to account for and support this diversity; otherwise they solve only part of the problem. Originally created for the Linux community, Docker and Microsoft are bringing Windows Server support forward to address the millions of enterprise applications in existence today as well as future applications.
5. Any infrastructure. When it comes to infrastructure, organizations want choice, backup, and leverage. Whether that means multiple private datacenters, a hybrid cloud, or multiple cloud providers, the critical capability is being able to move workloads from one environment to another without causing application issues. The Docker architecture abstracts the infrastructure away from the application, allowing application containers to run anywhere and to be ported across any other infrastructure.

6. Open APIs, pluggable architecture, and ecosystem. A platform isn't really a platform if it is an island unto itself. Implementing new technologies is often impossible if you first need to re-tool your existing environment. A fundamental guiding principle of Docker is openness: open APIs and plugins make it easy to leverage your existing investments and to fit Docker into your environment and processes. This openness also invites a rich ecosystem to flourish, giving you more flexibility and choice when adding specialized capabilities to your CaaS.

Although numerous, these characteristics are critical, because the new bespoke application paradigms only invite greater heterogeneity into your technical architecture. The Docker CaaS platform is fundamentally designed to support that diversity while providing the appropriate controls to manage it at any scale.
Docker CaaS
The Docker CaaS platform is made possible by a suite of integrated software solutions with a flexible deployment model to meet the needs of your business.

On-premises/VPC: For organizations that need to keep their IP within their own network, Docker Trusted Registry and Docker Universal Control Plane can be deployed on-premises or in a VPC and connected to your existing infrastructure and systems, such as storage, Active Directory/LDAP, and monitoring and logging solutions. Trusted Registry stores and manages images on your storage infrastructure and also manages role-based access control to those images. Universal Control Plane provides visibility across your Docker environment, including Swarm clusters, Trusted Registry repositories, containers, and multi-container applications.

In the cloud: For organizations that readily use SaaS solutions, Docker Hub and Tutum by Docker provide a registry service and a control plane that are hosted and managed by Docker. Hub is a cloud registry service for storing and managing your images and user permissions. Tutum by Docker provisions and manages the deployment clusters and also monitors and manages the deployed applications. Connect to the cloud infrastructure of your choice, or bring your own physical nodes to deploy your applications.

Your Docker CaaS can be designed to provide a centralized point of control and management, or to allow decentralized management that empowers individual application teams. This flexibility lets you create the model that is right for your business, just as you choose your infrastructure and implement your processes. CaaS extends that choice to how you build, ship, and run applications.

Many IT initiatives are enabled, and in fact accelerated, by CaaS because of its unifying nature across the environment. Each organization has its own take on the terminology, but the initiatives range from containerization – which may involve the "lift and shift" of existing apps or the adoption of microservices – to continuous integration, delivery, and DevOps, and the various flavors of cloud adoption, migration, hybrid, and multi-cloud. In each scenario, Docker CaaS brings the agility, portability, and control needed to adopt those use cases across the organization.

Diagram: CaaS tooling across the workflow – build (Docker Toolbox), ship (Docker Hub, Docker Trusted Registry), run (Tutum by Docker, Docker Universal Control Plane).
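The "ship" step of this workflow is simply an image build followed by a push to whichever registry you run – Docker Hub, Docker Trusted Registry, or another. The sketch below uses the Docker Engine's Python SDK; the repository name, registry address, and credentials are placeholders.

```python
import docker

client = docker.from_env()

# Build an image from the Dockerfile in the current directory and tag it
# for a registry ("registry.example.com/team/myapp" is a placeholder).
image, build_log = client.images.build(path=".", tag="registry.example.com/team/myapp:1.0")

# Authenticate against the registry and push the build so that IT operations
# can pull and deploy exactly this version.
client.login(registry="registry.example.com", username="dev", password="********")
for line in client.images.push("registry.example.com/team/myapp", tag="1.0",
                               stream=True, decode=True):
    print(line)
```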
tech BOX
Architecture of Docker
Docker is an open-source project that automates the deployment of applications inside software containers by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux. Docker uses resource isolation features of the Linux kernel, such as cgroups and kernel namespaces, together with a union-capable filesystem such as AUFS, to allow independent "containers" to run within a single Linux instance, avoiding the overhead of starting and maintaining virtual machines.

The Linux kernel's namespaces largely isolate an application's view of the operating environment, including process trees, network, user IDs, and mounted file systems, while the kernel's cgroups provide resource limiting for CPU, memory, block I/O, and network. Since version 0.9, Docker has included the libcontainer library as its own way to use the virtualization facilities provided by the Linux kernel directly, in addition to the abstracted virtualization interfaces libvirt, LXC (Linux Containers), and systemd-nspawn.

As actions are performed on a Docker base image, union filesystem layers are created and documented, such that each layer fully describes how to recreate the action. This strategy enables Docker's lightweight images, because only layer updates need to be propagated (compared with full VMs, for example).
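You can see these layers for yourself. The short sketch below pulls a small base image with the Docker Engine's Python SDK and prints its layer history; the image tag is just an example.

```python
import docker

client = docker.from_env()

# Pull a small base image and list the union-filesystem layers it is made of.
image = client.images.pull("alpine:3.19")             # example image tag
for layer in image.history():
    # Each entry records the instruction that created the layer and its size.
    created_by = (layer.get("CreatedBy") or "").strip()
    print(f"{layer.get('Size', 0):>10}  {created_by[:60]}")
```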
Overview
Docker implements a high-level API to provide lightweight containers that run processes in isolation. Building on facilities provided by the Linux kernel (primarily cgroups and namespaces), a Docker container, unlike a virtual machine, does not require or include a separate operating system. Instead, it relies on the kernel's functionality and uses resource isolation (CPU, memory, block I/O, network, and so on) and separate namespaces to isolate the application's view of the operating system. Docker accesses the Linux kernel's virtualization features either directly, using the libcontainer library (available as of Docker 0.9), or indirectly via libvirt, LXC (Linux Containers), or systemd-nspawn.

By using containers, resources can be isolated, services restricted, and processes provisioned to have an almost completely private view of the operating system, with their own process ID space, file system structure, and network interfaces. Multiple containers share the same kernel, but each container can be constrained to use only a defined amount of resources, such as CPU, memory, and I/O.

Using Docker to create and manage containers can simplify the creation of highly distributed systems by allowing multiple applications, worker tasks, and other processes to run autonomously on a single physical machine or across multiple virtual machines. Nodes can then be deployed as resources become available or as more nodes are needed, enabling a platform-as-a-service (PaaS) style of deployment and scaling for systems such as Apache Cassandra, MongoDB, or Riak. Docker also simplifies the creation and operation of task or workload queues and other distributed systems.

Integration
Docker can be integrated into various infrastructure tools, including Amazon Web Services, Ansible, CFEngine, Chef, Google Cloud Platform, IBM Bluemix, Jelastic, Jenkins, Microsoft Azure, OpenStack Nova, OpenSVC, HPE Helion Stackato, Puppet, Salt, Vagrant, and VMware vSphere Integrated Containers. The Cloud Foundry Diego project integrates Docker into the Cloud Foundry PaaS, and the GearD project aims to integrate Docker into Red Hat's OpenShift Origin PaaS.

Diagram: Docker can use different interfaces (libcontainer, libvirt, LXC, systemd-nspawn) to access virtualization features of the Linux kernel, such as cgroups, namespaces, SELinux, AppArmor, Netfilter, Netlink, and capabilities.
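The resource constraints mentioned above are exposed directly through the Docker API. As a small sketch using the Python SDK, the run below confines a container to roughly half a CPU, 256 MiB of RAM, and 64 processes; the limits are enforced by the kernel's cgroups, not by the application. The image tag is again just an example.

```python
import docker

client = docker.from_env()

container = client.containers.run(
    "python:3.11-slim",                                  # example image tag
    ["python", "-c", "print('constrained workload')"],
    nano_cpus=500_000_000,   # 0.5 CPU
    mem_limit="256m",        # 256 MiB of RAM
    pids_limit=64,           # at most 64 processes
    detach=True,
)
result = container.wait()    # block until the contained process exits
print(result, container.logs().decode())
container.remove()
```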
Advanced Cluster Management Made Easy
High Performance Computing (HPC) clusters serve a crucial role at research-intensive organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil and gas exploration. What they all have in common is a requirement for large-scale computations. Bright's solution for HPC puts the power of supercomputing within the reach of just about any organization.
Engineering | Life Sciences | Automotive | Price Modelling | Aerospace | CAE | Data Analytics
  • 40. 40 The Bright Advantage Bright Cluster Manager for HPC makes it easy to build and oper- ate HPC clusters using the server and networking equipment of your choice, tying them together into a comprehensive, easy to manage solution. Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It pro- vides single-pane-of-glass management for the hardware, the operating system, HPC software, and users. With Bright Cluster Manager for HPC, system administrators can get clusters up and running quickly and keep them running reliably throughout their life cycle – all with the ease and elegance of a fully featured, en- terprise-grade cluster manager. Bright Cluster Manager was designed by a group of highly expe- rienced cluster specialists who perceived the need for a funda- mental approach to the technical challenges posed by cluster management. The result is a scalable and flexible architecture, forming the basis for a cluster management solution that is ex- tremely easy to install and use, yet is suitable for the largest and most complex clusters. Scale clusters to thousands of nodes Bright Cluster Manager was designed to scale to thousands of nodes. It is not dependent on third-party (open source) software that was not designed for this level of scalability. Many advanced features for handling scalability and complexity are included: Management Daemon with Low CPU Overhead The cluster management daemon (CMDaemon) runs on every node in the cluster, yet has a very low CPU load – it will not cause any noticeable slow-down of the applications running on your cluster. Because the CMDaemon provides all cluster management func- tionality, the only additional daemon required is the workload manager (or queuing system) daemon. For example, no daemons The cluster installer takes you through the installation process and offers advanced options such as “Express” and “Remote”. Advanced Cluster Management Made Easy Easy-to-use, complete and scalable
  • 41. 41 for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster. Multiple Load-Balancing Provisioning Nodes Node provisioning – the distribution of software from the head node to the regular nodes – can be a significant bottleneck in large clusters. With Bright Cluster Manager, provisioning capabil- ity can be off-loaded to regular nodes, ensuring scalability to a virtually unlimited number of nodes. When a node requests its software image from the head node, the head node checks which of the nodes with provisioning ca- pability has the lowest load and instructs the node to download or update its image from that provisioning node. Synchronized Cluster Management Daemons The CMDaemons are synchronized in time to execute tasks in exact unison. This minimizes the effect of operating system jit- ter (OS jitter) on parallel applications. OS jitter is the “noise” or “jitter” caused by daemon processes and asynchronous events such as interrupts. OS jitter cannot be totally prevented, but for parallel applications it is important that any OS jitter occurs at the same moment in time on every compute node. Built-In Redundancy Bright Cluster Manager supports redundant head nodes and re- dundant provisioning nodes that can take over from each other in case of failure. Both automatic and manual failover configura- tions are supported. Other redundancy features include support for hardware and software RAID on all types of nodes in the cluster. Diskless Nodes Bright Cluster Manager supports diskless nodes, which can be useful to increase overall Mean Time Between Failure (MTBF), particularly on large clusters. Nodes with only InfiniBand and no Ethernet connection are possible too. Bright Cluster Manager 7 for HPC For organizations that need to easily deploy and manage HPC clusters Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It provides single-pane-of-glass management for the hardware, the operating system, the HPC software, and users. With Bright Cluster Manager for HPC, system administrators can quickly get clusters up and running and keep them running relia- bly throughout their life cycle – all with the ease and elegance of a fully featured, enterprise-grade cluster manager. Features Simple Deployment Process – Just answer a few questions about your cluster and Bright takes care of the rest. It installs all of the software you need and creates the necessary config- uration files for you, so you can relax and get your new cluster up and running right – first time, every time.
tech BOX
Head Nodes and Regular Nodes
A cluster can have different types of nodes, but it will always have at least one head node and one or more "regular" nodes. A regular node is a node that is controlled by the head node and receives its software image from the head node or from a dedicated provisioning node. Regular nodes come in different flavors. Some examples include:

- Failover Node – a node that can take over all functions of the head node when the head node becomes unavailable.
- Compute Node – a node that is primarily used for computations.
- Login Node – a node that provides login access to users of the cluster; it is often also used for compilation and job submission.
- I/O Node – a node that provides access to disk storage.
- Provisioning Node – a node from which other regular nodes can download their software image.
- Workload Management Node – a node that runs the central workload manager, also known as the queuing system.
- Subnet Management Node – a node that runs an InfiniBand subnet manager.

More node types can easily be defined and configured with Bright Cluster Manager. In simple clusters there is only one type of regular node – the compute node – and the head node fulfills all of the cluster management roles described above.
Elements of the Architecture
All types of nodes (head nodes and regular nodes) run the same CMDaemon. However, depending on the roles that have been assigned to a node, the CMDaemon fulfills different tasks. (Screenshot: a node with a 'Login Role', 'Storage Role', and 'PBS Client Role'.)

The architecture of Bright Cluster Manager is implemented by the following key elements:
- the Cluster Management Daemon (CMDaemon)
- the Cluster Management Shell (CMSH)
- the Cluster Management GUI (CMGUI)

Connection Diagram
A Bright cluster is managed by the CMDaemon on the head node, which communicates with the CMDaemons on the other nodes over encrypted connections. The CMDaemon on the head node exposes an API that can also be accessed from outside the cluster. This API is used by the cluster management shell and the cluster management GUI, but it can also be used by other applications that are programmed to access it. The API documentation is available on request. The diagram below shows a cluster with one head node, three regular nodes, and the CMDaemon running on each node.
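As a purely hypothetical illustration of what such "other applications" could look like, the sketch below polls a management endpoint over an authenticated HTTPS session. The endpoint path, port, certificate locations, and JSON fields are all invented for this illustration; the actual CMDaemon API is documented by Bright Computing on request.

```python
import requests

HEAD_NODE = "https://head-node.example.org:8081"      # hypothetical address and port

session = requests.Session()
session.cert = ("/path/to/admin.pem", "/path/to/admin.key")   # assumed client-certificate auth
session.verify = "/path/to/cluster-ca.pem"

# Hypothetical endpoint: list nodes and their current status.
resp = session.get(f"{HEAD_NODE}/api/nodes", timeout=10)
resp.raise_for_status()
for node in resp.json():
    print(node.get("hostname"), node.get("status"))
```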
Can Install on Bare Metal – With Bright Cluster Manager there is nothing to pre-install. You can start with bare metal servers, and we will install and configure everything you need. Go from pallet to production in less time than ever.

Comprehensive Metrics and Monitoring – Bright Cluster Manager really shines when it comes to showing you what is going on in your cluster. Its graphical user interface lets you monitor resources and services across the entire cluster in real time.

Powerful User Interfaces – With Bright Cluster Manager you don't just get one great user interface, you get two. You can choose to provision, monitor, and manage your HPC clusters with a traditional command line interface or with our intuitive graphical user interface.

Optimize the Use of IT Resources by HPC Applications – Bright ensures that HPC applications get the resources they require according to the policies of your organization. Your cluster will prioritize workloads according to your business priorities.

HPC Tools and Libraries Included – Every copy of Bright Cluster Manager comes with a complete set of HPC tools and libraries, so you will be ready to develop, debug, and deploy your HPC code right away.

"The building blocks for transtec HPC solutions must be chosen according to our goals of ease of management and ease of use. With Bright Cluster Manager, we are happy to have the technology leader at hand, meeting these requirements, and our customers value that."
Jörg Walter | HPC Solution Engineer

(Screenshot: monitor your entire cluster at a glance with Bright Cluster Manager)
Quickly build and deploy an HPC cluster
When you need to get an HPC cluster up and running, don't waste time cobbling together a solution from various open source tools. Bright's solution comes with everything you need to set up a complete cluster from bare metal and manage it with a beautiful, powerful user interface. Whether your cluster is on-premises or in the cloud, Bright is a virtual one-stop shop for your HPC project.

Build a cluster in the cloud
Bright's dynamic cloud provisioning capability lets you build an entire cluster in the cloud or expand your physical cluster into the cloud for extra capacity. Bright allocates the compute resources in Amazon Web Services automatically and on demand.

Build a hybrid HPC/Hadoop cluster
Does your organization need HPC clusters for technical computing and Hadoop clusters for Big Data? The Bright Cluster Manager for Apache Hadoop add-on enables you to easily build and manage both types of clusters from a single pane of glass. Your system administrators can monitor all cluster operations, manage users, and repurpose servers with ease.

Driven by research
Bright has been building enterprise-grade cluster management software for more than a decade, and our solutions are deployed in thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization?

Clusters in the cloud
Bright Cluster Manager can provision and manage clusters running on virtual servers inside the AWS cloud as if they were local machines. Use this feature to build an entire cluster in AWS from scratch, or extend a physical cluster into the cloud when you need extra capacity.

Use native services for storage in AWS – Bright Cluster Manager lets you take advantage of inexpensive, secure, durable, flexible, and simple storage services for data use, archiving, and backup in the AWS cloud.

Make smart use of AWS for HPC – Bright Cluster Manager can save you money by instantiating compute resources in AWS only when they are needed. It uses built-in intelligence to create instances only after the data is ready for processing and the backlog in on-site workloads requires it.

(Screenshot: you decide which metrics to monitor – just drag a component into the display area and Bright Cluster Manager instantly creates a graph of the data you need.)
  • 46. BOX 46 High performance meets efficiency Initially, massively parallel systems constitute a challenge to both administrators and users. They are complex beasts. Anyone building HPC clusters will need to tame the beast, master the complexity and present users and administrators with an easy- to-use, easy-to-manage system landscape. Leading HPC solution providers such as transtec achieve this goal. They hide the complexity of HPC under the hood and match high performance with efficiency and ease-of-use for both users and administrators. The “P” in “HPC” gains a double meaning: “Performance” plus “Productivity”. Cluster and workload management software like Moab HPC Suite, Bright Cluster Manager or Intel Fabric Suite provide the means to master and hide the inherent complexity of HPC sys- tems. For administrators and users, HPC clusters are presented as single, large machines, with many different tuning parame- ters. The software also provides a unified view of existing clus- ters whenever unified management is added as a requirement by the customer at any point in time after the first installation. Thus, daily routine tasks such as job management, user management, queue partitioning and management, can be performed easily with either graphical or web-based tools, without any advanced scripting skills or technical expertise required from the adminis- trator or user. Advanced Cluster Management Made Easy Easy-to-use, complete and scalable
  • 47. S EW SE SW tech BOX 47 Bright Cluster Manager unleashes and manages the unlimited power of the cloud Create a complete cluster in Amazon EC2, or easily extend your onsite cluster into the cloud. Bright Cluster Manager provides a “single pane of glass” to both your on-premise and cloud resourc- es, enabling you to dynamically add capacity and manage cloud nodes as part of your onsite cluster. Cloud utilization can be achieved in just a few mouse clicks, with- out the need for expert knowledge of Linux or cloud computing. With Bright Cluster Manager, every cluster is cloud-ready, at no extra cost. The same powerful cluster provisioning, monitoring, scheduling and management capabilities that Bright Cluster Manager provides to onsite clusters extend into the cloud. The Bright advantage for cloud utilization Ease of use: Intuitive GUI virtually eliminates user learning curve; no need to understand Linux or EC2 to manage system. Alternatively, cluster management shell provides powerful scripting capabilities to automate tasks. Complete management solution: Installation/initialization, provisioning, monitoring, scheduling and management in one integrated environment. Integrated workload management: Wide selection of work- load managers included and automatically configured with local, cloud and mixed queues. Single pane of glass; complete visibility and control: Cloud computenodesmanagedaselementsoftheon-sitecluster;vi- sible from a single console with drill-downs and historic data. Efficient data management via data-aware scheduling: Auto- matically ensures data is in position at start of computation; delivers results back when complete. Secure, automatic gateway: Mirrored LDAP and DNS services inside Amazon VPC, connecting local and cloud-based nodes for secure communication over VPN. Cost savings: More efficient use of cloud resources; support for spot instances, minimal user intervention. Two cloud scenarios Bright Cluster Manager supports two cloud utilization scenarios: “Cluster-on-Demand” and “Cluster Extension”. Scenario 1: Cluster-on-Demand The Cluster-on-Demand scenario is ideal if you do not have a clus- ter onsite, or need to set up a totally separate cluster. With just a few mouse clicks you can instantly create a complete cluster in the public cloud, for any duration of time.
  • 48. 48 Scenario 2: Cluster Extension The Cluster Extension scenario is ideal if you have a cluster onsite but you need more compute power, including GPUs. With Bright Cluster Manager, you can instantly add EC2-based resources to your onsite cluster, for any duration of time. Extending into the cloud is as easy as adding nodes to an onsite cluster – there are only a few additional, one-time steps after providing the public cloud account information to Bright Cluster Manager. The Bright approach to managing and monitoring a cluster in the cloud provides complete uniformity, as cloud nodes are managed and monitored the same way as local nodes: Load-balanced provisioning Software image management Integrated workload management Interfaces – GUI, shell, user portal Monitoring and health checking Compilers, debuggers, MPI libraries, mathematical libraries and environment modules Advanced Cluster Management Made Easy Cloud Bursting With Bright
  • 49. Bright Computing 49 Bright Cluster Manager also provides additional features that are unique to the cloud. Amazon spot instance support – Bright Cluster Manager enables users to take advantage of the cost savings offered by Amazon’s spot instances. Users can specify the use of spot instances, and Bright will automatically schedule as available, reducing the cost to compute without the need to monitor spot prices and sched- ule manually. Hardware Virtual Machine (HVM) – Bright Cluster Manager auto- matically initializes all Amazon instance types, including Cluster Compute and Cluster GPU instances that rely on HVM virtualization. Amazon VPC – Bright supports Amazon VPC setups which allows compute nodes in EC2 to be placed in an isolated network, there- by separating them from the outside world. It is even possible to route part of a local corporate IP network to a VPC subnet in EC2, so that local nodes and nodes in EC2 can communicate easily. Data-aware scheduling – Data-aware scheduling ensures that input data is automatically transferred to the cloud and made accessible just prior to the job starting, and that the output data is automatically transferred back. There is no need to wait for the data to load prior to submitting jobs (delaying entry into the job queue), nor any risk of starting the job before the data transfer is complete (crashing the job). Users submit their jobs, and Bright’s data-aware scheduling does the rest. Bright Cluster Manager provides the choice: “To Cloud, or Not to Cloud” Not all workloads are suitable for the cloud. The ideal situation for most organizations is to have the ability to choose between onsite clusters for jobs that require low latency communication, complex I/O or sensitive data; and cloud clusters for many other types of jobs. Bright Cluster Manager delivers the best of both worlds: a pow- erful management solution for local and cloud clusters, with the ability to easily extend local clusters into the cloud without compromising provisioning, monitoring or managing the cloud resources. One or more cloud nodes can be configured under the Cloud Settings tab.
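To put the spot-instance support into context: requesting spot capacity by hand means talking to the EC2 API, watching prices, and cleaning up afterwards. The hedged sketch below shows what such a manual request looks like with the AWS SDK for Python (boto3); the AMI ID, key pair, region, and price ceiling are placeholders. Bright Cluster Manager issues and monitors requests of this kind automatically once spot usage is enabled.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")     # example region

# Ask for two spot instances with a price ceiling; in a Bright-managed
# cluster this request, plus the subsequent monitoring, is automated.
response = ec2.request_spot_instances(
    SpotPrice="0.25",                                 # placeholder bid in USD/hour
    InstanceCount=2,
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",           # placeholder AMI
        "InstanceType": "c5.4xlarge",
        "KeyName": "cluster-key",                     # placeholder key pair
    },
)
for req in response["SpotInstanceRequests"]:
    print(req["SpotInstanceRequestId"], req["State"])
```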
Intelligent HPC Workload Management
While all HPC systems face challenges in workload demand, workflow constraints, resource complexity, and system scale, enterprise HPC systems face more stringent challenges and expectations. Enterprise HPC systems must meet mission-critical and priority HPC workload demands for commercial businesses, business-oriented research, and academic organizations.
High Throughput Computing | CAD | Big Data Analytics | Simulation | Aerospace | Automotive
  • 52. 52 Solutions to Enterprise High-Performance Computing Challenges The Enterprise has requirements for dynamic scheduling, provi- sioning and management of multi-step/multi-application servic- es across HPC, cloud, and big data environments. Their workloads directly impact revenue, product delivery, and organizational ob- jectives. Enterprise HPC systems must process intensive simulation and data analysis more rapidly, accurately and cost-effectively to accelerate insights. By improving workflow and eliminating job delays and failures, the business can more efficiently leverage data to make data-driven decisions and achieve a competitive advantage. The Enterprise is also seeking to improve resource utilization and manage efficiency across multiple heterogene- ous systems. User productivity must increase to speed the time to discovery, making it easier to access and use HPC resources and expand to other resources as workloads demand. Moab HPC Suite Moab HPC Suite accelerates insights by unifying data center re- sources, optimizing the analysis process, and guaranteeing ser- vices to the business. Moab 8.1 for HPC systems and HPC cloud continually meets enterprise priorities through increased pro- ductivity, automated workload uptime, and consistent SLAs. It uses the battle-tested and patented Moab intelligence engine to automatically balance the complex, mission-critical workload priorities of enterprise HPC systems. Enterprise customers ben- efit from a single integrated product that brings together key Enterprise HPC capabilities, implementation services, and 24/7 support. It is imperative to automate workload workflows to speed time to discovery. The following new use cases play a key role in im- proving overall system performance: Moab is the “brain” of an HPC system, intelligently optimizing workload throughput while balancing service levels and priorities. Intelligent HPC Workload Management Moab HPC Suite
  • 53. 53 Elastic Computing – unifies data center resources by assisting the business management resource expansion through burst- ing to private/public clouds and other data center resources utilizing OpenStack as a common platform Performance and Scale – optimizes the analysis process by dramatically increasing TORQUE and Moab throughput and scalability across the board Tighter cooperation between Moab and Torque – harmoniz- ing these key structures to reduce overhead and improve communication between the HPC scheduler and the re- source manager(s) More Parallelization – increasing the decoupling of Moab and TORQUE’s network communication so Moab is less dependent on TORQUE’S responsiveness, resulting in a 2x speed improvement and shortening the duration of the av- erage scheduling iteration in half Accounting Improvements – allowing toggling between multiple modes of accounting with varying levels of en- forcement, providing greater flexibility beyond strict allo- cation options. Additionally, new up-to-date accounting balances provide real-time insights into usage tracking. Viewpoint Admin Portal – guarantees services to the busi- ness by allowing admins to simplify administrative reporting, workload status tracking, and job resource viewing Accelerate Insights Moab HPC Suite accelerates insights by increasing overall system, user and administrator productivity, achieving more accurate results that are delivered faster and at a lower cost from HPC resources. Moab provides the scalability, 90-99 per- cent utilization, and simple job submission required to maximize productivity. This ultimately speeds the time to discovery, aiding the enterprise in achieving a competitive advantage due to its HPC system. Enterprise use cases and capabilities include the following: OpenStack integration to offer virtual and physical resource provisioning for IaaS and PaaS Performance boost to achieve 3x improvement in overall opti- mization performance Advanced workflow data staging to enable improved cluster utilization, multiple transfer methods, and new transfer types
  • 54. 54 “With Moab HPC Suite, we can meet very demanding customers’ requirements as regards unified management of heterogeneous cluster environments, grid management, and provide them with flexible and powerful configuration and reporting options. Our customers value that highly.” Sven Grützmacher | HPC Solution Engineer Advanced power management with clock frequency control and additional power state options, reducing energy costs by 15-30 percent Workload-optimized allocation policies and provisioning to get more results out of existing heterogeneous resources and reduce costs, including topology-based allocation Unify workload management across heterogeneous clusters by managing them as one cluster, maximizing resource availa- bility and administration efficiency Optimized, intelligent scheduling packs workloads and back- fills around priority jobs and reservations while balancing SLAs to efficiently use all available resources Optimized scheduling and management of accelerators, both Intel Xeon Phi and GPGPUs, for jobs to maximize their utiliza- tion and effectiveness Simplified job submission and management with advanced job arrays and templates Showback or chargeback for pay-for-use so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster Multi-cluster grid capabilities manage and share workload across multiple remote clusters to meet growing workload demand or surges Guarantee Services to the Business Job and resource failures in enterprise HPC systems lead to de- layed results, missed organizational opportunities, and missed objectives. Moab HPC Suite intelligently automates workload and resource uptime in the HPC system to ensure that workloads complete successfully and reliably, avoid failures, and guarantee services are delivered to the business. Intelligent HPC Workload Management Moab HPC Suite
The Enterprise benefits from these features:
- Intelligent resource placement prevents job failures with granular resource modeling, meeting workload requirements and avoiding at-risk resources
- Auto-response to failures and events, with configurable actions for pre-failure conditions, amber alerts, or other metrics and monitors
- Workload-aware future maintenance scheduling that helps maintain a stable HPC system without disrupting workload productivity
- Real-world expertise for fast time-to-value and system uptime, with included implementation, training, and 24/7 remote support services

Auto SLA Enforcement
Moab HPC Suite uses the powerful Moab intelligence engine to optimally schedule and dynamically adjust workloads to consistently meet service level agreements (SLAs), guarantees, or business priorities. This automatically ensures that the right workloads are completed at the optimal times, taking into account the many departments, priorities, and SLAs that have to be balanced. Moab provides the following benefits:
- Usage accounting and budget enforcement that schedules resources and reports on usage in line with resource-sharing agreements and precise budgets (including usage limits, usage reports, automatic budget management, and dynamic fair-share policies)
- SLA and priority policies that make sure the highest-priority workloads are processed first (including Quality of Service and hierarchical priority weighting)
- Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workloads change (for example, future reservations and pre-emption)

BOX
transtec HPC solutions are designed for maximum flexibility and ease of management. We not only offer our customers the most powerful and flexible cluster management solution out there, but also provide them with customized setup and site-specific configuration. Whether a customer needs a dynamic Linux/Windows dual-boot solution, unified management of different clusters at different sites, or fine-tuning of the Moab scheduler to implement fine-grained policy configuration – transtec not only puts the framework in your hands, but also helps you adapt the system to your special needs. Needless to say, when customers are in need of special trainings, transtec will be there to provide customers, administrators, or users with specially adjusted Educational Services.

Many years of experience in High Performance Computing have enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven third-party tools, which we assemble into a complete solution and adapt and configure according to the customer's requirements. We manage everything from the complete installation and configuration of the operating system, through necessary middleware components such as job and cluster management systems, up to the customer's applications.