3. PDC Center for High Performance Computing
KTH PDC and Industry – joint projects
PDC is working with industrial researchers and developers on
major international projects that push high-performance
computing to the next level.
4. PDC Center for High Performance Computing
KTH PDC Resources
BESKOW – a Cray XC40 for scalable applications
● 1.97 PF theoretical peak performance (TPP) and 105 TB RAM from 1676 dual-socket nodes.
● Intel Xeon E5-2698v3 16-core Haswell CPUs; Cray Aries interconnect.
● Largest HPC resource located in the Nordics.
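As a quick consistency check on these figures (assuming the E5-2698v3 base clock of 2.3 GHz and 16 double-precision FLOP per cycle per Haswell core): 1676 nodes × 2 sockets × 16 cores = 53,632 cores, and 53,632 cores × 2.3 GHz × 16 FLOP/cycle ≈ 1.97 PF, matching the quoted TPP; 105 TB spread over 1676 nodes likewise corresponds to roughly 64 GB RAM per node.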
5. PDC Center for High Performance Computing
KTH PDC Resources
TEGNER – pre-/post-processing cluster
The pre- and post-processing infrastructure aims to support users with complex workflows and with advanced access methods, including graphics rendering and data exploration.
• Mellanox EDR Infiniband interconnect (1:1 fat tree).
• NVIDIA Quadro K420 for HW-assisted off-screen rendering
• 9 nodes with NVIDIA Tesla K80 for GPU-enabled applications (see the sketch after this list).
• 4 nodes with Intel Xeon Phi 7120
• 55 thin nodes with two Intel Xeon E5-2670v3 12-core Haswell CPUs and 512 GB RAM.
• Five 1 TB RAM nodes (four Intel Xeon E7-8857 v2 Ivy Bridge CPUs).
• Five 2 TB RAM nodes (four Intel Xeon E7-8857 v2 Ivy Bridge CPUs).
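A minimal sketch of how a GPU-enabled job might be submitted to such nodes, assuming Tegner schedules work with Slurm and exposes the K80s as a "gpu" generic resource (the gres name, job name and application below are placeholders, not PDC's actual configuration):

    #!/bin/bash
    #SBATCH -J gpu-render            # illustrative job name
    #SBATCH -N 1                     # one node
    #SBATCH -t 00:30:00              # 30-minute wall-clock limit
    #SBATCH --gres=gpu:K80:1         # request one K80 (gres name is an assumption)
    srun ./my_gpu_app                # hypothetical GPU-enabled application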
6. PDC Center for High Performance Computing
KTH PDC Resources
Klemming – a site-wide high-performance file-system
• A 5 PB Lustre filesystem supplied by DDN
• Based on four DDN SFA12KX systems acting as SRP targets over FDR Infiniband point-to-point links to 16 OSSs. The 16 OSSs act as 4 redundancy groups.
• Metadata is stored on a DDN EF3015 FC RAID, connected over FC point-to-point links to a single MDS/MGS fail-over pair.
• LNET routers housed in the target systems are used to connect the file system to the various resources at PDC (see the sketch after this list).
• An FDR Infiniband 1:1 fat tree fabric is used to connect the
LNET routers to the Lustre servers.
• 132 GB/s to Beskow (measured) and 20 GB/s to Tegner (projected).
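To illustrate how such LNET routing is commonly configured (the network names and NIDs below are invented for the example, not PDC's actual addresses), a client node behind an LNET router typically carries Lustre module options along these lines:

    # /etc/modprobe.d/lustre.conf on a hypothetical client node
    # The client sits on its own IB network (o2ib1); the Lustre servers live on
    # o2ib0 and are reached via an LNET router whose NID on o2ib1 is made up here.
    options lnet networks="o2ib1(ib0)"
    options lnet routes="o2ib0 10.10.1.1@o2ib1"

while the router itself has an interface on each network and enables forwarding between them:

    # LNET router bridging the server and client networks
    options lnet networks="o2ib0(ib0),o2ib1(ib1)" forwarding="enabled"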
7. PDC Center for High Performance Computing
PDC Resource use for Partners
• KTH PDC can provide a secure compute environment for industry users working in collaboration with PDC
– Tailored solutions per customer for long-term collaborations with strategic partners (shared investment)
• Reference customer – Scania Group
– Standard setup for short-term needs based on PDC standards, but with secured nodes and shared data storage (pay-per-use)
8. PDC Center for High Performance Computing
To the point - Scania partnership
• Scania is one of the world's leading manufacturers of heavy trucks.
• Scania has in-house computational resources, but PDC provides resources for elastic off-loading as well as the possibility of larger-scale simulations, primarily on PDC's Cray XC40 Beskow.
• KTH and Scania have a long-standing strategic partnership, and the Scania-PDC collaboration is one aspect of this partnership.
9. PDC Center for High Performance Computing
Scania partnership - security
• Scania requires a higher level of verifiable confidentiality than most of PDC's academic users.
• Lingering data and files (persistent state) are assumed to be the highest-risk items in the current setup.
• PDC currently employs only stateless, exclusively scheduled computational nodes, which makes the compute platform less of a problem in this respect and puts the focus on the high-performance filesystem.
• To accommodate these requirements, Scania has a reserved filesystem which mimics the setup of the main filesystem, but on a smaller scale.
• The design employs Infiniband partitioning to create a strong separation of data and to restrict access to the file system to only those LNET routers which are supposed to reach it (see the sketch after this list).
• This allows the Infiniband fabric connecting the Lustre servers to the LNET routers to be shared.
• The separation can be extended into the compute resource, depending on the availability of similar mechanisms in its interconnect.
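As a rough sketch of what such Infiniband partitioning can look like on an OpenSM-managed fabric (the partition name, pkey and port GUIDs below are made-up examples, not PDC's configuration), the subnet manager's partitions.conf could contain entries along these lines:

    # Default partition: every port is a member.
    Default=0x7fff, ipoib : ALL=full;
    # Restricted partition for the reserved filesystem: only the Lustre servers
    # and the approved LNET routers are members, so no other port can reach them.
    scania_fs=0x0100 : 0x0002c90300a11111=full, 0x0002c90300a22222=full;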
10. PDC Center for High Performance Computing
Scania file-system – technical aspects
• File content is stored on self-encrypting, black-hole-warranty drives (failed drives never leave the site) in a dedicated DDN SFA7700 SRP solution, connected with Infiniband (point-to-point) to a dedicated fail-over OSS server pair.
• Dedicated meta-data storage in a DDN EF3015 system
connected with FC (point-to-point) to a dedicated fail-over MDS
server pair (for file and file-system meta-data).
• OSSs and MDSs are connected to a pair of restricted partitions
on the shared Infiniband fabric.
• LNET routers are used to export the file-system to systems where
it is supposed to be used.
• Only routers allowed into the restricted Infiniband partition can access the OSS/MDS. Hence the file-system is not accessible from systems that are not intended to mount it (a client-side sketch follows below).
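For illustration, mounting the exported file-system from a client that has an LNET route to the restricted server network might look roughly like this (the MGS NID, network name and filesystem name "scaniafs" are placeholders, not real values):

    # Mount by pointing at the MGS NID on the server-side LNET network.
    mount -t lustre 10.10.0.10@o2ib0:/scaniafs /mnt/scaniafs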