2. Where we’ve come from (up to the present)
• 16-vCore Hyper-V instance
• 8 cores × 2 threads ≠ 16 real cores (because Hyper-V is a toy)
• Maintenance was a nightmare, with frequent downtime.
• Slow and unpredictable I/O.
• No job scheduling.
• Always short on resources (“Who killed the server?”)
• Little storage (~4 TB? Lol)
• *Revolutionary* at the time (“Look, we can haz Leenucks on Hyper-V, yes?”)
6. Platform:
• 544 GB DDR3L RAM
• You can request and allocate it in SLURM ;-)
• NUMA scaling and all
• 72 TB storage
• 1/10 GbE interconnects
• 64 Intel Xeon (“Haswell”) CPU cores.
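Requesting that RAM (and CPU) through SLURM looks something like the following sketch; the actual figures are illustrative, not cluster policy:

```shell
# Ask SLURM for an allocation of 4 CPU cores and 32 GB of RAM for
# one hour (all values illustrative -- adjust to the cluster's limits).
salloc --ntasks=1 --cpus-per-task=4 --mem=32G --time=01:00:00
```

SLURM only grants the allocation when the requested cores and memory are actually free, which is what ends the “Who killed the server?” era.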
8. Scaling out storage with GlusterFS
1. Developed by Red Hat.
2. Abstracts back-end storage (file systems, technology, synchronicity, etc.).
3. Can do replicated, distributed, replicated+distributed, and geo-replicated (off-site) deployments.
4. Scales “out”, not “up”.
5. Ideal for clusters.
6. We’re using it ;-)
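A replicated+distributed volume of the kind described above is created roughly as follows; the hostnames, brick paths and volume name are illustrative, not our actual layout:

```shell
# Sketch: a 2-way replicated, 2-way distributed GlusterFS volume over
# four bricks on two servers (names and paths are illustrative).
# Bricks are listed in replica-set order: each consecutive pair of
# bricks forms one replica set; files are distributed across the sets.
gluster volume create clustervol replica 2 \
    server1:/bricks/brick1 server2:/bricks/brick1 \
    server1:/bricks/brick2 server2:/bricks/brick2
gluster volume start clustervol
```

Scaling “out” then means adding more bricks with `gluster volume add-brick` rather than buying a bigger box.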
9. How we use GlusterFS
• Persistent paths for homes, data and applications across the cluster.
• Volumes are replicated (RAID 0 & 1 at the application layer).
• Excellent throughput, even at high queue depths.
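Those persistent paths come from mounting the volume with the GlusterFS FUSE client on every node; the hostname, volume name and mount point below are illustrative:

```shell
# One-off mount of a GlusterFS volume (names are illustrative):
mount -t glusterfs server1:/clustervol /home

# Persistent mount via /etc/fstab; _netdev defers the mount until
# the network is up:
# server1:/clustervol  /home  glusterfs  defaults,_netdev  0 0
```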
10. Job scheduling
• SLURM, a project from Lawrence Livermore National Laboratory (LLNL).
• Manages resources:
– Users request CPU, memory and node allocations.
– Queues and prioritizes jobs.
• A scalable, high-performance, cross-platform scheduler.
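The queueing and prioritization above can be inspected from the shell with SLURM’s standard query tools:

```shell
sinfo              # list partitions and node states
squeue -u $USER    # show your own queued and running jobs
squeue --start     # show estimated start times for pending jobs
```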
11. How we will use SLURM
• Submit “batch” jobs (which can be long-running, with multiple invocations, varying parameters, etc.).
• Run jobs “interactively” (when mouse and keyboard interaction is required).
• Makes it easier for users to use the cluster effectively, and in the right way:
[administrator@keklf-cls01 mail]$ interactive
salloc: Granted job allocation 549
[administrator@keklf-cls01 mail]$
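A batch job of the kind described above is an ordinary shell script with `#SBATCH` resource directives at the top; the job name, resource figures and pipeline command below are illustrative:

```shell
#!/bin/bash
#SBATCH --job-name=align       # illustrative job name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8      # CPU request
#SBATCH --mem=16G              # memory request
#SBATCH --time=12:00:00        # wall-clock limit
#SBATCH --output=%x-%j.out     # stdout goes to jobname-jobid.out

srun ./my_pipeline.sh          # hypothetical long-running command
```

Submitted with `sbatch job.sh`, it queues until the requested resources are free, then runs unattended.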
12. Managing applications
• Environment modules
• Dynamically load support for packages into a user’s environment.
• Make it easy to support multiple versions, complicated dependencies such as $PERL5LIB, package dependencies, etc.
• Modules are explicitly loaded by the user.
• Run module avail to see what’s available.
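A typical module session looks like the sketch below; the module name and version are illustrative and will differ per install:

```shell
module avail            # list every module the cluster provides
module load perl/5.20   # prepend that version's paths ($PERL5LIB etc.)
module list             # show what is currently loaded
module unload perl/5.20 # cleanly remove it from the environment again
```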
14. Users and groups.
• Consistent UID/GIDs across systems.
• Microsoft AD + LDAP + Kerberos tickets for sessions
• Mutual authentication between processes through MUNGE.
• Single logon token.
• Can also use SSH keys ;-)
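MUNGE credentials can be sanity-checked by round-tripping one through the daemon; the remote hostname below is illustrative:

```shell
# Encode and immediately decode a credential against the local munged:
munge -n | unmunge

# Cross-node check: encode here, decode on another cluster node.
# This only succeeds when both nodes share the same munge key,
# i.e. when the consistent UID/GIDs above actually line up.
munge -n | ssh keklf-cls02 unmunge
```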
15. More information & contacts:
• Refer to the wiki: http://keklf-cls01
• Refer to the performance monitor: http://keklf-cls01/ganglia
• For bioinformatics pipelines, contact Etienne.
• For BioRuby, BioPerl, etc, contact George Githinji
• For compilers and OpenMPI, contact Dennis Mungai (me).