Research Computing at
KWTRP-Kilifi
1
Dennis Mungai
(Sysadmin, plumber)
KEMRI-Wellcome Trust
Kilifi, KE
Where we’ve come from
• 16 vCore Hyper-V instance
• 8 cores × 2 threads ≠ 16 real cores (because Hyper-V is a toy)
• Maintenance was a nightmare, with frequent downtime.
• Slow and unpredictable I/O.
• No job scheduling.
• Always short on resources (“Who killed the server?”)
• Little storage (~4 TB? Lol)
• *Revolutionary* at the time (Look, we can haz
Leenucks on HyperV, yes? )
2
Where we came from
3
At the present and beyond
• Standard HPC cluster components:
• SIMD, MIMD, pipelines
• Tightly coupled shared memory
• NUMA-capable H/W
• Shared high-perf fs (GlusterFS + XFS) w/ scaling
• A head node plus dedicated (slave) compute nodes.
• Pure CentOS 6.5 (NBO) & CentOS 7 (Kilifi)
• HP box(en)
• Many excite. Very win.
4
Primary characteristics:
• Computational capacity
• Data Storage
5
Platform:
• 544 GB DDR3L RAM
• You can request and allocate it in SLURM ;-)
• NUMA scaling and all
• 72 TB storage
• 1/10 GbE interconnects
• 64 Intel Xeon (“Haswell”) CPUs on HP Gen9 servers.
6
Homogeneous Computing
Environment
7
User IDs, applications, job states and data are available
everywhere.
Scaling out storage with
GlusterFS
1. Developed by Red Hat.
2. Abstracts back-end storage (file systems, technology,
synchronicity, etc.).
3. Can do replicated, distributed, replicated + distributed, and
geo-replicated (off-site) deployments.
4. Scales “out”, not “up”.
5. Ideal for clusters.
6. We're using it ;-)
8
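For illustration, a two-way replicated, distributed volume is created roughly like this (the volume name, hostnames and brick paths are made up, not our actual layout):

```shell
# Create a 2-way replicated, distributed volume across two pairs of bricks
# (hostnames and brick paths are hypothetical)
gluster volume create cluster_data replica 2 \
    node01:/export/brick1 node02:/export/brick1 \
    node01:/export/brick2 node02:/export/brick2
gluster volume start cluster_data
gluster volume info cluster_data
```

Each file is distributed across the brick pairs and mirrored within a pair, which is what gives us both capacity scaling and redundancy.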
How we use GlusterFS
- Persistent paths for homes, data and applications
across the cluster.
- Volumes are replicated (RAID 0- and 1-like behavior at the
application layer).
- Excellent throughput, even at high queue depths.
9
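On each node, those persistent paths come from mounting the volume with the GlusterFS native (FUSE) client; a sketch with hypothetical server, volume and mount-point names:

```shell
# Mount a GlusterFS volume at a well-known path (names are hypothetical)
mkdir -p /mnt/data
mount -t glusterfs node01:/cluster_data /mnt/data

# Or make it persistent across reboots via /etc/fstab:
# node01:/cluster_data  /mnt/data  glusterfs  defaults,_netdev  0 0
```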
Job scheduling (SLURM)
• A project from Lawrence Livermore National Laboratory
(LLNL)
• Manages resources:
• Users request CPU, memory and node allocations.
• Queues & prioritizes jobs.
• A scalable, high-performance, cross-platform scheduler
10
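As a quick illustration of requesting resources, a one-off command can be run under an allocation like this (the resource numbers are hypothetical):

```shell
# Ask the scheduler for 2 CPUs and 4 GB of RAM on one node,
# then run a command inside that allocation (limits are illustrative)
srun --ntasks=1 --cpus-per-task=2 --mem=4G hostname
```

srun blocks until the allocation is granted, runs the command on the assigned compute node, and releases the resources when it exits.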
How we will use SLURM
• Submit “batch” jobs (long-running, multiple
invocations, varying parameters, etc.).
• Run jobs “interactively” (requiring mouse and
keyboard interaction).
• Makes it easier for users to use the cluster effectively,
and in the right way:
[administrator@keklf-cls01 mail]$ interactive
salloc: Granted job allocation 549
[administrator@keklf-cls01 mail]$
11
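A minimal batch submission script might look like this (the job name, resource numbers and pipeline command are made-up placeholders):

```shell
# Write a minimal SLURM batch script (all values are illustrative placeholders)
cat > myjob.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=example-run
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00

srun ./my_pipeline.sh
EOF

# Submit it to the scheduler:
# sbatch myjob.sbatch
```

The scheduler queues the job, grants the requested CPUs and memory when they become free, and runs the script unattended; no "who killed the server?" moments.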
Managing applications
• Environment modules
• Dynamically load support for packages in a user’s
environment
• Makes it easy to support multiple versions and
complicated dependencies such as $PERL5LIB,
package dependencies, etc.
• Modules are explicitly loaded by the user.
• Run module avail to see what’s available.
12
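Day-to-day use looks like this (the package names and versions are hypothetical):

```shell
# List everything the cluster makes available
module avail

# Load specific versions into the current shell (names are hypothetical)
module load R/3.1.2
module load openmpi/1.8

# See what is currently loaded, and unload when done
module list
module unload R/3.1.2
```

Loading a module adjusts $PATH, $PERL5LIB and friends for that shell only, so two users can run different versions side by side.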
Managing applications:
• Install once, use everywhere…
• Works everywhere on the cluster!
13
Users and groups.
• Consistent UID/GIDs across systems.
• Microsoft AD + LDAP + Kerberos tickets for sessions.
• Mutual process authentication through MUNGE.
• Single logon token.
• Can also use SSH keys ;-)
14
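Since SSH keys also work, passwordless logins can be set up the usual way (the key path and options here are illustrative):

```shell
# Generate a key pair without a passphrase (path and options are illustrative)
rm -f ./cluster_key ./cluster_key.pub
ssh-keygen -t rsa -b 4096 -N '' -f ./cluster_key -q

# Then copy the public key to the head node so logins stop prompting:
# ssh-copy-id -i ./cluster_key.pub administrator@keklf-cls01
```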
More information & contacts:
• Refer to the wiki: http://keklf-cls01
• Refer to the performance monitor: http://keklf-cls01/ganglia
• For bioinformatics pipelines, contact Etienne.
• For BioRuby, BioPerl, etc, contact George Githinji
• For compilers and OpenMPI, contact Dennis
Mungai (Me).
15
Finito ;-)
16