Research Computing at
KWTRP-Kilifi
1
Dennis Mungai
(Sysadmin, plumber)
KEMRI-Wellcome Trust
Kilifi, KE
Where we’ve come from
• 16 vCore Hyper-V instance
• 8 cores × 2 threads ≠ 16 real cores (because Hyper-V is a toy)
• Maintenance was a nightmare, with frequent downtime.
• Slow and unpredictable I/O.
• No job scheduling.
• Always short on resources (“Who killed the server?”)
• Little storage (~4 TB? Lol)
• *Revolutionary* at the time (Look, we can haz
Leenucks on HyperV, yes? )
2
Where we came from
3
At the present and beyond
• Standard HPC cluster components:
• SIMD, MIMD, pipelines
• Tightly coupled shared memory
• NUMA-capable H/W
• Shared high-perf fs (GlusterFS + XFS) w/ scaling
• A head node plus dedicated (slave) compute nodes.
• Pure CentOS 6.5 (NBO) & CentOS 7 (Kilifi)
• HP box(en)
• Many excite. Very win.
4
Primary characteristics:
• Computational capacity
• Data Storage
5
Platform:
• 544 GB DDR3L RAM
• You can request and allocate it in SLURM ;-)
• NUMA scaling and all
• 72 TB storage
• 1/10 GbE interconnects
• 64 Intel Xeon (“Haswell”) CPUs on HP Gen9 servers.
6
Homogeneous Computing
Environment
7
User IDs, applications, job states and data are available
everywhere.
Scaling out storage with
GlusterFS
1. Developed by Red Hat.
2. Abstracts back-end storage (file systems, technology,
synchronicity, etc.).
3. Can do replicated, distributed, replicated + distributed, and
geo-replicated (off-site) deployments.
4. Scales “out”, not “up”.
5. Ideal for clusters.
6. We're using it ;-)
8
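For illustration, a two-way replicated, distributed volume is created roughly like this (the volume name, hostnames and brick paths are made up, not our actual layout):

```shell
# Create a 2-way replicated, distributed volume across two pairs of bricks
# (hostnames and brick paths are hypothetical)
gluster volume create cluster_data replica 2 \
    node01:/export/brick1 node02:/export/brick1 \
    node01:/export/brick2 node02:/export/brick2
gluster volume start cluster_data
gluster volume info cluster_data
```

Each file is distributed across the brick pairs and mirrored within a pair, which is what gives us both capacity scaling and redundancy.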
How we use GlusterFS
- Persistent paths for homes, data and applications
across the cluster.
- Volumes are replicated (RAID 0- and 1-like behavior at the
application layer).
- Excellent throughput, even at high queue depths.
9
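On each node, those persistent paths come from mounting the volume with the GlusterFS native (FUSE) client; a sketch with hypothetical server, volume and mount-point names:

```shell
# Mount a GlusterFS volume at a well-known path (names are hypothetical)
mkdir -p /mnt/data
mount -t glusterfs node01:/cluster_data /mnt/data

# Or make it persistent across reboots via /etc/fstab:
# node01:/cluster_data  /mnt/data  glusterfs  defaults,_netdev  0 0
```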
Job scheduling (SLURM)
• A project from Lawrence Livermore National Laboratory
(LLNL)
• Manages resources:
• Users request CPU, memory and node allocations.
• Queues & prioritizes jobs.
• A scalable, high-performance, cross-platform scheduler
10
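As a quick illustration of requesting resources, a one-off command can be run under an allocation like this (the resource numbers are hypothetical):

```shell
# Ask the scheduler for 2 CPUs and 4 GB of RAM on one node,
# then run a command inside that allocation (limits are illustrative)
srun --ntasks=1 --cpus-per-task=2 --mem=4G hostname
```

srun blocks until the allocation is granted, runs the command on the assigned compute node, and releases the resources when it exits.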
How we will use SLURM
• Submit “batch” jobs (long-running, multiple
invocations, varying parameters, etc.).
• Run jobs “interactively” (requiring mouse and
keyboard interaction).
• Makes it easier for users to use the cluster effectively,
and in the right way:
[administrator@keklf-cls01 mail]$ interactive
salloc: Granted job allocation 549
[administrator@keklf-cls01 mail]$
11
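A minimal batch submission script might look like this (the job name, resource numbers and pipeline command are made-up placeholders):

```shell
# Write a minimal SLURM batch script (all values are illustrative placeholders)
cat > myjob.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=example-run
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00

srun ./my_pipeline.sh
EOF

# Submit it to the scheduler:
# sbatch myjob.sbatch
```

The scheduler queues the job, grants the requested CPUs and memory when they become free, and runs the script unattended; no "who killed the server?" moments.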
Managing applications
• Environment modules
• Dynamically load support for packages in a user’s
environment
• Makes it easy to support multiple versions and
complicated dependencies such as $PERL5LIB,
package dependencies, etc.
• Modules are explicitly loaded by the user.
• Run module avail to see what’s available.
12
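Day-to-day use looks like this (the package names and versions are hypothetical):

```shell
# List everything the cluster makes available
module avail

# Load specific versions into the current shell (names are hypothetical)
module load R/3.1.2
module load openmpi/1.8

# See what is currently loaded, and unload when done
module list
module unload R/3.1.2
```

Loading a module adjusts $PATH, $PERL5LIB and friends for that shell only, so two users can run different versions side by side.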
Managing applications:
• Install once, use everywhere…
• Works everywhere on the cluster!
13
Users and groups.
• Consistent UID/GIDs across systems.
• Microsoft AD + LDAP + Kerberos tickets for sessions.
• Mutual process authentication through MUNGE.
• Single logon token.
• Can also use SSH keys ;-)
14
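Since SSH keys also work, passwordless logins can be set up the usual way (the key path and options here are illustrative):

```shell
# Generate a key pair without a passphrase (path and options are illustrative)
rm -f ./cluster_key ./cluster_key.pub
ssh-keygen -t rsa -b 4096 -N '' -f ./cluster_key -q

# Then copy the public key to the head node so logins stop prompting:
# ssh-copy-id -i ./cluster_key.pub administrator@keklf-cls01
```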
More information & contacts:
• Refer to the wiki: http://keklf-cls01
• Refer to the performance monitor: http://keklf-cls01/ganglia
• For bioinformatics pipelines, contact Etienne.
• For BioRuby, BioPerl, etc, contact George Githinji
• For compilers and OpenMPI, contact Dennis
Mungai (Me).
15
Finito ;-)
16