Research computing at ILRI

Presented by Alan Orth at the CGIAR ICT Managers Meeting, ILRI, Kenya, 5 March 2014

Transcript of "Research computing at ILRI"

  1. Research Computing at ILRI
      Alan Orth, ICT Managers Meeting, ILRI, Kenya, 5 March 2014
  2. Where we came from (2003)
      - 32 dual-core compute nodes
      - 32 * 2 != 64
      - Writing MPI code is hard!
      - Data storage over NFS to “master” node
      - “Rocks” cluster distro
      - Revolutionary at the time!
  3. Where we came from (2010)
      - Most of the original cluster removed
      - Replaced with a single Dell PowerEdge R910
      - 64 cores, 8 TB storage, 128 GB RAM
      - Threading is easier* than MPI!
      - Data is local
      - Easier to manage!
  4. To infinity and beyond (2013)
      - A little bit back to the “old” model
      - Mixture of “thin” and “thick” nodes
      - Networked storage
      - Pure CentOS
      - Supermicro boxen
      - Pretty exciting!
  5. Primary characteristics
      - Computational capacity
      - Data storage
  6. Platform
      - 152 compute cores
      - 32* TB storage
      - 700 GB RAM
      - 10 GbE interconnects
      - LTO-4 tape backups (LOL?)
  7. Homogeneous computing environment
      User IDs, applications, and data are available everywhere.
  8. Scaling out storage with GlusterFS
      - Developed by Red Hat
      - Abstracts backend storage (file systems, technology, etc)
      - Can do replicate, distribute, replicate+distribute, geo-replication (off site!), etc
      - Scales “out”, not “up”
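      For illustration (not from the slides): creating a two-way replicated GlusterFS
      volume is roughly a one-liner with the standard gluster CLI. The server and
      volume names follow the df output on the next slide; the brick paths are
      assumptions.

          # hedged sketch: brick paths are hypothetical, servers match the next slide
          gluster volume create homes replica 2 transport tcp \
              wingu0:/bricks/homes wingu1:/bricks/homes
          gluster volume start homes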
  9. How we use GlusterFS
      [aorth@hpc: ~]$ df -h
      Filesystem     Size  Used  Avail  Use%  Mounted on
      wingu1:/homes   31T  9.5T    21T   32%  /home
      wingu0:/apps    31T  9.5T    21T   32%  /export/apps
      wingu1:/data    31T  9.5T    21T   32%  /export/data
      - Persistent paths for homes, data, and applications across the cluster.
      - These volumes are replicated, so essentially application-layer RAID1
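      The persistent paths above come from mounting the Gluster volumes with the FUSE
      client; a sketch of how that might look (mount options are illustrative):

          # mount by hand, or via an fstab entry with _netdev so it waits for the network
          mount -t glusterfs wingu1:/homes /home
          # fstab equivalent:
          # wingu1:/homes  /home  glusterfs  defaults,_netdev  0 0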
  10. GlusterFS <3 10GbE
  11. - SLURM: a project from Lawrence Livermore National Laboratory (LLNL)
      - Manages resources
      - Users request CPU, memory, and node allocations
      - Queues / prioritizes jobs, logs usage, etc
      - More like an accountant than a bouncer
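      The “accountant” side of SLURM is visible through its standard client tools
      (these commands are generic SLURM, not taken from the slides):

          sinfo           # state of partitions and nodes
          squeue          # jobs currently queued or running
          sacct -u aorth  # accounting records for a user's past jobs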
  12. Topology
  13. How we use SLURM
      - Can submit “batch” jobs (long-running jobs, invoke a program many times with different variables, etc)
      - Can run “interactively” (something that needs keyboard interaction)
      Make it easy for users to do the “right thing”:
      [aorth@hpc: ~]$ interactive -c 10
      salloc: Granted job allocation 1080
      [aorth@compute0: ~]$
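      A minimal batch job, as a hedged sketch: the job name, resource numbers, input
      files, and BLAST database below are hypothetical, but the sbatch directives and
      the module call mirror what the later slides describe.

          #!/usr/bin/env bash
          #SBATCH --job-name=blastn-example   # hypothetical job name
          #SBATCH --cpus-per-task=4           # ask the scheduler for 4 cores
          #SBATCH --mem=8000                  # and ~8 GB of memory (in MB)

          module load blast/2.2.28+           # see "Managing applications" below
          blastn -query input.fa -db nt -num_threads 4 -out output.txt

      Submitted with "sbatch job.sh"; SLURM queues the job and runs it on a compute
      node once the requested resources are free.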
  14. Managing applications
      - Environment modules - http://modules.sourceforge.net
      - Dynamically load support for packages in a user’s environment
      - Makes it easy to support multiple versions, complicated packages with $PERL5LIB, package dependencies, etc
  15. Managing applications
      Install once, use everywhere...
      [aorth@hpc: ~]$ module avail blast
      blast/2.2.25+  blast/2.2.26  blast/2.2.26+  blast/2.2.28+
      [aorth@hpc: ~]$ module load blast/2.2.28+
      [aorth@hpc: ~]$ which blastn
      /export/apps/blast/2.2.28+/bin/blastn
      Works anywhere on the cluster!
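      As a quick illustration of “works anywhere” (not from the slides): running the
      same steps from a compute node, using the node name from the earlier interactive
      example, should resolve to the same shared path, since /export/apps lives on the
      Gluster volume.

          [aorth@compute0: ~]$ module load blast/2.2.28+
          [aorth@compute0: ~]$ which blastn
          /export/apps/blast/2.2.28+/bin/blastn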
  16. Users and Groups
      - Consistent UID/GIDs across systems
      - LDAP + SSSD (also from Red Hat) is a great match
      - 389 LDAP works great with CentOS
      - SSSD is simpler than pam_ldap and does caching
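      Roughly how the SSSD + LDAP pairing gets wired up on a CentOS node, as a hedged
      sketch: the server URI and base DN are placeholders, the flags are standard
      authconfig options.

          authconfig --enablesssd --enablesssdauth \
              --enableldap --enableldapauth \
              --ldapserver=ldap://ldap.example.org \
              --ldapbasedn="dc=example,dc=org" --update
          # afterwards a user resolves to the same UID/GID on every node
          getent passwd aorth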
  17. More information and contact
      a.orth@cgiar.org
      http://hpc.ilri.cgiar.org/