Guide to Building your Linux High-performance Cluster

Edmund Ochieng
March 2, 2012
Abstract

In the modern day, where computer simulation forms a critical part of research, high-performance clusters have become a need in nearly every educational or research institution. This paper aims to give you the instructions you need to set up your own cluster, so if you are looking forward to setting one up, this is the guide for you.

This guide is prepared with climate simulation in mind. However, besides the software required for climate simulation, the steps required to set up the cluster remain more or less the same.

The setup aims to grant you the ability to run modelling, simulation and visualisation applications across multiple processors, probably more than you can have in a single server unit.
Contents

Part I: Master Node Configuration
  1  Network configuration
     1.1  Internal interface configuration
     1.2  External interface configuration
  2  MAC address acquisition
     2.1  System Documentation / Manuals
     2.2  Network Traffic Monitoring
     2.3  TFTP Configuration
  3  DHCP configuration
  4  Local Repository
  5  EPEL Repository
  6  NFS configuration
  7  SSH Key Generation Script

Part II: Software and Compiler Installation and Configuration
  8  Torque configuration
  9  Maui configuration
  10 Compiler Installation
     10.1  GCC Compilers
     10.2  Intel Compilers
  11 OpenMPI installation
     11.1  OpenMPI Compiled with GCC Compilers
     11.2  OpenMPI Compiled with Intel Compilers
  12 Environment Modules installation
  13 C3 Tools installation
  14 Password Syncing
  15 NetCDF, HDF5 and GrADS installation
  16 NCL and NCO installation
  17 R Statistical package installation

Part III: Computing Node Installation
  18 Node OS installation
  19 Name resolution
Part I: Master Node Configuration
1  Network configuration

1.1  Internal interface configuration

Set the network interface through which the DHCP service will listen for IP address requests to be static and to start on system boot-up. This should appear similar to the configuration below.

1. With a text editor of your choice, edit your master node's network configuration for the network interface to be used to communicate with the other nodes in your cluster.

   [root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
   # Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet
   DEVICE=eth0
   #BOOTPROTO=dhcp
   BOOTPROTO=static
   HWADDR=00:16:36:E7:8B:A3
   IPADDR=192.168.10.1
   NETMASK=255.255.255.0
   ONBOOT=yes
   DHCP_HOSTNAME=master.cluster

2. Once the changes have been made, save the file and start the interface.

3. Finally, invoke the ifconfig command to confirm the settings are active, as illustrated below.

   [root@master ~]# ifconfig eth0
   eth0      Link encap:Ethernet  HWaddr 00:16:36:E7:8B:A3
             inet addr:192.168.10.1  Bcast:192.168.10.127  Mask:255.255.255.0
             UP BROADCAST MULTICAST  MTU:1500  Metric:1
             RX packets:0 errors:0 dropped:0 overruns:0 frame:0
             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
             collisions:0 txqueuelen:1000
             RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
             Interrupt:74 Memory:fdfc0000-fdfd0000

1.2  External interface configuration

The eth1 interface shall be connected to the organizational network and will acquire its network configuration via DHCP. So to have the interface working, all that needs to be done is to set the ONBOOT option in /etc/sysconfig/network-scripts/ifcfg-eth1 and connect a cable to the interface.

2  MAC address acquisition

The MAC address acquisition step is important as it allows the master node to uniquely identify the nodes that make up the cluster and, as a result, give them customized configuration.
Each network interface has a unique MAC address, which can be obtained either from the system manuals/documentation or by listening to the network traffic on the master node interface on which the DHCP daemon will be listening.

2.1  System Documentation / Manuals

This could either be printed on the hardware, as is the case on Sun servers and a couple of HP servers I've seen, or in the booklets provided alongside the server. However, this can at times be misleading. If that is the case, you can always listen on the network to obtain the desired MAC address.

2.2  Network Traffic Monitoring

Using the tcpdump command, we can acquire the hardware interfaces' MAC addresses. For easy identification, only one node should be turned on at any given time during the MAC address collection process.

From the tcpdump output below, we can identify the network interface MAC address of the first node as 00:1b:24:3d:f1:a3, since the column just before the second "greater than" symbol is 0.0.0.0.68, which basically means it has no IP address yet and expects a response on UDP port 68.

[root@master ~]# tcpdump -i eth0 -nn -qtep port bootpc and port bootps and ip broadcast
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 > 255.255.255.255.67: UDP, length 548
00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1.67 > 255.255.255.255.68: UDP, length 300

Repeat the above process for all nodes to which you would like to issue static IP addresses.

2.3  TFTP Configuration

The TFTP service is essential for a PXE server to work, as it serves a netinstall kernel and an initial ramdisk to the clients when they attempt a network boot.

By default, tftp, which is managed by xinetd, is disabled. You can enable it by opening the configuration file and changing the value of the "disable" option from yes to no. Your completed configuration file should be similar to the one shown below.

1. Enable tftp, which is part of the xinetd stack.

   [root@master ~]# vi /etc/xinetd.d/tftp
   [root@master ~]# cat /etc/xinetd.d/tftp
   # default: off
   service tftp
   {
           socket_type     = dgram
           protocol        = udp
           wait            = yes
           user            = root
           server          = /usr/sbin/in.tftpd
           server_args     = -s /tftpboot
           disable         = no
           per_source      = 11
           cps             = 100 2
           flags           = IPv4
   }

2. Once done, restart the xinetd service so that tftp is started alongside the other xinetd services.

   [root@master ~]# service xinetd restart
   Stopping xinetd:      [ OK ]
   Starting xinetd:      [ OK ]

3. Check that a tftpboot directory has been created at the root of the directory tree, as shown below.

   [root@master ~]# file /tftpboot/
   /tftpboot/: directory

4. Create a directory tree into which the PXE files shall be placed.

   [root@master ~]# mkdir -p /tftpboot/pxe/pxelinux.cfg

5. Copy the netboot kernel image and the initial ramdisk.

   [root@master ~]# ls /distro/centos/images/pxeboot/
   initrd.img  README  TRANS.TBL  vmlinuz
   [root@master ~]# cp /distro/centos/images/pxeboot/{vmlinuz,initrd.img} /tftpboot/pxe/

6. Locate the pxelinux.0 file and copy it to the /tftpboot/pxe directory, from where it will be accessible via the tftp daemon.

   [root@master ~]# locate pxelinux.0
   /usr/lib/syslinux/pxelinux.0
   [root@master ~]# cp -av /usr/lib/syslinux/pxelinux.0 /tftpboot/pxe/
   ‘/usr/lib/syslinux/pxelinux.0’ -> ‘/tftpboot/pxe/pxelinux.0’

   NOTE: Keenly note the location of the pxelinux.0 file, as its path relative to the tftp root directory (/tftpboot) will be used in the DHCP daemon configuration section.

7. Create a default boot configuration file for machines that do not have a specific boot file in the pxelinux.cfg directory.
   [root@master ~]# vi /tftpboot/pxe/pxelinux.cfg/default
   [root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/default
   # /tftpboot/pxe/pxelinux.cfg/default
   prompt 1
   timeout 100
   default local

   label local
         LOCALBOOT 0

   label install
         kernel vmlinuz
         append initrd=initrd.img network ip=dhcp lang=en_US keymap=us ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal selinux=0

8. Get the hexadecimal equivalent of the node's IP address, which is used to create a per-client PXE configuration.

   [root@master pxelinux.cfg]# gethostip node01
   node01 192.168.10.2 C0A80A02
   [root@master pxelinux.cfg]# cp default C0A80A02

9. Copy the default file to a file named with the hex equivalent obtained above. Open the file and change the line "default local" to "default install". This should commence installation on rebooting node01. The same should be done for all the other nodes.

   [root@master ~]# cp /tftpboot/pxe/pxelinux.cfg/default /tftpboot/pxe/pxelinux.cfg/C0A80A02

3  DHCP configuration

To issue static IP addresses via the DHCP daemon, the network interface hardware (MAC) addresses collected in the MAC address acquisition section will be necessary. DHCP daemon configuration for the cluster should be carried out as outlined in the steps below.

1. Enter the name of the interface on which the DHCP daemon will be listening.

   [root@master ~]# cat /etc/sysconfig/dhcpd
   # Command line options here
   DHCPDARGS="eth0"

2. Create your DHCP configuration file from the sample file in the location shown below.
   [root@master ~]# cp /usr/share/doc/dhcp-3.0.5/dhcpd.conf.sample /etc/dhcpd.conf
   cp: overwrite ‘/etc/dhcpd.conf’? y

3. Edit your configuration to look more or less like the configuration below, issuing addresses to the desired hosts using their MAC addresses.

   [root@master ~]# cat /etc/dhcpd.conf
   ddns-update-style interim;
   ignore client-updates;
   allow booting;
   allow bootp;

   subnet 192.168.10.0 netmask 255.255.255.0 {
   #       --- default gateway
   #       option routers                  192.168.0.1;
           option subnet-mask              255.255.255.0;
   #       option nis-domain               "domain.org";
           option domain-name              "cluster";
           option domain-name-servers      192.168.10.1;
           option time-offset              10800;  # EAT
   #       option ntp-servers              192.168.1.1;
   #       option netbios-name-servers     192.168.1.1;
   #       range dynamic-bootp 192.168.10.4 192.168.10.20;
           default-lease-time 21600;
           max-lease-time 43200;
           filename "pxe/pxelinux.0";
           next-server 192.168.10.1;

           # we want the nameserver to appear at a fixed address
           host node01 {
                   hardware ethernet 00:1b:24:3d:f1:a3;
                   fixed-address 192.168.10.2;
                   option host-name "node01";
           }
           host node02 {
                   hardware ethernet 00:1b:24:3e:05:d1;
                   fixed-address 192.168.10.3;
                   option host-name "node02";
           }
           host node03 {
                   hardware ethernet 00:1b:24:3e:04:f6;
                   fixed-address 192.168.10.4;
                   option host-name "node03";
           }
   }

4. Finally, save the configuration file and start the server.
   [root@master ~]# service dhcpd start
   Starting dhcpd:      [ OK ]

5. Should starting the DHCP daemon fail, you can look at the logs in /var/log/messages and identify any DHCP-related errors. This could be done with a text editor, but for easier troubleshooting I'd proceed as below.

   [root@master ~]# tail -f /var/log/messages

4  Local Repository

A local repository is very useful in cases of poor Internet connectivity.

1. Create a directory on the system and copy all the contents of the installation disk into it.

   [root@master ~]# mkdir -p /distro/centos
   [root@master ~]# cp -ar /media/CentOS_5.6_Final/* /distro/centos

2. Create a new repository file pointing to the location created above.

   [root@master ~]# cat /etc/yum.repos.d/CentOS-Local.repo
   [Local]
   name=CentOS- - Local
   baseurl=file:///distro/centos
   gpgcheck=0
   enabled=1

3. Clear the cache and any other repository information saved locally.

   [root@master ~]# yum clean all

4. Make a cache of the newly available repositories.

   [root@master ~]# yum makecache

5  EPEL Repository

The addition of the EPEL (Extra Packages for Enterprise Linux) repository is crucial as it facilitates the installation of some of the software needed in the cluster whose installation from source is not quite a simple process. These are such as:

1. R - R Statistical package  http://www.r-project.org/
2. NCO - NetCDF Operators  http://nco.sourceforge.net/
3. CDO - Climate Data Operators
4. NCL - NCAR Command Language  http://www.ncl.ucar.edu/Applications/rcm.shtml
5. GrADS - Grid Analysis and Display System  http://www.iges.org/

This is done as illustrated below:

[root@master ~]# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
Retrieving http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.Ln8ILG: Header V3 DSA signature: NOKEY, key ID 217521f6
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]

6  NFS configuration

We shall export some of the master node's filesystems to reduce the need for repetitive configuration.

1. Populate the /etc/exports configuration file with the directories you wish to have exported via NFS.

   [root@master ~]# vi /etc/exports
   /distro         *(ro,root_squash)
   /home           *(rw,root_squash)
   /distro/centos  *(ro,root_squash)
   /distro/ks      *(ro,root_squash)
   /opt            *(ro,root_squash)
   /usr/local      *(ro,root_squash)
   /scratch        *(rw,root_squash)

2. Start the NFS daemon, which should start successfully provided your configuration is correct.

   [root@master ~]# service nfs start
   Starting NFS services:      [ OK ]
   Starting NFS quotas:        [ OK ]
   Starting NFS daemon:        [ OK ]
   Starting NFS mountd:        [ OK ]

3. Make the NFS daemon start automatically on system boot, and re-export the configured filesystems.

   [root@master ~]# chkconfig nfs on
   [root@master ~]# exportfs -vra
   exporting *:/distro/centos
   exporting *:/distro/ks
   exporting *:/usr/local
   exporting *:/scratch
   exporting *:/distro
   exporting *:/home
   exporting *:/opt
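As a quick sanity check (not part of the original steps), the exports can be verified with showmount, and any export can be test-mounted by hand once a node is up; a sketch, assuming the addresses and exports used above:

   # List the filesystems the master node is exporting
   showmount -e 192.168.10.1

   # From an installed node, manually mount one export and look inside it
   mkdir -p /mnt/test
   mount -t nfs 192.168.10.1:/opt /mnt/test && ls /mnt/test
   umount /mnt/test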
7  SSH Key Generation Script

To allow jobs to be successfully submitted to the cluster, passwordless SSH login should be possible for all users on the cluster. The script below will create a key pair and append the public key to the authorized_keys file in the .ssh/ directory of each user's home directory.

This shall be automated by the script below, which we shall place in the system-wide /etc/profile.d directory.

[root@master modulefiles]# cat /etc/profile.d/passwordless-ssh.sh

Listing 1: /etc/profile.d/passwordless-ssh.sh

#!/bin/bash
#
# /etc/profile.d/passwordless-ssh.sh
#
if [ ! -d "${HOME}"/.ssh/ -o ! -f "${HOME}"/.ssh/id_dsa.pub ]
then
    echo -ne "Generating ssh keys:\t"
    ssh-keygen -t dsa -N "" -f "${HOME}"/.ssh/id_dsa
    if [ "$?" -eq 0 ]; then
        echo -e "[\033[32;1m done \033[0m]"
        cat "${HOME}"/.ssh/id_dsa.pub >> "${HOME}"/.ssh/authorized_keys
        chmod -R u+rwX,go= "${HOME}"/.ssh/
    else
        echo -e "[\033[35;1m failed \033[0m]"
    fi
fi
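Once the compute nodes are installed and /home is NFS-mounted on them, the setup can be spot-checked for an ordinary user; a sketch, with "jdoe" as a purely hypothetical username:

   # Become a regular user; the profile script should generate a DSA key pair
   # on the user's first login
   su - jdoe

   # Because /home is shared over NFS, the same authorized_keys file is seen
   # on every node, so this should not prompt for a password
   ssh node01 hostname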
Part II: Software and Compiler Installation and Configuration
8  Torque configuration

1. Untar the source and execute the configure script with the options shown below.

   [root@master src]# tar xvfz torque-2.4.14.tar.gz
   [root@master src]# cd torque-2.4.14
   [root@master torque-2.4.14]# mkdir build
   [root@master torque-2.4.14]# cd build
   [root@master build]# ../configure --help
   [root@master build]# ../configure --prefix=/opt/torque --enable-server --enable-mom --enable-clients --disable-gui --with-rcp=scp

2. Compile the code to create the binaries by executing "make", followed by "make install" to install them.

   [root@master build]# make
   [root@master build]# make install

3. Add the path of the sbin directory to the root user's .bashrc file.

   [root@master torque-2.4.14]# echo "export PATH=/opt/torque/sbin:$PATH" >> /root/.bashrc
   [root@master torque-2.4.14]# tail -n 1 ~/.bashrc
   export PATH=/opt/torque/sbin:$PATH

4. Copy the pbs_mom script from the contrib/init.d directory of the installation source to /opt/torque/pbs_mom.init. Open the file in an editor of your choice and amend any erroneous paths.

   [root@master torque-2.4.14]# cp contrib/init.d/pbs_mom /opt/torque/pbs_mom.init
   [root@master torque-2.4.14]# vi /opt/torque/pbs_mom.init

5. Copy the node_install.sh script into the Torque install directory. It will be used to install pbs_mom on the computing nodes.

Listing 2: node_install.sh

#!/bin/bash
# /opt/torque/node_install.sh
# http://epico.escience-lab.org
# mailto: baro@democritos.it
TORQUEHOME=/opt/torque/
TORQUEBIN=$TORQUEHOME/bin
MAUIBIN=/opt/maui/bin
SPOOL=/var/spool/torque

mkdir -vp $SPOOL
cd $SPOOL || exit
#===========================================================#
mkdir -vp aux mom_priv/jobs mom_logs checkpoint spool undelivered
chmod -v 1777 spool undelivered
for s in prologue epilogue
do
    test -e $TORQUEHOME/scripts/$s && ln -sv $TORQUEHOME/scripts/$s $SPOOL/mom_priv/
done
#===========================================================#
cat << EOF > pbs_environment
PATH=/bin:/usr/bin
LANG=C
EOF
#===========================================================#
echo master > server_name
#===========================================================#
cat << EOF > mom_priv/config
\$clienthost master
\$logevent 0x7f
\$usecp *:/u /u
\$usecp *:/home /home
\$usecp *:/scratch /scratch
EOF
#===========================================================#
MOM_INIT=/etc/init.d/pbs_mom
cp -va /opt/torque/pbs_mom.init $MOM_INIT
chmod +x $MOM_INIT
chkconfig --add pbs_mom
chkconfig pbs_mom on
# increase limits for infiniband stuff (pbs_mom is NOT pam_limits aware)
egrep 'ulimit[[:space:]]+.*-l[[:space:]]' $MOM_INIT || perl -e 'while (<>) { print;
if (/^[\t ]+start\)/) { print <<EOF;
# ----------------------------------------------------- #
# increase limits for infiniband stuff (no pam_limits-aware)
# max locked memory, soft and hard limits for all PBS children
ulimit -H -l unlimited
ulimit -S -l 4096000
# stack size, soft and hard limits for all PBS children
ulimit -H -s unlimited
ulimit -S -s 1024000
# ----------------------------------------------------- #
EOF
} }' -i $MOM_INIT
#===========================================================#
cat << EOF > /etc/profile.d/pbs.sh
export PATH=$TORQUEBIN:$MAUIBIN:\$PATH
EOF
#EOF

6. In an editor of your choice, enter the fully qualified domain name of your master node in the file below.

   [root@master torque-2.4.14]# vi /var/spool/torque/server_name
   master.cluster

7. Add your nodes and their properties to the nodes file as shown below.

   [root@master torque-2.4.14]# vi /var/spool/torque/server_priv/nodes
   node01 np=4
   node02 np=4
   node03 np=4

8. Initialize the serverdb and start the Torque pbs_server as shown below.

   [root@master ~]# pbs_server -t create
   [root@master ~]# service pbs_server start
   Starting TORQUE Server:      [ OK ]

9. Create one or more queues to suit your configuration and make at least one of them the default using the Torque qmgr command. An easier way is to put the commands in a file, as below.

   [root@master ~]# vi qmgr.cluster
   create queue default
   set queue default queue_type = Execution
   set queue default Priority = 60
   set queue default max_running = 128
   set queue default resources_max.walltime = 168:00:00
   set queue default resources_default.walltime = 01:00:00
   set queue default max_user_run = 12
   set queue default enabled = True
   set queue default started = True
   set server scheduling = True
   set server managers = maui@master
   set server managers += root@master
   set server operators = maui@master
   set server operators += root@master
   set server default_queue = default
10. Load the file containing the qmgr configuration as illustrated below.

    [root@master ~]# qmgr < qmgr.cluster

11. A printout of the pbs_server configuration looks as below.

    [root@master ~]# qmgr -c 'p s'
    #
    # Create queues and set their attributes.
    #
    #
    # Create and define queue default
    #
    create queue default
    set queue default queue_type = Execution
    set queue default Priority = 60
    set queue default max_running = 128
    set queue default resources_max.walltime = 168:00:00
    set queue default resources_default.walltime = 01:00:00
    set queue default max_user_run = 12
    set queue default enabled = True
    set queue default started = True
    #
    # Set server attributes.
    #
    set server scheduling = True
    set server acl_hosts = master.cluster
    set server managers = maui@master
    set server managers += root@master
    set server operators = maui@master
    set server operators += root@master
    set server default_queue = default
    set server log_events = 511
    set server mail_from = adm
    set server query_other_jobs = True
    set server scheduler_iteration = 600
    set server node_check_rate = 150
    set server tcp_timeout = 6
    set server next_job_number = 26

12. Restart both the pbs_server on the master node and the pbs_mom on the nodes, and execute pbsnodes to see a printout of all free nodes.

    [root@master ~]# pbsnodes
    node01
         state = free
         np = 2
         ntype = cluster
         status = rectime=1308321567,varattr=,jobs=,state=free,netload=1205591,gres=,loadave=0.18,ncpus=4,physmem=4051184kb,availmem=5021068kb,totmem=5103400kb,idletime=0,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux node01 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
    node02
         state = free
         np = 2
         ntype = cluster
         status = rectime=1308321569,varattr=,jobs=,state=free,netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
    node03
         state = free
         np = 2
         ntype = cluster
         status = rectime=1308321569,varattr=,jobs=,state=free,netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux

9  Maui configuration

1. Untar, configure, make the binaries and install Maui from source as shown in the following sequence of steps.

   [root@master ~]# tar xvfz maui-3.3.1.tar.gz
   [root@master ~]# cd maui-3.3.1
   [root@master maui-3.3.1]# ./configure --help
   [root@master maui-3.3.1]# ./configure --prefix=/opt/maui --with-spooldir=/var/spool/maui --with-pbs=/opt/torque/
   [root@master maui-3.3.1]# make
   [root@master maui-3.3.1]# make install

2. Create a system user, maui, under which Maui shall be run.

   [root@master maui-3.3.1]# useradd -d /var/spool/maui -r -g daemon maui

3. Edit the maui.cfg file, changing the SERVERHOST, ADMIN1, ADMIN3 and resource manager definition (RMCFG) as shown in the snippet below.

   [root@master maui-3.3.1]# vi /var/spool/maui/maui.cfg
   # maui.cfg 3.3.1
   SERVERHOST        master
   # primary admin must be first in list
   ADMIN1            maui root
   ADMIN3            ALL
   # Resource Manager Definition
   RMCFG[MASTER]     TYPE=PBS
   # Allocation Manager Definition
   AMCFG[bank]       TYPE=NONE
   ....
   EOF

4. Copy the init script from the Maui source package to /etc/init.d/ and edit the file, changing MAUI_PREFIX to point to your installation directory.

   [root@master maui-3.3.1]# cp contrib/service-scripts/redhat.maui.d /etc/init.d/maui
   [root@master maui-3.3.1]# vi /etc/init.d/maui
   [root@master maui-3.3.1]# cat /etc/init.d/maui
   #!/bin/sh
   #
   # maui        This script will start and stop the MAUI Scheduler
   #
   # chkconfig: 345 85 85
   # description: maui
   #
   ulimit -n 32768

   # Source the library functions
   . /etc/rc.d/init.d/functions

   MAUI_PREFIX=/opt/maui

   # let see how we were called
   case "$1" in
        start)
             echo -n "Starting MAUI Scheduler: "
             daemon --user maui $MAUI_PREFIX/sbin/maui
             echo
             ;;
        stop)
             echo -n "Shutting down MAUI Scheduler: "
             killproc maui
             echo
             ;;
        status)
             status maui
             ;;
        restart)
             $0 stop
             $0 start
             ;;
        *)
             echo "Usage: maui {start|stop|restart|status}"
             exit 1
   esac

5. Create a file, maui.sh, in the /etc/profile.d directory, add to it the environment variables PATH, INCLUDE and LD_LIBRARY_PATH, and make it executable (an example of such a file is sketched below, followed by the commands).
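The guide does not list the contents of maui.sh; a minimal sketch, assuming the /opt/maui prefix used above (adjust the paths to your own installation):

   # /etc/profile.d/maui.sh  (example only)
   export PATH=/opt/maui/bin:/opt/maui/sbin:$PATH
   export INCLUDE=/opt/maui/include:$INCLUDE
   export LD_LIBRARY_PATH=/opt/maui/lib:$LD_LIBRARY_PATH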
   [root@master maui]# vi /etc/profile.d/maui.sh
   [root@master maui]# chmod +x /etc/profile.d/maui.sh

10  Compiler Installation

Compilers are necessary in a cluster as they turn source code into executables that can be run by the computer. Of interest are C, C++ and Fortran compilers, the most popular of which are the GCC and Intel compilers. Another option is the PGI compilers, which we shall not be installing.

10.1  GCC Compilers

From the CentOS repositories, we shall install the GCC compilers using the yum package management utility.

[root@master src]# yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64

10.2  Intel Compilers

For the Intel compilers, which may give better results depending on the scenario, we shall proceed with the installation as outlined below:

1. Visit the Intel website in your preferred web browser, register, and download the Intel compilers for non-commercial use.

2. Move to the directory into which you downloaded the Intel C and Fortran compilers.

3. Untar the tarballs and change directory into the created directories.

   [root@master ~]# tar xvfz l_ccompxe_2011.4.191.tgz
   [root@master ~]# cd l_ccompxe_2011.4.191
   [root@master l_ccompxe_2011.4.191]# ./install.sh
   [root@master ~]# tar xvfz l_fcompxe_2011.4.191.tgz
   [root@master ~]# cd l_fcompxe_2011.4.191
   [root@master l_fcompxe_2011.4.191]# ./install.sh

4. Execute the install.sh script and proceed as prompted.

11  OpenMPI installation

OpenMPI is an open-source implementation of the Message Passing Interface (MPI-2) library and facilitates communication and message interchange between processes in a high-performance computing environment.
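Once OpenMPI has been built and installed as described in the two subsections that follow, a quick sanity check (not part of the original guide) is to compile and launch a tiny MPI program; a sketch, assuming the GCC build prefix used in section 11.1:

   # Create a minimal MPI test program (example only)
   cat > hello_mpi.c << 'SRC'
   #include <mpi.h>
   #include <stdio.h>
   int main(int argc, char **argv)
   {
       int rank, size;
       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);
       printf("Hello from rank %d of %d\n", rank, size);
       MPI_Finalize();
       return 0;
   }
   SRC

   # Compile with the GCC-built OpenMPI and run four ranks locally
   /opt/openmpi/1.4.2/gcc/4.1.2/bin/mpicc hello_mpi.c -o hello_mpi
   /opt/openmpi/1.4.2/gcc/4.1.2/bin/mpirun -np 4 ./hello_mpi

Each rank should print one line; if it does not, the PATH and library settings for the chosen OpenMPI build are the first things to check.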
11.1  OpenMPI Compiled with GCC Compilers

1. Untar the sources and run the configure script.

   [root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
   [root@master src]# cd openmpi-1.4.2
   [root@master openmpi-1.4.2]# mkdir build
   [root@master openmpi-1.4.2]# cd build/
   [root@master build]# ../configure CC=gcc CXX=g++ FC=gfortran F77=gfortran --prefix=/opt/openmpi/1.4.2/gcc/4.1.2 --with-tm=/opt/torque/

2. Create the binaries by running "make".

   [root@master build]# make

3. Finally, install the binaries into the system.

   [root@master build]# make install

11.2  OpenMPI Compiled with Intel Compilers

1. Untar and configure the sources as above. However, take keen notice of the values of the variables CC, CXX, FC and F77 as compared to the same step when compiling with the GCC compilers above.

   [root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
   [root@master src]# cd openmpi-1.4.2
   [root@master openmpi-1.4.2]# mkdir build
   [root@master openmpi-1.4.2]# cd build/
   [root@master build]# ../configure CC=icc CXX=icpc FC=ifort F77=ifort --prefix=/opt/openmpi/1.4.2/intel/12.0.4 --with-tm=/opt/torque/

2. Create the binaries by running "make".

   [root@master build]# make

3. Finally, install the binaries into the system.

   [root@master build]# make install

12  Environment Modules installation

1. Obtain the Environment Modules source file, uncompress it and change directory into the created directory as below.

   [root@master src]# tar xvfz modules-3.2.8a.tar.gz
   [root@master src]# cd modules-3.2.8

2. Then configure the sources, specifying a prefix where they should be installed.
   [root@master modules-3.2.8]# ./configure --prefix=/opt

   Should you be running a 64-bit system and encounter an error indicating that the tcl lib and include directories cannot be found, proceed as below.

   [root@master modules-3.2.8]# ./configure --with-tcl-lib=/usr/lib64/ --with-tcl-inc=/usr/include/ --prefix=/opt

3. Then create the binaries and install them.

   [root@master modules-3.2.8]# make
   [root@master modules-3.2.8]# make install

4. Finally, copy the init scripts to the /etc/profile.d directory to make the module command available system-wide.

   [root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash /etc/profile.d/modules.sh
   [root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash_completion /etc/profile.d/modules_bash_completion.sh

13  C3 Tools installation

1. Uncompress the C3 tools source package and execute the install script.

   [root@master src]# tar xvfz c3-4.0.1.tar.gz
   [root@master src]# cd c3-4.0.1
   [root@master c3-4.0.1]# ./Install-c3

2. Create a c3.conf configuration file defining a cluster name, the master node and the nodes in the cluster.

   [root@master c3-4.0.1]# vi /etc/c3.conf
   [root@master c3-4.0.1]# cat /etc/c3.conf
   cluster cluster1 {
        master:master
        node0[1-3]
   }

3. Create SSH keys to be used for passwordless login on the nodes of the cluster.

   [root@master ~]# ssh-keygen -t dsa
   Generating public/private dsa key pair.
   Enter file in which to save the key (/root/.ssh/id_dsa):
   Created directory '/root/.ssh'.
   Enter passphrase (empty for no passphrase):
   Enter same passphrase again:
   Your identification has been saved in /root/.ssh/id_dsa.
   Your public key has been saved in /root/.ssh/id_dsa.pub.
   The key fingerprint is:
   46:6d:e5:e5:e2:5c:b5:72:16:bc:04:6f:59:2c:b5:32 root@master.cluster
4. Copy the ~/.ssh/id_dsa.pub contents to the authorized_keys file of all nodes in the cluster. This is how to do it for a single node.

   [root@master ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@node01
   The authenticity of host 'node01 (192.168.10.2)' can't be established.
   DSA key fingerprint is fe:8d:bf:6e:de:f4:94:d3:c4:d7:ee:74:6c:8c:dd:da.
   Are you sure you want to continue connecting (yes/no)? yes
   Warning: Permanently added 'node01,192.168.10.2' (RSA) to the list of known hosts.
   root@node01's password:
   Now try logging into the machine, with "ssh 'root@node01'", and check in:
     .ssh/authorized_keys
   to make sure we haven't added extra keys that you weren't expecting.

5. Test whether the key was successfully registered by attempting to log in to node01.

   [root@master ~]# ssh node01
   Last login: Fri Jun 17 12:53:28 2011
   [root@node01 ~]# exit
   logout

14  Password Syncing

User accounts and passwords should be the same on all nodes forming the cluster; however, we can't have each user create their password on every machine that makes up the cluster. We shall therefore create a script to effect this. In our case we shall use the cpush command from the C3 tools package installed earlier.

Listing 3: /etc/password-push.sh

#!/bin/bash
#
# Sync /etc/passwd, /etc/shadow and /etc/group
# File: /root/bin
# Cron: min hour dom month dow root /etc/password-push.sh

for f in passwd shadow group; do
    /opt/c3-4/cpush /etc/"${f}" > /dev/null
done

However, have in mind that rsync could be used to achieve the same.

15  NetCDF, HDF5 and GrADS installation

GrADS requires NetCDF and HDF5 as dependencies for its installation. Therefore, we shall install them all as a pack from the EPEL repositories.
[root@master ~]# yum -y install netcdf hdf5 grads

16  NCL and NCO installation

These too we shall install using the yum package manager, as below.

[root@master ~]# yum -y install ncl nco

17  R Statistical package installation

The R statistical package will be installed from the EPEL repositories to save us from the agony of installing a myriad of dependencies, and for easy updating of the packages.

[root@master ~]# yum -y install R.x86_64 R-core.x86_64 R-devel.x86_64 libRmath.x86_64 libRmath-devel.x86_64
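With Torque, Maui, the compilers and OpenMPI in place, a small end-to-end test job can help confirm the stack before handing the cluster to users. The job script below is a sketch, not part of the original guide; it assumes the default queue created earlier and the GCC OpenMPI prefix from section 11.1:

   # test-job.sh -- minimal Torque/Maui test job (example only)
   #!/bin/bash
   #PBS -N mpi-test
   #PBS -q default
   #PBS -l nodes=2:ppn=2,walltime=00:05:00
   #PBS -j oe

   cd $PBS_O_WORKDIR
   # Run one copy of hostname per allocated processor slot
   /opt/openmpi/1.4.2/gcc/4.1.2/bin/mpirun hostname

Submit it with "qsub test-job.sh" and watch it with "qstat"; because OpenMPI was built with --with-tm, mpirun obtains the allocated nodes from Torque automatically.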
Part III: Computing Node Installation
18  Node OS installation

With the master node setup complete, installation of the nodes should be just the push of a button. However, a little understanding of the node-ks.cfg kickstart file is essential. It marks the packages tftp, openssh-server, openssh, xorg-x11-xauth, mc and strace for installation, and those with a preceding "-" sign for removal.

Thereafter, the post-installation section is executed, which disables unwanted services, creates a local repository, and installs the GCC compilers, which are available from the CentOS repositories, on the nodes.

Listing 4: node-ks.cfg (package list and %post section)

tftp
openssh-server
openssh
xorg-x11-xauth
mc
strace
-cups
-cups-libs
-bluez-utils
-bluez-gnome
-rp-pppoe
-ppp

%post --log=/root/ks-post.log
MASTER=192.168.10.1

# Delete unwanted services
for i in sendmail ;
do
    chkconfig --del "${i}"
done

# Remove default repos
tar cvfz yum.repos.d.tar.gz /etc/yum.repos.d
rm -rf /etc/yum.repos.d/*

# Mount /distro from master node
mkdir -p /distro
mount -t nfs $MASTER:/distro /distro

# Add mount to fstab
echo -e "192.168.10.1:/distro\t/distro\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /opt to fstab
echo -e "192.168.10.1:/opt\t/opt\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /home to fstab
echo -e "192.168.10.1:/home\t/home\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Execute the node_install.sh script to install pbs_mom
/opt/torque/node_install.sh

# Create local repo
mkdir -p /distro/centos
echo -e "[Local]\nname=CentOS-\$releasever - Local\nbaseurl=file:///distro/centos\ngpgcheck=0\nenabled=1" | tee /etc/yum.repos.d/CentOS-Local.repo
yum clean all
yum makecache

# GCC compilers
yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64

Once the installation is complete, you can have a look at the ks-post.log file in root's home directory for any errors encountered while executing the %post section of the kickstart file.

19  Name resolution

Finally, ensure that all the nodes in the cluster can resolve the names of the nodes in the cluster. You can either set up DNS on the master node or use the /etc/hosts file. Should you need help setting up a DNS server, post your requests in the comments below.
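For the simple /etc/hosts approach, an identical hosts file on the master and on every node is enough; a sketch, matching the addresses assigned in the DHCP configuration section (it can be distributed to the nodes with cpush or added in the kickstart %post section):

   # /etc/hosts -- same copy on the master node and on every compute node
   127.0.0.1      localhost.localdomain localhost
   192.168.10.1   master.cluster  master
   192.168.10.2   node01.cluster  node01
   192.168.10.3   node02.cluster  node02
   192.168.10.4   node03.cluster  node03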
