
Linux Cluster Job Management Systems (SGE)


These slides introduce Sun Grid Engine (SGE), which is often used on Linux HPC clusters.

Published in: Economy & Finance, Career


  1. Job Management Systems: SGE (v1.3)
     Author: Anand Vaidya [email_address]
  2. Why use SGE?
     • Maintain order in a shared resource, like queuing at a movie ticket counter rather than mobbing it.
     • Apply different usage policies: PhDs and professors get better treatment than first-year grads.
     • Everyone gets a fair share of the computing resource.
  3. What is SGE?
     • SGE (Sun Grid Engine) is distributed resource management software.
     • It gives users the means to submit computationally demanding tasks to the SGE system, which distributes the associated workload transparently.
  4. How does SGE work?
     • Users submit jobs to the Grid Engine.
     • Unless resources are immediately available, non-interactive jobs are held in queues until the resources to execute them become available.
     • Jobs are passed on to the available execution hosts.
     • A record of each job's progress through the system is kept and reported on request.
  5. SGE Components
     • Hosts
         • Master (coordinates activities, holds queues)
         • Execution (workers)
         • Administration (sets up the system, queues, etc.)
         • Submit (users can submit jobs from these)
     • The master and administration hosts are usually the same machine.
     • Queues (defined by the administrator)
     • User and administrator commands
     • Daemons: sge_qmaster (master daemon), sge_schedd (scheduler daemon), sge_execd (execution daemon) and sge_commd (communication daemon)
  6. SGE Commands - qhost
     • What is the state of the cluster? How many nodes, of what type, under what load? What is my chance of getting a node?
         [root@shark ~]# qhost
         HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
         -------------------------------------------------------------------
         global     -           -     -     -       -       -       -
         shark-c00  lx24-amd64  2     2.02  3.9G    240.8M  4.0G    0.0
         shark-c02  lx24-amd64  2     2.00  3.9G    214.9M  4.0G    0.0
         shark-c03  lx24-amd64  2     1.76  3.9G    215.9M  4.0G    0.0
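To answer the "what is my chance of getting a node?" question at a glance, the qhost listing can be filtered with awk. The sketch below is only an illustration: it works on the sample output copied from this slide (so it runs without SGE installed), and the variable names `qhost_output` and `least_loaded` are made up for the example. On a live cluster you would feed it the output of `qhost` instead.

```shell
#!/bin/sh
# Sample qhost output copied from the slide; on a real cluster, replace this
# here-document with: qhost_output=$(qhost)
qhost_output=$(cat <<'EOF'
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
shark-c00 lx24-amd64 2 2.02 3.9G 240.8M 4.0G 0.0
shark-c02 lx24-amd64 2 2.00 3.9G 214.9M 4.0G 0.0
shark-c03 lx24-amd64 2 1.76 3.9G 215.9M 4.0G 0.0
EOF
)

# Skip the header line, the separator line and the "global" pseudo-host,
# then remember the host whose LOAD column (field 4) is lowest.
least_loaded=$(printf '%s\n' "$qhost_output" | awk '
    NR > 2 && $1 != "global" && $4 != "-" {
        if (best == "" || $4 + 0 < min) { min = $4 + 0; best = $1 }
    }
    END { print best }')

echo "Least loaded host: $least_loaded"
```

With the sample data this reports shark-c03, the host with load 1.76. The scheduler makes the real placement decision; this only gives a rough picture of how busy the cluster is.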
  7. SGE Commands - qsub
     • Create a job script (myjob.sh).
     • Submit it for execution:
         $ qsub myjob.sh
         Your job 742 ("myjob.sh") has been submitted.
     • Simplest job:
         [vaidya@shark ~]$ cat myjob.sh
         #!/bin/sh
         sleep 10
         date > /tmp/test1.out.txt
     • Variations: qsub -cwd myjob.sh
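The two steps on this slide (create the script, then submit it) can be combined in one small wrapper. This is a sketch under assumptions, not a recommended procedure: the scratch directory from `mktemp -d` and the variable `workdir` are choices made for the example, and the submission only happens when `qsub` is actually on the PATH.

```shell
#!/bin/sh
# Write the slide's minimal job script into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/myjob.sh" <<'EOF'
#!/bin/sh
sleep 10
date > /tmp/test1.out.txt
EOF

# Submit only when qsub is available, so the sketch is harmless on a
# machine without SGE. -cwd makes the job run in the submission directory.
if command -v qsub >/dev/null 2>&1; then
    (cd "$workdir" && qsub -cwd myjob.sh)
else
    echo "qsub not found; would run: qsub -cwd myjob.sh (in $workdir)"
fi
```

The job script itself is unchanged from the slide; only the wrapper around it is new.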
  8. SGE Commands - qstat
     • Check the status of your job:
         • qstat ; qstat -f
         • qstat -u username ; qstat -j job_id
         [root@shark ~]# qstat
         job-ID  prior    name     user   state  submit/start at      queue            slots  ja-task-ID
         ------------------------------------------------------------------------------------------------
         639     0.55500  HCPDIV7  test1  r      05/17/2006 10:16:31  all.q@shark-c00  1
         658     0.55500  HCPDIV1  test1  r      05/17/2006 13:37:35  all.q@shark-c00  1
         694     0.55500  FCCDVI   test1  r      05/17/2006 23:52:19  all.q@shark-c02  1
         695     0.55500  FCCDVI1  test1  r      05/17/2006 23:52:19  all.q@shark-c02  1
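The qstat listing is easy to summarise with awk, for example to count running jobs per queue instance. The sketch below uses the sample rows from this slide so that it runs anywhere. The variable names are made up for the example, and the field positions assume the column layout shown above for running jobs (state in field 5, queue in field 8).

```shell
#!/bin/sh
# Sample qstat output copied from the slide; on a real cluster, replace this
# here-document with: qstat_output=$(qstat)
qstat_output=$(cat <<'EOF'
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------
639 0.55500 HCPDIV7 test1 r 05/17/2006 10:16:31 all.q@shark-c00 1
658 0.55500 HCPDIV1 test1 r 05/17/2006 13:37:35 all.q@shark-c00 1
694 0.55500 FCCDVI test1 r 05/17/2006 23:52:19 all.q@shark-c02 1
695 0.55500 FCCDVI1 test1 r 05/17/2006 23:52:19 all.q@shark-c02 1
EOF
)

# Skip the two header lines, keep jobs in state "r", and count them per
# queue instance. sort makes the order of the report stable.
running_per_queue=$(printf '%s\n' "$qstat_output" | awk '
    NR > 2 && $5 == "r" { count[$8]++ }
    END { for (q in count) print q, count[q] }' | sort)

echo "$running_per_queue"
```

For the sample data this prints two lines, each showing two running jobs, one for all.q@shark-c00 and one for all.q@shark-c02.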
  9. SGE Commands - qstat
     • The state of a job is indicated by letter codes:
         • qw: waiting
         • t: transferring
         • r: running
         • s, S: suspended
         • R: restarted
         • T: threshold
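In scripts that post-process qstat output, the state codes can be turned into readable labels with a small lookup. The function name `describe_state` is hypothetical, and the sketch covers only the codes listed on this slide. Anything else is reported as unknown.

```shell
#!/bin/sh
# Map a qstat state code (as listed on the slide) to a readable label.
describe_state() {
    case "$1" in
        qw)  echo "waiting" ;;
        t)   echo "transferring" ;;
        r)   echo "running" ;;
        s|S) echo "suspended" ;;
        R)   echo "restarted" ;;
        T)   echo "threshold" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

# Example: print a short legend for a few codes.
for code in qw r S; do
    printf '%-3s %s\n' "$code" "$(describe_state "$code")"
done
```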
  10. SGE Commands - qdel
     • Delete your job, if you wish:
         qdel 743
         vaidya has deleted job 743
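qdel needs the job ID, which qsub reports in its confirmation message. One way to capture it in a script is to parse that message. The sketch below assumes the message format shown on the qsub slide, uses a hard-coded copy of it, and only prints the qdel command rather than running it, so it cannot delete anything by accident.

```shell
#!/bin/sh
# Confirmation message in the format shown on the qsub slide. On a real
# cluster this would come from: submit_message=$(qsub myjob.sh)
submit_message='Your job 742 ("myjob.sh") has been submitted.'

# Extract the numeric job ID that follows "Your job".
job_id=$(printf '%s\n' "$submit_message" |
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p')

echo "Job ID: $job_id"
echo "To cancel this job, run: qdel $job_id"
```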
  11. SGE Commands - qmon
     • qmon is an X Window System GUI tool for submitting, deleting and viewing jobs, and for configuring the SGE system.
     • Example: submit a job using qmon
         • Click the Job Submission icon.
         • Click the Job Script file selection icon to open a file selection box and select your script file. Then click OK.
         • Click the Submit button at the bottom of the Job Submission dialog.
         • After a couple of seconds, you should be able to monitor your job in the Job Control dialog. Click the Job Control icon in the QMON control panel.
         • The job first appears under Pending Jobs and moves to Running Jobs once it has started.
  12. SGE Commands - qsh, qtcsh
     • Submit an interactive session request:
         • qlogin
         • qrsh
     • Ensure you have a valid X server running on your desktop, and allow remote X clients to display on it.
     • Submit an interactive session request:
         • qsh
         • qtcsh
     • Note: this feature needs additional configuration and may not work without it.
  13. SGE Commands - jobscript
     • Sample job script:
         #!/bin/bash
         #
         #$ -cwd
         #$ -j y
         #$ -S /bin/bash
         #$ -V
         date
         sleep 10
         env
         date
  14. SGE Commands - jobscript
     • Sample job script (parallel job):
         #!/bin/bash
         #
         #$ -cwd
         #$ -j y
         #$ -S /bin/bash
         #
         $MPI_DIR/mpirun -np $NSLOTS -machinefile $TMPDIR/machines myparallelprog.exe {infile.txt outfile.txt}
  15. SGE Commands - jobscript
     • -cwd = run the job from the current working directory (the directory it was submitted from)
     • -j y = merge stderr into stdout
     • -r y = the job is re-runnable
     • -N jname = set the job name
     • -l h_rt=00:30:00 = limit the job's run time to a maximum of 30 minutes
     • -pe mpich = invoke the mpich parallel environment
     • -pe mpich-ib = use the InfiniBand parallel environment
     • -pe mpich-eth = use the Ethernet parallel environment
     • -V = export all current environment variables to the job
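The options above can be given on the qsub command line or embedded in the job script on lines starting with `#$`, as in the sample scripts. The sketch below writes such a script and checks its syntax without submitting it. The job name `demo_job`, the file name `options_job.sh` and the scratch directory are assumptions for the example. The `-pe` options are left out because parallel environment names and slot counts depend on the site.

```shell
#!/bin/sh
# Write a job script that embeds several of the options from the slide.
workdir=$(mktemp -d)
cat > "$workdir/options_job.sh" <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#$ -N demo_job
# Run in the submission directory and merge stderr into stdout
#$ -cwd
#$ -j y
# Interpret the script with bash and export the submitter's environment
#$ -S /bin/bash
#$ -V
# Mark the job as re-runnable and cap its run time at 30 minutes
#$ -r y
#$ -l h_rt=00:30:00
date
hostname
EOF

# Check the script for shell syntax errors; nothing is submitted here.
sh -n "$workdir/options_job.sh" && echo "Job script written to $workdir/options_job.sh"
```

Note that the run-time limit is written without spaces around the equals sign.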
  16. Admin Commands
     • The next few slides show commands useful for SGE administrators (not users or researchers).
  17. SGE Commands - qconf
     • Show:
         • complexes: qconf -sc
         • queues: qconf -sql
         • PE: qconf -spl
         • exec hosts: qconf -sel ; qconf -se c35
         • submit hosts: qconf -ss
         • admin hosts: qconf -sh
         • calendars: qconf -scall
         • configuration: qconf -sconf
         • user list: qconf -suserl
         • scheduler configuration: qconf -ssconf
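Several of these "show" options take no argument, so they can be run in a loop to get a quick inventory of a cluster. The sketch below is one possible way to do that. The function name `show_lists` and the choice of flags are made up for the example, and on a machine without SGE it only prints a placeholder for each flag.

```shell
#!/bin/sh
# Print the list-type settings from the slide, one section per qconf flag.
# All flags used here are read-only "show" options.
show_lists() {
    for flag in -sql -spl -sel -ss -sh; do
        echo "== qconf $flag =="
        if command -v qconf >/dev/null 2>&1; then
            qconf "$flag"
        else
            echo "(qconf not available on this machine)"
        fi
    done
}

report=$(show_lists)
echo "$report"
```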
  18. SGE Commands - qping
         [anand@shark-c02 ~]$ qping -info shark-c01 537 execd 1
         05/24/2006 21:57:34:
         SIRM version:             0.1
         SIRM message id:          1
         start time:               05/24/2006 21:31:37 (1148477497)
         run time [s]:             1768
         messages in read buffer:  0
         messages in write buffer: 0
         nr. of connected clients: 2
         status:                   0
         info:                     dispatcher: R (0.04) | OK
         Monitor:                  disabled
  19. LSF Commands
     • bsub: submit a job
     • bstop: suspend a job
     • bresume: resume a suspended job
     • btop: move a job to the top of the queue
     • bswitch: move jobs between queues
     • lsgrun: run a task on a set of hosts
     • bkill: kill a job
  20. LSF Commands
     • lsmon: monitor load and resource availability
     • lsid: show LSF details (version, etc.)
     • lshosts: show hosts and their static information
     • lsload: show load information for hosts
     • lsinfo: show LSF configuration information
     • busers: show user information
     • bacct: show accounting information on finished jobs
     • bjobs: show information on jobs
     • bpeek: show the stdout/stderr of unfinished jobs
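For readers moving between the two systems, the SGE user commands from the earlier slides line up roughly with LSF commands from these two lists. The pairing below is an informal reading of the command descriptions, not an official equivalence table, and the function name `lsf_equivalent` is hypothetical.

```shell
#!/bin/sh
# Rough SGE-to-LSF lookup for the user commands covered in these slides.
lsf_equivalent() {
    case "$1" in
        qsub)  echo "bsub" ;;      # submit a job
        qstat) echo "bjobs" ;;     # show information on jobs
        qdel)  echo "bkill" ;;     # kill a job
        qhost) echo "lshosts" ;;   # show hosts and static information
        *)     echo "no counterpart listed" ;;
    esac
}

for cmd in qsub qstat qdel qhost; do
    printf '%-6s -> %s\n' "$cmd" "$(lsf_equivalent "$cmd")"
done
```

Options and output formats differ between the two systems, so scripts written for one still need to be adapted for the other.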
  21. Acknowledgements & Copying
     • This material is based on my experience as well as material collected from the SGE documentation.
     • This presentation can be redistributed as follows:
         • No commercial redistribution, e.g. as part of a for-profit CD-ROM or as part of your sales pitch. Seek my permission first.
         • You must attribute the document creator.
         • Share alike: if you use this document and enhance or modify it, share the modifications or the modified document.
         • In other words, I apply the Creative Commons License, http://creativecommons.org/licenses/by-nc-sa/2.5/
  22. The End
     • Thanks for your time. If you have any feedback, corrections or questions, please contact me: Anand Vaidya, anand@novaglobal.com.sg
     • This document was created with OpenOffice on Linux. Email me if you want the ODP file instead of the PDF.
