Linux Cluster Job Management Systems (SGE)

    Linux Cluster Job Management Systems (SGE) - Presentation Transcript

    1. Job Management Systems SGE v1.3 Author: Anand Vaidya [email_address]
    2. Why use SGE?
      • Maintain order in a shared resource, like queuing up at a movie ticket counter rather than mobbing the counter
      • Apply different usage policies – PhDs and Profs get better treatment than first year grads
      • Everyone gets a fair share of the computing resource.
    3. What is SGE?
      • SGE (Sun Grid Engine) is distributed resource management software.
      • It gives users the means to submit computationally demanding tasks to the SGE system, which distributes the associated workload transparently.
    4. How does SGE work?
      • Users submit jobs to the Grid Engine.
      • Unless resources are immediately available, non-interactive jobs are kept in queues until the resources to execute them become available.
      • Jobs are passed on to the available execution hosts.
      • Records of each job's progress through the system are kept and reported when requested.
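From a user's point of view, the lifecycle above amounts to "submit, then wait until the job has left the system". A minimal sketch of the waiting step follows. The helper name wait_for_job is hypothetical, and the sketch assumes that `qstat -j <job_id>` exits non-zero once the scheduler no longer knows the job (finished or deleted); check this on your own installation.

```shell
# Sketch: block until a Grid Engine job has left the system.
# Assumption: `qstat -j <job_id>` exits non-zero once the job is no longer
# known to the scheduler.

# wait_for_job JOB_ID [POLL_SECONDS]   (hypothetical helper, not part of SGE)
wait_for_job() {
    local job_id=$1
    local interval=${2:-30}   # seconds between polls, 30 by default

    # Poll quietly; the loop ends as soon as qstat reports failure.
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
}
```

Typical use would be `wait_for_job 742 60` after submitting job 742. Polling too often puts load on the master host, so a generous interval is the safer choice.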
    5. SGE Components
      • Hosts
        • Master (coordinate activities, hold queues)
        • Execution (workers)
        • Administration (sets up system, queues etc)
        • Submit (users can submit jobs from these)
      • Usually the master and admin host are the same machine
      • Queues (defined by the administrator)
      • User and Administrator Commands
      • Daemons: sge_qmaster (Master Daemon), sge_schedd (Scheduler Daemon), sge_execd (Execution Daemon) and sge_commd (Communication Daemon)
    6. SGE Commands - qhost
      • What is the state of the cluster? How many nodes, type, load? What is my chance of getting a node?
      • [root@shark ~]# qhost
        HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
        -----------------------------------------------------------------
        global     -           -     -     -       -       -       -
        shark-c00  lx24-amd64  2     2.02  3.9G    240.8M  4.0G    0.0
        shark-c02  lx24-amd64  2     2.00  3.9G    214.9M  4.0G    0.0
        shark-c03  lx24-amd64  2     1.76  3.9G    215.9M  4.0G    0.0
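To answer "what is my chance of getting a node?" at a glance, the qhost listing can be filtered for the host with the lowest load. This is only a sketch: the function name least_loaded_host is hypothetical, and it assumes the column layout shown above (two header lines, LOAD in the fourth column, "-" for hosts without a load value).

```shell
# least_loaded_host: read `qhost` output on stdin and print the name of the
# host with the lowest LOAD value. Hypothetical helper, not an SGE command.
least_loaded_host() {
    awk 'NR > 2 && $1 != "global" && $4 != "-" {
             # Skip the two header lines, the "global" pseudo-host and
             # hosts that report no load.
             if (best == "" || $4 + 0 < min) { min = $4 + 0; best = $1 }
         }
         END { if (best != "") print best }'
}
```

Usage: `qhost | least_loaded_host`. The scheduler makes its own placement decision, so the result is informational only.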
    7. SGE Commands - qsub
      • Create a job script (myjob.sh)
      • Submit for execution
      • $ qsub myjob.sh
      • Your job 742 ("myjob.sh") has been submitted.
      • Simplest Job:
      • [vaidya@shark ~]$ cat myjob.sh
      • #!/bin/sh
      • sleep 10
      • date > /tmp/test1.out.txt
      • Variation: qsub -cwd myjob.sh (run the job from the current working directory)
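Scripts that submit jobs usually need the job ID for later qstat or qdel calls. A minimal sketch, assuming qsub prints its confirmation in the form shown above ("Your job 742 ... has been submitted."); the function name job_id_from_qsub is hypothetical.

```shell
# job_id_from_qsub: read the qsub confirmation message on stdin and print
# only the numeric job ID. Prints nothing if the message has another form.
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}
```

Usage: `job_id=$(qsub myjob.sh | job_id_from_qsub)`. The message format is matched literally, so other message variants are ignored rather than misparsed.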
    8. SGE Commands - qstat
      • Check the status of your job:
        • qstat ; qstat -f ;
        • qstat -u username ; qstat -j job_id
      • [root@shark ~]# qstat
        job-ID  prior    name     user   state  submit/start at      queue            slots  ja-task-ID
        -----------------------------------------------------------------------------------------------
        639     0.55500  HCPDIV7  test1  r      05/17/2006 10:16:31  all.q@shark-c00  1
        658     0.55500  HCPDIV1  test1  r      05/17/2006 13:37:35  all.q@shark-c00  1
        694     0.55500  FCCDVI   test1  r      05/17/2006 23:52:19  all.q@shark-c02  1
        695     0.55500  FCCDVI1  test1  r      05/17/2006 23:52:19  all.q@shark-c02  1
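On a busy cluster the qstat listing gets long, and a per-queue summary is easier to read. The sketch below counts running jobs per queue instance. The function name running_jobs_per_queue is hypothetical, and it assumes the default qstat layout shown above, where running jobs have the state in column 5 and the queue instance in column 8.

```shell
# running_jobs_per_queue: read plain `qstat` output on stdin and print
# "<queue instance> <number of running jobs>", sorted by queue instance.
running_jobs_per_queue() {
    awk 'NR > 2 && $5 == "r" { count[$8]++ }      # skip the two header lines
         END { for (q in count) print q, count[q] }' | sort
}
```

Usage: `qstat | running_jobs_per_queue`. Only jobs in state r are counted, because waiting jobs have no queue instance assigned yet.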
    9. SGE Commands - qstat
      • Status of the job is indicated by letters as:
      • qw - waiting
      • t - transferring
      • r - running
      • s, S - suspended
      • R - restarted
      • T - threshold
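The state letters are case-sensitive (r and R mean different things), which is easy to get wrong in scripts. A small lookup sketch that covers only the states listed on this slide; the function name describe_state is hypothetical.

```shell
# describe_state: translate a qstat state code into the description used
# on this slide. Codes not listed here are reported as "unknown".
describe_state() {
    case $1 in
        qw)  echo "waiting" ;;
        t)   echo "transferring" ;;
        r)   echo "running" ;;
        s|S) echo "suspended" ;;
        R)   echo "restarted" ;;
        T)   echo "threshold" ;;
        *)   echo "unknown" ;;
    esac
}
```

Usage: `describe_state qw` prints "waiting".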
    10. SGE Commands - qdel
      • Delete your job, if you wish
      • qdel 743
      • vaidya has deleted job 743
    11. SGE Commands - qmon
      • qmon is an X Window System GUI tool to submit, delete and view jobs, and to configure the SGE system
      • Example: Submit a job using qmon
        • Click the Job Submission icon.
        • Click the Job Script file selection icon to open a file selection box and select your script file. Then, click OK.
        • Click the Submit button at the bottom of the Job Submission dialog.
        • After a couple of seconds, you should be able to monitor your job in the Job Control dialog. Click the Job Control icon in the QMON control panel.
        • You first see it under Pending Jobs, and it quickly moves to Running Jobs after it gets started.
    12. SGE Commands – qsh, qtcsh
      • Submit an interactive session request:
      • qlogin
      • qrsh
      • Ensure you have a valid X server running on your desktop, and allow remote X clients to display on it.
      • Submit an Interactive session request:
      • qsh
      • qtcsh
      • Note: this feature needs additional configuration and may not work without it.
    13. SGE Commands – jobscript
      • sample job script:
      • #!/bin/bash
      • #
      • #$ -cwd
      • #$ -j y
      • #$ -S /bin/bash
      • #$ -V
      • date
      • sleep 10
      • env
      • date
    14. SGE Commands – jobscript
      • sample job script:
      • #!/bin/bash
      • #
      • #$ -cwd
      • #$ -j y
      • #$ -S /bin/bash
      • #
      • $MPI_DIR/mpirun -np $NSLOTS -machinefile $TMPDIR/machines myparallelprog.exe {infile.txt outfile.txt}
    15. SGE Commands – jobscript
      • -cwd = run the job from the current working directory
      • -j y = merge stderr into stdout
      • -r y = job is re-runnable
      • -N jname = set the job name
      • -l h_rt=00:30:00 = limit the job's run time to 30 minutes
      • -pe mpich n = request the mpich parallel environment with n slots
      • -pe mpich-ib n = use the InfiniBand parallel environment
      • -pe mpich-eth n = use the Ethernet parallel environment
      • -V = export all current environment variables to the job
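The options above can be given on the qsub command line or embedded in the job script as `#$` lines. The sketch below generates such a script from a job name, a run-time limit and a command. The function name make_jobscript and the particular choice of options are illustrative assumptions, not a recommended site template.

```shell
# make_jobscript NAME H_RT COMMAND...
# Print a job script that embeds some of the options listed above as
# "#$" directives, followed by the command to run. Hypothetical helper.
make_jobscript() {
    local name=$1 runtime=$2
    shift 2
    cat <<EOF
#!/bin/bash
#\$ -N $name
#\$ -cwd
#\$ -j y
#\$ -S /bin/bash
#\$ -l h_rt=$runtime
$*
EOF
}
```

Usage: `make_jobscript test1 00:30:00 sleep 10 > myjob.sh`, followed by `qsub myjob.sh`. Note that h_rt is written without spaces around the equals sign.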
    16. Admin Commands
      • Next few slides show commands useful for SGE admins (not users/researchers)
    17. SGE Commands – qconf
      • Show:
        • complexes: qconf -sc
        • queues: qconf -sql
        • parallel environments (PE): qconf -spl
        • execution hosts: qconf -sel (list all), qconf -se c35 (show host c35)
        • submit hosts: qconf -ss
        • admin hosts: qconf -sh
        • calendars: qconf -scall
        • configuration: qconf -sconf
        • user list: qconf -suserl
        • scheduler configuration: qconf -ssconf
    18. SGE Commands – qping
      • [anand@shark-c02 ~]$ qping -info shark-c01 537 execd 1
      • 05/24/2006 21:57:34:
      • SIRM version: 0.1
      • SIRM message id: 1
      • start time: 05/24/2006 21:31:37 (1148477497)
      • run time [s]: 1768
      • messages in read buffer: 0
      • messages in write buffer: 0
      • nr. of connected clients: 2
      • status: 0
      • info: dispatcher: R (0.04) | OK
      • Monitor: disabled
    19. LSF Commands
      • bsub – submit a job
      • bstop – suspend a job
      • bresume – resume a suspended task
      • btop – move job to top
      • bswitch – move jobs between queues
      • lsgrun – run a task on a set of hosts
      • bkill – kill a job
    20. LSF Commands
      • lsmon – monitor load, resource availability...
      • lsid – show lsf details (version etc)
      • lshosts – show hosts & static info
      • lsload – show load info for hosts
      • lsinfo – show lsf config info
      • busers – show user info
      • bacct – show accounting info on finished jobs
      • bjobs – show info on jobs
      • bpeek – show stdout/stderr of unfinished jobs
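For readers moving between the two systems, a rough lookup from the SGE commands covered earlier to their nearest LSF counterparts can help. The pairs below are approximate equivalents chosen for illustration; options and output formats differ between the two systems. The function name lsf_equivalent is hypothetical.

```shell
# lsf_equivalent: print the LSF command that roughly corresponds to an
# SGE command from this deck. Returns 1 for commands without a mapping.
lsf_equivalent() {
    case $1 in
        qsub)  echo bsub ;;      # submit a job
        qdel)  echo bkill ;;     # remove a job
        qstat) echo bjobs ;;     # show job status
        qhost) echo lshosts ;;   # show host information
        *)     echo "no mapping for: $1" >&2; return 1 ;;
    esac
}
```

Usage: `lsf_equivalent qstat` prints "bjobs".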
    21. Acknowledgements & Copying
      • This material is based on my experience as well as material collected from SGE documentation.
      • This presentation can be redistributed as follows:
          • No commercial redistribution, e.g. as part of a for-profit CD-ROM or as part of your sales pitch. Seek my permission first.
          • Must attribute the document creator.
          • Share alike: if you use this document and enhance or modify it, share the modifications or the modified document.
          • In other words, I apply the Creative Commons License, http://creativecommons.org/licenses/by-nc-sa/2.5/
    22. The End
      • Thanks for your time. If you have any feedback, corrections or questions, please contact me: Anand Vaidya, anand@novaglobal.com.sg
      • This document was created with OpenOffice on Linux. Email me if you want the ODP file instead of the PDF.
