Scheduling torque-maui-tutorial
  • 1. Scheduling with Torque-Maui – A Tutorial
  • 2. Contents
    The problem being addressed
    Torque – how it helps
    Maui – how it helps
    Job Submission – job priorities, job dependencies, job queues
    Job Monitoring
    Job Accounting
    Install
  • 3. The problem
    Have jobs/tasks run as soon as possible
    Have higher priority jobs run earlier than others
    Run jobs automatically on any free machine across a cluster, not just on one machine
    Have jobs run unattended and send a notification in case of error
    Machine utilization has to be high
    Monitor and account for all the usage
  • 4. Torque – how it helps
    What is TORQUE's job as the resource manager?
    Accepting and starting jobs/tasks across a batch farm (qsub command)
    Cancelling jobs (qdel command)
    Monitoring the state of jobs (qstat command)
    Collecting return codes (qstat)
    Accounting of jobs: the time they took, memory used, etc. (tracejob command)
  • 5. Maui – how it helps
    What is MAUI’s Job?
    MAUI makes all the scheduling decisions.
    It decides whether a job should be started by asking questions like:
    Is there enough resource to start the job?
    Given all the jobs I could start which one should I start?
    MAUI runs a scheduling iteration:
    When a job is submitted.
    When a job ends.
    At regular configurable intervals.
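The question "given all the jobs I could start, which one should I start?" can be illustrated with a toy selection rule. This is only a sketch and not Maui's actual algorithm: the input format (one line per idle job with a job id, a priority and a processor count), the function name `pick_next_job` and the job names are all assumptions made for the example.

```shell
#!/bin/sh
# Toy illustration only: NOT Maui's real scheduling algorithm.
# Input on stdin: one idle job per line as "jobid priority procs".
# Prints the id of the highest-priority job that fits in the free processors.
pick_next_job() {
    free_procs=$1
    # Sort by priority (column 2), highest first, then take the first job
    # whose processor request does not exceed what is currently free.
    sort -k2,2nr | awk -v free="$free_procs" '$3 <= free { print $1; exit }'
}

# Example: three idle jobs (hypothetical names) and 4 free processors.
# jobA has the highest priority but needs 8 processors, so jobC is chosen.
printf '%s\n' 'jobA 10 8' 'jobB 5 2' 'jobC 7 4' | pick_next_job 4
```

A real scheduler weighs many more factors, so treat this purely as a picture of the two questions on the slide: is there enough resource, and which eligible job goes first.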
  • 6. Job Submission
    Jobs are submitted to the batch system by means of the qsub command, as in
    qsub job.sh
    But you can also add a resource description directly on the command line:
    qsub -l nodes=1:ppn=4,mem=200mb,walltime=120 job.sh
    qsub returns a <jobid>
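Resource requests can also live inside the job script as `#PBS` directive lines, which qsub reads as if they were command-line options. Below is a minimal sketch of what a `job.sh` might look like; the job name `example_job` and the function `describe_job` are made up for illustration, and the fallbacks exist only so the script also runs outside the batch system.

```shell
#!/bin/sh
#PBS -N example_job
#PBS -l nodes=1:ppn=4,mem=200mb,walltime=120
#PBS -j oe

# TORQUE sets PBS_JOBID and PBS_O_WORKDIR inside a running job. The fallback
# values below are assumptions that let the script be tried by hand.
describe_job() {
    echo "job ${PBS_JOBID:-not-in-batch} running on $(uname -n)"
}

# Start in the directory qsub was invoked from (or stay put when run by hand).
cd "${PBS_O_WORKDIR:-.}" || exit 1
describe_job
```

With the directives in the script, `qsub job.sh` alone requests the same resources as the longer command line.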
  • 7. Job priority
    You can give a job a priority with qsub
    qsub -p 20 job.sh
    Default priority is 0
    Priorities can range from -1024 to 1023
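The qsub manual gives the accepted range for `-p` as -1024 to 1023. A wrapper script could check a value before submitting; the sketch below assumes a helper named `valid_priority` (not part of TORQUE) and only prints the command it would run.

```shell
#!/bin/sh
# Return success if $1 is an integer that qsub -p would accept.
# valid_priority is a hypothetical helper, not a TORQUE command.
valid_priority() {
    case $1 in
        # Reject empty input, a lone "-", non-numeric characters,
        # and a hyphen anywhere except the first position.
        ''|-|*[!0-9-]*|?*-*) return 1 ;;
    esac
    [ "$1" -ge -1024 ] && [ "$1" -le 1023 ]
}

prio=20
if valid_priority "$prio"; then
    echo "qsub -p $prio job.sh"
else
    echo "priority $prio is out of range" >&2
fi
```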
  • 8. Job dependencies
    Run a job only after another job ends successfully
    echo "vflush" | qsub -W depend=afterok:10.penguin7.orchesys.com -p 10 -q flush_queue
    Here '10.penguin7.orchesys.com' is the job id of another job, which must complete successfully before the current job is launched.
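Because qsub prints the job id of the job it has just queued, a dependency chain can be scripted instead of typing the id by hand. A minimal sketch follows; the function name `submit_after_ok` and the script names in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Submit first_script, then submit second_script so that it becomes eligible
# only if the first job exits successfully. Prints the second job's id.
submit_after_ok() {
    first_script=$1
    second_script=$2
    # qsub prints the new job id on stdout; stop if the submission fails.
    first_id=$(qsub "$first_script") || return 1
    qsub -W depend=afterok:"$first_id" "$second_script"
}

# Usage (hypothetical script names):
#   submit_after_ok load.sh flush.sh
```

Options such as `-p` or `-q` from the slide could be added to the second qsub call in the same way.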
  • 9. Job Queues
    Batch systems are usually configured with multiple queues.
    Each queue can be configured to accept jobs from a certain group of users, or within specified resource limits
    Queue selection is performed with -q queuename on the qsub command line
    Glassbeam has a default queue (batch) and a flush_queue (where only one job can run at a time)
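A queue that runs one job at a time can be described with qmgr directives. The slides do not show how flush_queue was actually configured, so the following is only a plausible sketch using the standard `max_running` queue attribute; the function name `flush_queue_directives` is made up, and the script merely prints the directives.

```shell
#!/bin/sh
# Print qmgr directives for an execution queue limited to one running job.
# Assumption: this mirrors the slide's flush_queue; the real setup may differ.
flush_queue_directives() {
    cat <<'EOF'
create queue flush_queue queue_type=execution
set queue flush_queue max_running = 1
set queue flush_queue enabled = true
set queue flush_queue started = true
EOF
}

# To apply as a TORQUE manager, pipe the output into qmgr:
#   flush_queue_directives | qmgr
flush_queue_directives
```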
  • 10. Job Monitoring
    For a job id, you can see the command that was run for the job in the file
    /var/spool/torque/server_priv/jobs/<JOBID>.SC
    sudo cat /var/spool/torque/server_priv/jobs/90.localhost.localdomain.SC
    /home/gbprod/testscript_aruba/aruba_parallel_loader qa0 1306219430 aruba_test_pod /glassbeam/core/bin
    Status of all submitted jobs - qstat
    Status of only one job - qstat <jobid>
    Only running jobs - qstat -r
    Email alert for jobs - qsub -m ae -M santosh@glassbeam.com (sends email when the job is aborted (a) or ends (e))
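For scripts that need to block until a job is done, the `job_state` line of `qstat -f <jobid>` can be polled. The sketch below assumes two helper names, `job_state` and `wait_for_job`, and treats either state C (completed) or a job that qstat no longer knows about as finished.

```shell
#!/bin/sh
# Print the one-letter state of a job (e.g. Q, R, C), or nothing if qstat
# no longer knows the job.
job_state() {
    qstat -f "$1" 2>/dev/null | awk -F' = ' '/job_state/ { print $2; exit }'
}

# Poll until the job is completed or gone. $2 is the poll interval in
# seconds (default 30, an arbitrary choice for this sketch).
wait_for_job() {
    job_id=$1
    interval=${2:-30}
    while :; do
        state=$(job_state "$job_id")
        case $state in
            ''|C) return 0 ;;
        esac
        sleep "$interval"
    done
}

# Usage: wait_for_job <jobid> 60
```

Polling too often puts load on the server, so a generous interval is the safer default.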
  • 11. Job accounting …
    tracejob can give a job's return status and how much time it took
    Show what happened today to a job id:
    tracejob <jobid>
    tracejob -n d <jobid> (search the last d days for the job)
    Fast version of tracejob: tracejob -f error -f system -f admin -f security -f sched -f debug -f debug2 -f job -f job_usage 114.localhost
  • 12. Job accounting
    Tracejob output
    Job: 114.localhost.localdomain
    05/30/2011 05:25:15 A queue=batch
    05/30/2011 05:25:15 A user=gbprod group=glassbeam jobname=STDIN queue=batch
    ctime=1306747515 qtime=1306747515 etime=1306747515
    start=1306747515 owner=gbprod@localhost.localdomain
    exec_host=localhost/0 Resource_List.neednodes=1
    Resource_List.nodect=1 Resource_List.nodes=1
    05/30/2011 05:25:25 A user=gbprod group=glassbeam jobname=STDIN queue=batch
    ctime=1306747515 qtime=1306747515 etime=1306747515
    start=1306747515 owner=gbprod@localhost.localdomain
    exec_host=localhost/0 Resource_List.neednodes=1
    Resource_List.nodect=1 Resource_List.nodes=1
    session=26992 end=1306747525 Exit_status=0
    resources_used.cput=00:00:00 resources_used.mem=0kb
    resources_used.vmem=0kb
    resources_used.walltime=00:00:10
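The accounting records are plain `key=value` pairs, so individual fields such as `Exit_status` or `resources_used.walltime` are easy to pull out in a script. A small sketch, assuming a helper named `trace_field` that reads tracejob output on standard input and prints the last value seen for a key:

```shell
#!/bin/sh
# Print the last value of a key=value field found on stdin.
# trace_field is a hypothetical helper, not part of TORQUE.
trace_field() {
    awk -v key="$1" '
        {
            # Check every whitespace-separated token for "key=" at its start.
            for (i = 1; i <= NF; i++)
                if (index($i, key "=") == 1)
                    value = substr($i, length(key) + 2)
        }
        END { if (value != "") print value }'
}

# Example with a line taken from the output above:
printf '%s\n' 'session=26992 end=1306747525 Exit_status=0' | trace_field Exit_status

# Usage against a real job:
#   tracejob 114.localhost | trace_field resources_used.walltime
```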
  • 13. Install
    Torque install
    As the root user
    Go to the folder install/torque-gb-3.0.1
    Run the command:
    ./torque.setup gbprod localhost
    Maui install
    As the root user
    Go to the folder install/maui-gb-3.3.1
    Run the command:
    sh install.sh
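After installing, a quick sanity check is to confirm that the client commands used in this tutorial are on the PATH. The sketch below assumes a helper named `missing_tools`; it does not contact the server, it only looks for the executables.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on the PATH.
# missing_tools is a hypothetical helper written for this check.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools qsub qstat qdel tracejob pbsnodes)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "not on PATH:" $missing
else
    echo "all TORQUE client tools found"
fi
```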