1. Scheduling with Torque-Maui – A Tutorial
2. Contents
The problem being addressed
Torque – how it helps
Maui – how it helps
Job Submission – job priorities, job dependencies, job queues
Job Monitoring
Job Accounting
Install
3. The problem
Run jobs/tasks as soon as possible.
Run higher-priority jobs before lower-priority ones.
Run jobs automatically on any free machine in the cluster, not just on one machine.
Run jobs unattended and send a notification if something goes wrong.
Keep machine utilization high.
Monitor and account for all usage.
4. Torque – how it helps
What is TORQUE’s job as the resource manager?
Accepting and starting jobs/tasks across a batch farm (qsub command)
Cancelling jobs (qdel command)
Monitoring the state of jobs (qstat command)
Collecting return codes (qstat)
Accounting for jobs: the time they took, the memory they used, etc. (tracejob command)
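The slide lists qdel but no later slide shows it. A minimal sketch, assuming Torque's qselect (which prints the ids of jobs matching a filter) and qdel are on the PATH: it cancels every queued or running job owned by one user. `cancel_user_jobs` is a hypothetical helper name, and the QSELECT/QDEL overrides exist only so it can be dry-run.

```sh
# cancel_user_jobs USER
# Hypothetical helper: cancels every queued (Q) or running (R) job owned by
# USER. QSELECT and QDEL default to the real Torque commands and can be
# overridden, e.g. QDEL=echo for a dry run that only lists the job ids.
cancel_user_jobs() {
    local user ids
    user=$1
    # qselect prints the id of each matching job, one per line.
    ids=$("${QSELECT:-qselect}" -u "$user" -s QR) || return 1
    if [ -z "$ids" ]; then
        echo "no jobs to cancel for $user"
        return 0
    fi
    # Unquoted on purpose: each job id becomes a separate qdel argument.
    "${QDEL:-qdel}" $ids
}
```

For example, `QDEL=echo cancel_user_jobs gbprod` lists the jobs that would be cancelled without touching them.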
5. Maui – how it helps
What is MAUI’s job?
MAUI makes all the scheduling decisions.
It decides whether a job should be started by asking questions like:
Are there enough resources to start the job?
Given all the jobs I could start, which one should I start?
MAUI runs a scheduling iteration:
When a job is submitted.
When a job ends.
At regular, configurable intervals.
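The two questions above can be illustrated with a toy, greedy selection step. This is not Maui's algorithm (Maui combines priority weights, fairshare, reservations and backfill); it is a sketch that assumes each pending job is described by a hypothetical line `jobid priority procs`, orders jobs by priority, and picks the first one whose processor request fits in the free processors.

```sh
# pick_job FREE_PROCS
# Toy scheduling step, NOT Maui's algorithm. Reads pending jobs on stdin,
# one per line in a hypothetical "jobid priority procs" format, and prints
# the id of the highest-priority job whose processor request fits in
# FREE_PROCS. Returns 1 if no job fits.
pick_job() {
    local free=$1
    # Highest priority first; -s keeps input order among equal priorities.
    sort -s -k2,2nr |
        awk -v free="$free" '
            $3 + 0 <= free + 0 { print $1; found = 1; exit }
            END { exit !found }'
}
```

Skipping a higher-priority job that does not fit, as this sketch does, is a crude form of backfill; real backfill schedulers, Maui included, protect the blocked high-priority job with a reservation so it is not starved.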
6. Job Submission
Jobs are submitted to the batch system with the qsub command, as in:
qsub job.sh
You can also describe the resources the job needs directly on the command line, as a comma-separated -l list:
qsub -l nodes=1:ppn=4,mem=200mb,walltime=120 job.sh
(walltime=120 means 120 seconds; hh:mm:ss is also accepted.)
qsub prints the <jobid> of the submitted job.
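The same resource request can also live inside the job script as #PBS directive lines, so a plain `qsub job.sh` picks it up. A minimal sketch of such a job.sh, assuming the default batch queue; the job name `demo_job` and the `main` function are placeholders. Torque sets PBS_JOBID and PBS_O_WORKDIR in the job's environment; the fallbacks let the script also run by hand.

```sh
#!/bin/sh
#PBS -N demo_job
#PBS -q batch
#PBS -l nodes=1:ppn=4,mem=200mb,walltime=120
# qsub reads #PBS directives only until the first executable line, so they
# must come first. demo_job is a placeholder name. When the script is run
# by hand, these lines are plain comments.

main() {
    local job_id workdir
    # Torque sets PBS_JOBID and PBS_O_WORKDIR (the directory qsub was run
    # from) in the job's environment; fall back to sane values otherwise.
    job_id=${PBS_JOBID:-manual-run}
    workdir=${PBS_O_WORKDIR:-$PWD}
    cd "$workdir" || exit 1
    echo "job $job_id running in $workdir"
    # ... the real work (e.g. a loader command) would go here ...
}

main "$@"
```

Options given on the qsub command line take precedence over the matching #PBS directives, so the script's values act as defaults.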
7. Job priority
You can set a job's priority with qsub:
qsub -p 20 job.sh
The default priority is 0.
Torque accepts priorities from -1024 to +1023 for a job.
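A small guard keeps out-of-range priorities from ever reaching qsub. This is a hypothetical wrapper, `submit_with_priority`, sketched around the -1024..+1023 range above; QSUB can be overridden to dry-run it.

```sh
# submit_with_priority PRIORITY SCRIPT [QSUB_OPTION...]
# Hypothetical wrapper: rejects priorities outside Torque's -1024..+1023
# range before calling qsub -p. QSUB defaults to qsub and can be overridden.
submit_with_priority() {
    local prio script
    if [ $# -lt 2 ]; then
        echo "usage: submit_with_priority PRIORITY SCRIPT [QSUB_OPTION...]" >&2
        return 2
    fi
    prio=$1
    script=$2
    shift 2
    # Accept an optional leading minus sign followed by digits only.
    case ${prio#-} in
        '' | *[!0-9]*)
            echo "priority must be an integer, got '$prio'" >&2
            return 2 ;;
    esac
    if [ "$prio" -lt -1024 ] || [ "$prio" -gt 1023 ]; then
        echo "priority $prio is outside -1024..1023" >&2
        return 2
    fi
    # Remaining arguments are passed through as qsub options.
    "${QSUB:-qsub}" -p "$prio" "$@" "$script"
}
```

Usage: `submit_with_priority 20 job.sh -q batch` runs `qsub -p 20 -q batch job.sh`.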
8. Job dependencies
Run a job only after another job ends successfully:
echo "vflush" | qsub -W depend=afterok:10.penguin7.orchesys.com -p 10 -q flush_queue
Here ‘10.penguin7.orchesys.com’ is the jobid of another job; the current job is launched only after that job completes successfully.
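Because qsub prints the new job's id, dependencies can be chained from a script: capture each id and pass it to the next submission's depend=afterok. A minimal sketch, assuming a hypothetical `submit_chain` helper; the script names in the usage line are placeholders.

```sh
# submit_chain SCRIPT...
# Hypothetical helper: submits the scripts in order, each with
# depend=afterok on the previous job, so a step starts only after the one
# before it exits with status 0. Prints the id of the last job.
submit_chain() {
    local prev id script
    prev=
    for script in "$@"; do
        if [ -z "$prev" ]; then
            id=$("${QSUB:-qsub}" "$script") || return 1
        else
            id=$("${QSUB:-qsub}" -W "depend=afterok:$prev" "$script") || return 1
        fi
        prev=$id
    done
    printf '%s\n' "$prev"
}
```

Usage: `last=$(submit_chain extract.sh load.sh flush.sh)` (placeholder script names).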
9. Job Queues
Batch systems are usually configured with multiple queues.
Each queue can be configured to accept jobs only from a certain group of users, or only within specified resource limits.
A queue is selected with -q queuename on the qsub command line.
Glassbeam has a default queue (batch) and flush_queue (where only one job can run at a time).
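The slides do not show how flush_queue was created. One plausible way, sketched here as an assumption, is a Torque execution queue with max_running = 1, set up through qmgr. The function only prints qmgr directives so they can be reviewed first; with Maui as the scheduler, check that the limit is actually enforced, since Maui also has its own per-class limits in maui.cfg. `emit_single_slot_queue` is a hypothetical name.

```sh
# emit_single_slot_queue NAME
# Hypothetical helper: prints qmgr directives for an execution queue that
# runs at most one job at a time. Review the output, then apply as root:
#   emit_single_slot_queue flush_queue | qmgr
emit_single_slot_queue() {
    local q=$1
    cat <<EOF
create queue $q
set queue $q queue_type = Execution
set queue $q max_running = 1
set queue $q enabled = True
set queue $q started = True
EOF
}
```

Jobs are then sent to the queue with `qsub -q flush_queue`, as in the dependency example.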
10. Job Monitoring
For a given job id, you can see the command that was run for the job in the file
/var/spool/torque/server_priv/jobs/<JOBID>.SC
For example:
sudo cat /var/spool/torque/server_priv/jobs/90.localhost.localdomain.SC
shows:
/home/gbprod/testscript_aruba/aruba_parallel_loader qa0 1306219430 aruba_test_pod /glassbeam/core/bin
Status of all submitted jobs: qstat
Status of only one job: qstat <jobid>
Only running jobs: qstat -r
Email alerts for a job: qsub -m ae -M santosh@glassbeam.com job.sh (send email when the job aborts (a) or ends (e))
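Scripts that must wait for a job can poll qstat. A minimal sketch, assuming `qstat -f <jobid>` output contains a `job_state = X` line (Q queued, R running, E exiting, C completed); `job_state` and `wait_for_job` are hypothetical helper names, and QSTAT can be overridden for testing.

```sh
# job_state JOBID: prints Torque's one-letter state for the job, or nothing
# if qstat no longer knows it (it may have finished and been purged).
job_state() {
    "${QSTAT:-qstat}" -f "$1" 2>/dev/null |
        awk -F' = ' '$1 ~ /^[ \t]*job_state$/ { print $2; exit }'
}

# wait_for_job JOBID [SECONDS]: polls every SECONDS (default 30) until the
# job is completed (C) or gone. Note that a qstat failure for another
# reason, such as an unreachable server, also looks like "gone".
wait_for_job() {
    local state
    while :; do
        state=$(job_state "$1")
        case $state in
            '' | C) return 0 ;;
        esac
        sleep "${2:-30}"
    done
}
```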
11. Job accounting …
tracejob shows a job's return status and how long it ran; by default it shows what happened to the job today:
tracejob <jobid>
tracejob -n d <jobid> (search the last d days for the job)
Fast version of tracejob (each -f filters out one type of log record, so excluding all of these leaves mainly the accounting (A) records, as in the output on the next slide):
tracejob -f error -f system -f admin -f security -f sched -f debug -f debug2 -f job -f job_usage 114.localhost
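The long filtered invocation is easier to reuse as a function. A sketch with a hypothetical name, `trace_accounting`, using only the -n and -f options shown above; TRACEJOB can be overridden for testing.

```sh
# trace_accounting JOBID [DAYS]
# Hypothetical helper: runs tracejob with every -f filter from the slide,
# so mainly the accounting (A) records remain. DAYS defaults to 1 (today).
trace_accounting() {
    "${TRACEJOB:-tracejob}" -n "${2:-1}" \
        -f error -f system -f admin -f security -f sched \
        -f debug -f debug2 -f job -f job_usage "$1"
}
```

Usage: `trace_accounting 114.localhost 3` searches the last three days.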
12. Job accounting
tracejob output:
Job: 114.localhost.localdomain

05/30/2011 05:25:15  A    queue=batch
05/30/2011 05:25:15  A    user=gbprod group=glassbeam jobname=STDIN queue=batch
                          ctime=1306747515 qtime=1306747515 etime=1306747515
                          start=1306747515 owner=gbprod@localhost.localdomain
                          exec_host=localhost/0 Resource_List.neednodes=1
                          Resource_List.nodect=1 Resource_List.nodes=1
05/30/2011 05:25:25  A    user=gbprod group=glassbeam jobname=STDIN queue=batch
                          ctime=1306747515 qtime=1306747515 etime=1306747515
                          start=1306747515 owner=gbprod@localhost.localdomain
                          exec_host=localhost/0 Resource_List.neednodes=1
                          Resource_List.nodect=1 Resource_List.nodes=1
                          session=26992 end=1306747525 Exit_status=0
                          resources_used.cput=00:00:00 resources_used.mem=0kb
                          resources_used.vmem=0kb
                          resources_used.walltime=00:00:10
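The accounting fields are plain key=value pairs, so they are easy to post-process. start and end are Unix epoch seconds: end - start = 1306747525 - 1306747515 = 10, which matches resources_used.walltime=00:00:10 in the last record. A minimal sketch, `summarize_job` (a hypothetical name), that scans the fields of a job's records on stdin and prints the exit status and elapsed seconds; it assumes whitespace-separated key=value fields as in the sample above.

```sh
# summarize_job: reads a job's accounting fields on stdin (one or more
# lines of whitespace-separated key=value pairs) and prints the exit
# status and the elapsed time end - start. Tokens without "=" are ignored.
summarize_job() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "start") t_start = kv[2]
                else if (kv[1] == "end") t_end = kv[2]
                else if (kv[1] == "Exit_status") status = kv[2]
            }
        }
        END {
            if (t_end == "") { print "no end record found"; exit 1 }
            printf "exit_status=%s elapsed=%ds\n", status, t_end - t_start
        }'
}
```

For example, `tracejob 114.localhost | summarize_job` should print `exit_status=0 elapsed=10s` for the job above.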
13. Install
Torque install
As the root user:
Go to the folder install/torque-gb-3.0.1
Run the command:
./torque.setup gbprod localhost
Maui install
As the root user:
Go to the folder install/maui-gb-3.3.1
Run the command:
sh install.sh
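After both installs, a quick sanity check is to confirm that the client commands are on the PATH. A minimal sketch, `check_tools` (a hypothetical name); the default tool list is an assumption: the Torque commands used in these slides plus pbsnodes and Maui's showq.

```sh
# check_tools [CMD...]: prints each command missing from PATH and returns 1
# if any is missing. Defaults to an assumed list of Torque/Maui commands.
check_tools() {
    local missing cmd
    [ $# -gt 0 ] || set -- qsub qstat qdel tracejob pbsnodes showq
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

If nothing is reported, submitting `echo "sleep 10" | qsub` and watching it with qstat (Torque's view) and showq (Maui's view) should exercise the whole path.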

×