A Guide to DAGMan

A brief guide to DAGMan


  1. A Guide to the DAGMan (7.0) “Specification”
     Information provided by the folks at Condor
     WARNING: this presentation lacks images
  2. DAGMan
     - “DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor.”
     - It manages dependencies between compute and data jobs at a high level.
     - What does this mean for us?
     - It gives users a simple way to declare dependencies between jobs.
  3. An Example
     # Filename: aBoringExample.dag
     JOB A a.condor
     JOB B b.condor
     JOB C c.condor
     JOB D d.condor
     PARENT A CHILD B C
     PARENT B C CHILD D

     # Filename: a.condor
     Executable   = foo
     # Memory is measured in MB, so the expression takes a bare number
     Requirements = Memory >= 32
     Error        = err.$(Process)
     Input        = in.$(Process)
     Output       = out.$(Process)
     Log          = foo.log
     Queue 150
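To make the ordering in aBoringExample.dag concrete, here is a minimal Python sketch that reads the JOB and PARENT/CHILD lines and computes one order in which the nodes could run, using Kahn's topological sort. The helpers parse_dag and run_order are hypothetical; they model only the dependency rules, ignore every other keyword, and are not how DAGMan itself is implemented.

```python
# Hypothetical sketch: parse JOB and PARENT/CHILD lines and find one valid
# run order (Kahn's topological sort). Not DAGMan's actual implementation.
from collections import deque

DAG_TEXT = """\
# Filename: aBoringExample.dag
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text):
    """Return (nodes, edges): node name -> submit file, plus (parent, child) pairs."""
    nodes = {}
    edges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        keyword = words[0].upper()
        if keyword == "JOB":
            nodes[words[1]] = words[2]
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            parents, children = words[1:split], words[split + 1:]
            # every listed parent is a parent of every listed child
            edges.extend((p, c) for p in parents for c in children)
    return nodes, edges


def run_order(nodes, edges):
    """Emit a node only after all of its parents have been emitted."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a DAG")
    return order


nodes, edges = parse_dag(DAG_TEXT)
print(run_order(nodes, edges))  # ['A', 'B', 'C', 'D']
```

B and C share the same parent, so either may run first; this sketch simply emits them in the order they were declared.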
  4. Nodes
     - A node is composed of:
        - a “cluster” of compute or data jobs, defined by one Condor or Stork submit description file, respectively
           - that is, the group of executions created by one queue command (e.g., the 150 instances of the same program created by Queue 150)
        - optionally, associated PRE or POST scripts
     - Only one cluster can be defined per submit file used with DAGMan.
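The “one queue command, many executions” idea can be illustrated with a short Python sketch, assuming only that Condor numbers the jobs of a cluster 0 through N-1 and substitutes that number for $(Process). The expand helper is hypothetical; it just shows which file names the a.condor example would give its 150 jobs.

```python
# Hypothetical sketch: expand $(Process) for each job created by "Queue N".
# Assumes jobs in a cluster are numbered 0 .. N-1.

SUBMIT_TEMPLATE = {
    "Error": "err.$(Process)",
    "Input": "in.$(Process)",
    "Output": "out.$(Process)",
    "Log": "foo.log",  # no $(Process): all jobs in the cluster share one log
}


def expand(template, queue_count):
    """Return one dict of file names per job in the cluster."""
    jobs = []
    for process in range(queue_count):
        jobs.append({key: value.replace("$(Process)", str(process))
                     for key, value in template.items()})
    return jobs


cluster = expand(SUBMIT_TEMPLATE, 150)
print(len(cluster))           # 150
print(cluster[0]["Input"])    # in.0
print(cluster[-1]["Output"])  # out.149
```

All 150 of these jobs together form a single DAG node.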
  5. Directed Links
     - Simple dependencies
        - They tell DAGMan that a child node cannot run until all of its parents have completed successfully.
     - More complex relationships or dependencies cannot be expressed to DAGMan.
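The parent/child rule can be written as a small Python function that, given the set of nodes that have finished successfully, lists the nodes allowed to start next. ready_nodes is a hypothetical helper using the edges of aBoringExample.dag; it does not model retries, scripts, or any throttling.

```python
# Hypothetical sketch of the parent/child rule: a node may start only when
# every one of its parents is in the set of successfully finished nodes.

EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
NODES = ["A", "B", "C", "D"]


def ready_nodes(nodes, edges, done):
    """Nodes not yet done whose parents are all in `done`."""
    parents = {n: set() for n in nodes}
    for parent, child in edges:
        parents[child].add(parent)
    return [n for n in nodes if n not in done and parents[n] <= done]


print(ready_nodes(NODES, EDGES, set()))            # ['A']
print(ready_nodes(NODES, EDGES, {"A"}))            # ['B', 'C']
print(ready_nodes(NODES, EDGES, {"A", "B"}))       # ['C'] (D still waits for C)
print(ready_nodes(NODES, EDGES, {"A", "B", "C"}))  # ['D']
```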
  6. Specification (the basics)
     - JOB / DATA
          {JOB | DATA} jobName jobDescFile.condor [DONE] [DIR WD]
     - SCRIPT
          SCRIPT {PRE | POST} jobName scriptName.sh [arguments]
     - PARENT..CHILD
          PARENT p1 [p2 …] CHILD c1 [c2 …]
     - RETRY
          RETRY jobName numRetries [UNLESS-EXIT value]
     - Others: PRIORITY, CATEGORY, VARS, MAXJOBS, ABORT-DAG-ON, CONFIG (see the documentation, or feel free to ask)
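As a way to exercise the syntax above, the following Python sketch builds DAG-file text from plain data. The dag_text function, its argument layout, and the script names setup.sh and verify.sh are hypothetical; only the emitted JOB, SCRIPT, PARENT..CHILD, and RETRY lines follow the slide.

```python
# Hypothetical sketch: generate DAG-file lines for the basic keywords.
# Function name, argument shapes, and script names are illustrative only.

def dag_text(jobs, scripts=(), dependencies=(), retries=()):
    lines = []
    for name, submit_file in jobs:
        lines.append(f"JOB {name} {submit_file}")
    for kind, name, script, *args in scripts:
        # kind is "PRE" or "POST"; any remaining items are script arguments
        lines.append(" ".join(["SCRIPT", kind, name, script, *args]))
    for parents, children in dependencies:
        lines.append(f"PARENT {' '.join(parents)} CHILD {' '.join(children)}")
    for name, count, *unless in retries:
        line = f"RETRY {name} {count}"
        if unless:
            # optional third item: exit value that stops further retries
            line += f" UNLESS-EXIT {unless[0]}"
        lines.append(line)
    return "\n".join(lines) + "\n"


text = dag_text(
    jobs=[("A", "a.condor"), ("B", "b.condor"), ("C", "c.condor"), ("D", "d.condor")],
    scripts=[("PRE", "A", "setup.sh"), ("POST", "D", "verify.sh", "out.0")],
    dependencies=[(["A"], ["B", "C"]), (["B", "C"], ["D"])],
    retries=[("B", 3), ("D", 2, 1)],
)
print(text, end="")
```

Here node B is retried up to 3 times, and node D up to 2 times unless it exits with value 1. JOB lines are written first so every node is declared before any PARENT..CHILD line refers to it.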
  7. Other Features
     - When a DAG is submitted with condor_submit_dag, a submit description file for DAGMan itself is produced (myFile.dag.condor.sub)
        - Optionally, use this file to build a hierarchy of DAGs (DAGs within DAGs)
     - Progress can be monitored by watching myFile.dag.dagman.out
     - Job recovery
        - If a node fails, DAGMan produces a new “rescue” DAG
        - The rescue DAG can be submitted to restart the DAG at the nodes where the failure occurred; nodes that already completed are marked DONE and are not rerun
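Rescue DAGs rely on the DONE keyword of the JOB line: nodes that already completed are marked DONE so they are not run again. The Python sketch below models only that marking step. rescue_dag is a hypothetical helper, and a real rescue file written by DAGMan may contain more than this.

```python
# Hypothetical sketch: mark completed nodes DONE so a resubmitted DAG
# skips them. Models only the DONE marking, not DAGMan's full rescue file.

def rescue_dag(dag_text, completed):
    """Return dag_text with JOB/DATA lines of completed nodes marked DONE."""
    out = []
    for line in dag_text.splitlines():
        words = line.split()
        is_node = len(words) >= 3 and words[0].upper() in ("JOB", "DATA")
        already_done = "DONE" in (w.upper() for w in words[3:])
        if is_node and words[1] in completed and not already_done:
            line = line.rstrip() + " DONE"
        out.append(line)
    return "\n".join(out) + "\n"


ORIGINAL = """\
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""

# Suppose A and B succeeded, C failed, and D therefore never ran.
print(rescue_dag(ORIGINAL, {"A", "B"}), end="")
```

Resubmitting the result would run only C and then D. Skipping lines that already carry DONE keeps the helper safe to apply twice.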
