FireWorks workflow software 
MAVRL workshop | Nov 2014 
Anubhav Jain 
Energy & Environmental Technologies 
Berkeley Lab
¡ There was no real “system” for running jobs
¡ Everything was very VASP specific
¡ No error detection / failure recovery
¡ When there was a mistake, it would take a week of manual labor to fix and rerun
¡ The first attempt was a horrible mash-up of things we had already built
§ Complicated by having 2 people “in charge”
¡ Sometimes it is better to start from a blank piece of paper with 1 leader
¡ #1 Google hit for “Python workflow software”
§ now even beats Adobe Fireworks for the #1 spot for “FireWorks workflow”!
¡ Won NERSC award for innovative use of HPC
¡ Used in many applications
§ genomics to computer graphics
§ this is not an “internal code” for running crystals
¡ Doc page ~200 hits/week
§ 1/10th of Materials Project
¡ What is FireWorks and why use it? 
¡ Practical: learn to use FireWorks
(figure: the old manual process for calc1 / restart / test_2: scp files, qsub, wait for finish, retry failures / copy files / qsub again)
(figure: the same manual process repeated for calc1 / restart / try_2)
(figure: the LAUNCHPAD holds FW 1, FW 2, FW 3, FW 4; the ROCKET LAUNCHER / QUEUE LAUNCHER pulls them and runs each in its own directory, e.g. Directory 1 and Directory 2)
You can scale without human effort. Easily customize what gets run where.
¡ Easy to install
§ FW currently at NERSC, SDSC, group clusters; Blue Gene planned
¡ Work within the limits of queue policies
¡ Pack jobs automatically (see the sketch below)
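A minimal sketch of queue-mode launching with the qlaunch tool (qlaunch rapidfire is a real FireWorks command; the specific flag values below are illustrative and should be checked against your installed version):

# keep at most 10 batch jobs in the queue at a time, refilling indefinitely
qlaunch rapidfire -m 10 --nlaunches infinite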
No job left behind!
¡ both job details (scripts + parameters) and launch details are automatically stored: what machine, what time, what directory, what was the output, when was it queued, when did it start running, when was it completed (see the example below)
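For example (a hedged sketch; the exact fields returned depend on your FireWorks version), the stored launch record for a FireWork can be pulled straight from the LaunchPad:

lpad get_fws -i 1 -d all
# the returned JSON includes the spec plus launch data such as the launch directory,
# the FWorker that ran it, start/end times, and the stored output of each FireTask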
¡ Soft failures, hard failures, human errors
¡ We’ve been through it many times now…
¡ No longer a week’s effort
§ “lpad detect_lostruns --rerun” OR
§ “lpad rerun_fws -s FIZZLED”
Xiaohui can be replaced by digital Xiaohui, programmed into FireWorks.
¡ Submitting millions of jobs
§ Easy to lose track of what was done before
¡ Multiple users submitting jobs
¡ Sub-workflow duplication
(figure: duplicate job detection: if two workflows contain an identical step, ensure that the step is only run once and relevant information is still passed; see the sketch below)
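A minimal sketch of duplicate detection as described in the FireWorks docs (_dupefinder and DupeFinderExact are real FireWorks objects, but verify the import path against your version; the echoed command is just a placeholder):

from fireworks import Firework, ScriptTask
from fireworks.user_objects.dupefinders.dupefinder_exact import DupeFinderExact

# two FireWorks with identical specs; the dupe finder ensures the step only runs once
fw_a = Firework(ScriptTask.from_str('echo "expensive step"'),
                spec={"_dupefinder": DupeFinderExact()})
fw_b = Firework(ScriptTask.from_str('echo "expensive step"'),
                spec={"_dupefinder": DupeFinderExact()})
# when fw_b comes up, FireWorks finds the matching completed launch of fw_a
# and passes its output along instead of re-running the task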
¡ Within workflow, or between workflows 
¡ Completely flexible
Now seems like a good time to bring up the last few lines of the OUTCAR of all failed jobs...
¡ Ridiculous amount of documentation and tutorials
§ complete strangers are experts w/o my help
§ but many grad students/postdocs still complain w/o reading the docs
¡ Built-in tasks (see the sketch after this list)
§ run BASH/Python scripts
§ file transfer (incl. remote)
§ write/copy/delete files
¡ Paper in submission
§ happy to share preprint
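A hedged sketch of two of these built-in tasks (ScriptTask and FileTransferTask are real FireWorks tasks; the shell command and file names are illustrative):

from fireworks import Firework, ScriptTask, FileTransferTask

# run a shell command, then copy its output file to the home directory
fw = Firework([ScriptTask.from_str('echo "hello" > hello.txt'),
               FileTransferTask({"mode": "cp", "files": ["hello.txt"], "dest": "~"})])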
¡ What is FireWorks and why use it? 
¡ Practical: learn to use FireWorks
(figure: FW 1 contains a Spec plus FireTask 1 and FireTask 2; its FWAction feeds FW 2 (Spec + FireTask 1) and FW 3 (Spec + FireTasks 1-3, ending in another FWAction))
• Each FireWork is run in a separate directory, maybe on a different machine, within its own batch job (in queue mode)
• The spec contains parameters needed to carry out FireTasks
• FireTasks are run in succession in the same directory
• A FireWork can modify the Spec of its children based on its output (pass information) through a FWAction
• The FWAction can also modify the workflow (see the sketch below)
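As a hedged illustration of the last two points (update_spec and mod_spec are real FWAction arguments; the key names and values here are made up):

from fireworks import FWAction

def run_task(self, fw_spec):
    # ... inside a FireTask: pass a value to every child by merging it into their specs
    return FWAction(update_spec={"total_energy": -42.0})
    # (alternatively) apply a MongoDB-style modification to the child specs,
    # e.g. push a value onto an array:
    # return FWAction(mod_spec=[{"_push": {"input_array": 21}}])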
input_array: [1, 2, 3] 
1. Sum input array 
2. Write to file 
3. Pass result to next job 
input_array: [4, 5, 6] 
1. Sum input array 
2. Write to file 
3. Pass result to next job 
6 15 
input_data: [6, 15] 
1. Sum input data 
2. Write to file 
3. Pass result to next job 
------------------------------------- 
1. Copy result to home dir
from fireworks import FireTaskBase, FWAction

class MyAdditionTask(FireTaskBase):
    _fw_name = "My Addition Task"

    def run_task(self, fw_spec):
        input_array = fw_spec['input_array']
        m_sum = sum(input_array)
        print("The sum of {} is: {}".format(input_array, m_sum))

        with open('my_sum.txt', 'a') as f:
            f.writelines(str(m_sum) + '\n')

        # store the sum; push the sum to the input array of the next sum
        return FWAction(stored_data={'sum': m_sum},
                        mod_spec=[{'_push': {'input_array': m_sum}}])

See also: http://pythonhosted.org/FireWorks/guide_to_writing_firetasks.html
input_array: [1, 2, 3] 
1. Sum input array 
2. Write to file 
3. Pass result to next job 
input_array: [4, 5, 6] 
1. Sum input array 
2. Write to file 
3. Pass result to next job 
6 15! 
input_data: [6, 15] 
1. Sum input data 
2. Write to file 
3. Pass result to next job 
------------------------------------- 
1. Copy result to home dir 
from fireworks import Firework, Workflow, LaunchPad, FWorker, FileTransferTask
from fireworks.core.rocket_launcher import rapidfire

# set up the LaunchPad and reset it
launchpad = LaunchPad()
launchpad.reset('', require_password=False)

# create a Workflow consisting of AdditionTask FWs + a file transfer
# (MyAdditionTask defined as on the previous slide)
fw1 = Firework(MyAdditionTask(), {"input_array": [1, 2, 3]}, name="pt 1A")
fw2 = Firework(MyAdditionTask(), {"input_array": [4, 5, 6]}, name="pt 1B")
fw3 = Firework([MyAdditionTask(),
                FileTransferTask({"mode": "cp", "files": ["my_sum.txt"], "dest": "~"})],
               name="pt 2")
wf = Workflow([fw1, fw2, fw3], {fw1: fw3, fw2: fw3}, name="MAVRL test")
launchpad.add_wf(wf)

# launch the entire Workflow locally
rapidfire(launchpad, FWorker())
¡ lpad get_wflows -d more 
¡ lpad get_fws -i 3 -d all 
¡ lpad webgui 
¡ Also rerun features 
See all reporting at official docs: 
http://pythonhosted.org/FireWorks
¡ There are a ton in the documentation and 
tutorials, just try them! 
§ http://pythonhosted.org/FireWorks 
¡ I want an example of running VASP! 
§ https://github.com/materialsvirtuallab/fireworks-vasp 
§ https://gist.github.com/computron/ 
▪ look for “fireworks-vasp_demo.py” 
§ Note: demo is only a single VASP run 
§ multiple VASP runs require passing directory names 
between jobs 
▪ currently you must do this manually 
▪ in future, perhaps build into FireWorks
¡ It is not an accident that we are able to support so 
many advanced features in such a short time 
§ many features not found anywhere else! 
¡ FireWorks is designed to: 
§ leverage modern tools 
§ be extensible at a fundamental level, not post-hoc 
feature additions
(this is YAML, a bit prettier for humans but less pretty for computers)

fws:
- fw_id: 1
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'To be, or not to be,'
- fw_id: 2
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'that is the question:'
links:
  1:
  - 2
metadata: {}

The same JSON document will produce the same result on any computer (with the same Python functions).
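Assuming the YAML above is saved to a file (the filename here is made up), it can be loaded into the LaunchPad with the lpad command:

lpad add my_wflow.yaml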
(this is YAML, a bit prettier for humans but less pretty for computers)

fws:
- fw_id: 1
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'To be, or not to be,'
- fw_id: 2
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'that is the question:'
links:
  1:
  - 2
metadata: {}

Just some of your search options:
• simple matches
• match in array
• greater than/less than
• regular expressions
• match subdocument
• Javascript function
• MapReduce…
All for free, and all on the native workflow format!
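For example (a sketch; lpad get_fws accepts a MongoDB-style JSON query via -q, and the query values here are illustrative):

# find completed FireWorks whose task list contains a ScriptTask
lpad get_fws -q '{"spec._tasks._fw_name": "ScriptTask", "state": "COMPLETED"}' -d ids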
Use MongoDB’s dictionary update language to allow for JSON document updates.
Workflows can create new workflows or add to the current workflow:
• a recursive workflow
• calculation “detours”
• branches
(a sketch of these FWAction options follows)
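A hedged sketch of how a FireTask can grow the workflow (additions and detours are real FWAction arguments; the echoed command is just a placeholder):

from fireworks import Firework, FWAction, ScriptTask

def run_task(self, fw_spec):
    # ... inside a FireTask
    new_fw = Firework(ScriptTask.from_str('echo "follow-up step"'))
    # add new children after this FireWork (enables recursion/branching)...
    return FWAction(additions=[new_fw])
    # ...or insert a "detour" that runs before the original children:
    # return FWAction(detours=[new_fw])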
¡ Theme: Worker machine pulls a job & runs it
¡ Variation 1:
§ different workers can be configured to pull different types of jobs via config + MongoDB
¡ Variation 2:
§ worker machines sort the jobs by a priority key and pull the matching job with the highest priority
(a sketch of both variations follows)
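A hedged sketch of both variations (_category and _priority are real reserved spec keys, and FWorker accepts a category argument; "gpu_jobs" and the priority value are made up):

from fireworks import Firework, FWorker, ScriptTask

# Variation 1: tag the job; Variation 2: give it a priority (higher runs first)
fw = Firework(ScriptTask.from_str('echo "hi"'),
              spec={"_category": "gpu_jobs", "_priority": 10})
# a worker configured like this will only pull jobs in the matching category
worker = FWorker(category="gpu_jobs")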
(figure: a queue launcher running on the Hopper head node keeps the queue filled with many “thruput” jobs)
Job wakes up when PBS runs it, grabs the latest job description from an external DB (pull), and runs the job based on the DB description.
¡ more complex queuing schemes also possible
§ it’s always the same pull and run, or a slight variation on it! (see the sketch below)
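For example (a sketch; rlaunch is the real FireWorks rocket-launcher command, and inside a queue script it is typically what the batch job executes):

# inside the batch job: pull jobs from the LaunchPad and run them until none are left
rlaunch rapidfire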
¡ Multiple processes pull and run jobs simultaneously (see the sketch below)
§ It is all the same thing, just sliced* different ways!
(figure: within 1 large queue job, independent processes each do “query job -> run job via mpirun on its own node -> update DB”, e.g. job A on Node 1, job B on Node 2, ... job X on Node n, one molecule per process: mol a, mol b, ... mol x)
*get it? wink wink
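A hedged sketch of this mode (rlaunch multi exists in FireWorks for starting several parallel rocket-launcher processes; the process count is illustrative):

# inside one large batch job: start 4 independent processes, each pulling and running jobs
rlaunch multi 4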
because jobs are JSON, they are completely serializable!
¡ When a job runs, a separate thread periodically 
pings an “alive” signal to the database 
¡ If that alive signal doesn’t appear for some time, 
the job is dead 
§ this method is robust for all types of failures 
¡ The ping thread is reused to also track the output 
files and report the results to the database
