GNU Parallel: Lab meeting—technical talk
1. Lab meeting—
technical talk
Coby Viner
Use cases
Basic examples
Basic syntax
Additional
syntax
Recent features
More examples
More examples
More examples
More examples
More examples
Real examples
Lab meeting—technical talk
GNU Parallel
Coby Viner
Hoffman Lab
Thursday December 7, 2023
2. Overview
Why use GNU Parallel?
Basic examples from the tutorial
Basic elements of syntax [from the tutorial]
Much more syntax for many other tasks
Selected recent features
More tutorial examples
More tutorial examples
More tutorial examples
More tutorial examples
More tutorial examples
Some examples of my GNU parallel usage
9. Why use GNU Parallel?
a shell tool for executing jobs in parallel using one or more computers.
- Easily parallelize perfectly parallel tasks
- For each chromosome…
- For each sex, for each technical replicate, for each hyper-parameter(s)
- Job submission scripts within a for loop
- Improved, cleaner syntax (for the programmer), even in serial
- Facile interleaving of tasks, in the order one is thinking about them
10. A basic [man page] example: “Working as xargs -n1. Argument appending”
find . -name '*.html' | parallel gzip --best
12. Easy installation from source
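18. Another basic [man page] example: “Inserting multiple arguments”
mv *.log destdir # with too many files, this fails with:
bash: /bin/mv: Argument list too long
ls | grep -E '\.log$' | parallel mv {} destdir # one mv per file
ls | grep -E '\.log$' | parallel -m mv {} destdir # as many files per mv as fit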
20. Basic elements of syntax [from the tutorial]
Input:
parallel echo ::: A B C # command line
cat abc-file | parallel echo # from STDIN
parallel -a abc-file echo # from a file
Output [line order may vary]:
A
B
C
22. Basic elements of syntax [from the tutorial]
Multiple inputs.
Input:
parallel echo ::: A B C ::: D E F
cat abc-file | parallel -a - -a def-file echo
parallel -a abc-file -a def-file echo
cat abc-file | parallel echo :::: - def-file # alternative file syntax (- is STDIN)
parallel echo ::: A B C :::: def-file # mix command-line and file input
Output [line order may vary]:
A D
A E
A F
B D
B E
B F
C D
C E
C F
25. Basic elements of syntax [from the tutorial]
Matching input.
Input:
parallel --xapply echo ::: A B C ::: D E F
Output [line order may vary]:
A D
B E
C F
- --xapply wraps around (reuses values from the start) if one input source is shorter than the others.
27. Basic elements of syntax [from the tutorial]
Replacement strings: The 7 predefined replacement strings
Input:
parallel echo {} ::: A/B.C
parallel echo {.} ::: A/B.C
Output:
A/B.C
A/B
Rep. string   Result
{}            input unchanged
{.}           remove ext.
{/}           remove path
{//}          only path
{/.}          remove path and ext.
{#}           job number
{%}           job slot number
28. Basic elements of syntax [from the tutorial]
Customizing replacement strings
--extensionreplace to change {.} etc.
Shorthand custom (PCRE+) replacement strings
GNU parallel’s 7 replacement strings are implemented as:
--rpl '{} '
--rpl '{#} $_=$job->seq()'
--rpl '{%} $_=$job->slot()'
--rpl '{/} s:.*/::'
--rpl '{//} $Global::use{"File::Basename"} ||= eval "use File::Basename; 1;"; $_ = dirname($_);'
--rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
--rpl '{.} s:\.[^/.]+$::'
31. Basic elements of syntax [from the tutorial]
Multiple input sources and positional replacement:
parallel echo {1} and {2} ::: A B ::: C D
- Always try to spell out replacements explicitly, with {<>} syntax.
- Test with --dry-run first.
32. Basic elements of syntax [from the tutorial]
More replacement strings
--plus adds the replacement strings
{+/} {+.} {+..} {+...} {..} {...} {/..} {/...} {##}.
{+foo} matches the opposite of {foo}, so that:
{} = {+/}/{/}
   = {.}.{+.}
   = {+/}/{/.}.{+.}
   = {..}.{+..}
   = {+/}/{/..}.{+..}
   = {...}.{+...}
   = {+/}/{/...}.{+...}
35. Basic elements of syntax [from the tutorial]
--plus also adds:
- Since May 2021: {%%regexp} and {##regexp}.
- Since Dec. 2020: {hgrp}, which gives the intersection of the hostgroups of the job and the sshlogin that the job is run on.
- Since May 2020: also activates the replacement strings {slot} = $PARALLEL_JOBSLOT, {sshlogin} = $PARALLEL_SSHLOGIN, and {host}.
36. Performance over time
[Figure: “GNU Parallel overhead for different versions”, 3000 trials each running 1000 jobs. Releases 20100424 through 20201022 are compared; axis labels: Command, milliseconds/job (ticks from 5 to 12).]
43. Much more syntax for many other tasks
- --pipe: instead of using STDIN lines as command arguments, split STDIN into blocks and send each block to the STDIN of a command
  - command_A | command_B | command_C, where command_B is slow
- Remote execution to directly parallelize over multiple machines
- Working directly with a SQL database
- Shebang: often cat input_file | parallel command, but can do
  #!/usr/bin/parallel --shebang -r echo
- As a counting semaphore: parallel --semaphore or sem
  - Default is one slot: a mutex
51. Selected recent features (post-2020)
- --latest-line shows only the latest line of running jobs.
- --color colors output in different colors per job (and additional related features).
- --sshlogin: now quite fully-featured
- --delay 123auto will auto-adjust --delay. If jobs fail due to being spawned too quickly, --delay will exponentially increase.
- --memsuspend
- {= =}: includes yyyy_mm_dd_hh_mm_ss(), yyyy_mm_dd_hh_mm(), etc.
- --filter, e.g., {1} < {2}+1.
- --template <text file>, with replacement strings. Replaces the replacement strings and saves it under a new filename.
53. Another [man page] example: “Aggregating content of files”
parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
  ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
This runs: cat x1y*z1 > x1z1, ∀x∀z