LAB MEETING—
TECHNICAL
TALK
COBY VINER
USE CASES
BASIC EXAMPLES
BASIC SYNTAX
ADDITIONAL
SYNTAX
MORE EXAMPLES
REAL EXAMPLES
LAB MEETING—TECHNICAL TALK
GNU PARALLEL
O. TANGE, “GNU PARALLEL - THE COMMAND-LINE
POWER TOOL”, ;login: The USENIX Magazine, VOL. 36, NO.
1, PP. 42–47, FEB. 2011
Coby Viner
Hoffman Lab
Wednesday, April 13, 2016
OVERVIEW
WHY USE GNU PARALLEL?
BASIC EXAMPLES FROM THE TUTORIAL
BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
MUCH MORE SYNTAX FOR MANY OTHER TASKS
MORE TUTORIAL EXAMPLES
SOME EXAMPLES OF MY GNU PARALLEL USAGE
WHY USE GNU PARALLEL?
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers.
Easily parallelize perfectly parallel tasks:
  For each chromosome...
  For each sex, for each technical replicate, for each set of hyper-parameters
  Job submission scripts within a for loop
Improved, cleaner syntax (for the programmer), even in serial
Facile interleaving of tasks, in the order one is thinking about them
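The nested "for each" loops mentioned here can be sketched as below. The chromosome names, replicate numbers, and the echo stand-in for a real analysis command are hypothetical; -k (--keep-order) is used only so that both versions print in the same order.

```shell
#!/usr/bin/env bash
# Serial version: one nested loop per variable.
for chr in chr1 chr2 chrX; do
    for rep in 1 2; do
        echo "processing $chr, replicate $rep"
    done
done

# Parallel version: one ::: input source per loop variable.
# {1} and {2} refer to the first and second input source.
parallel -k echo "processing {1}, replicate {2}" ::: chr1 chr2 chrX ::: 1 2
```

Both commands print the same six lines; the second one runs the jobs concurrently, up to the number of available job slots.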
A BASIC [MAN PAGE] EXAMPLE: “WORKING
AS XARGS -N1. ARGUMENT APPENDING”
find . -name '*.html' | parallel gzip --best
find . -type f -print0 | 
parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
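A minimal sketch of the second command on throwaway data, assuming a scratch directory created with mktemp; the file names and contents are hypothetical. It illustrates why -print0 is paired with -0 (file names containing spaces) and why -q is needed (the space inside the Perl expression).

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical files, one with a space in its name.
workdir=$(mktemp -d)
printf 'FOO BAR\n' > "$workdir/plain.txt"
printf 'FOO BAR\n' > "$workdir/with space.txt"

# -print0 and -0 pass NUL-terminated names, so the space is harmless.
# -q quotes the command, protecting the space inside the substitution.
find "$workdir" -type f -print0 |
    parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'

cat "$workdir"/*.txt    # both files should now contain FUBAR
rm -r "$workdir"
```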
EASY INSTALLATION FROM SOURCE
ANOTHER BASIC [MAN PAGE] EXAMPLE:
“INSERTING MULTIPLE ARGUMENTS”
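mv *.log destdir    # with too many files, this fails:
bash: /bin/mv: Argument list too long
ls | grep -E '\.log$' | parallel mv {} destdir       # one mv per file
ls | grep -E '\.log$' | parallel -m mv {} destdir    # as many files per mv as fit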
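The difference between the two commands can be seen with echo in place of mv. This is a sketch with hypothetical file names; -j1 is an assumption made here so that -m has a single job slot to fill, which keeps the small example on one line.

```shell
#!/usr/bin/env bash
# Default: one job per argument (-k keeps the input order).
parallel -k echo mv {} destdir ::: a.log b.log c.log

# -m: {} is replaced by as many arguments as fit on the command line.
# With a single job slot (-j1), all three names go to one job.
parallel -j1 -m echo mv {} destdir ::: a.log b.log c.log
```

With several job slots, -m divides the arguments among the slots, so each slot still gets one command line.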
BASIC ELEMENTS OF SYNTAX [FROM THE
TUTORIAL]
Input:
parallel echo ::: A B C # command line
cat abc-file | parallel echo # from STDIN
parallel -a abc-file echo # from a file
Output [line order may vary]:
A
B
C
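When the line order matters, -k (--keep-order) makes the output follow the input order regardless of which job finishes first. A small sketch, using sleep durations (hypothetical values) to force the jobs to finish in reverse order:

```shell
#!/usr/bin/env bash
# The last job finishes first, yet -k prints results in input order.
parallel -k 'sleep {}; echo slept {}' ::: 0.3 0.2 0.1
```

Dropping -k from this command typically prints the shortest sleep first, provided at least three job slots are available.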
BASIC ELEMENTS OF SYNTAX [FROM THE
TUTORIAL]
Multiple inputs.
Input:
parallel echo ::: A B C ::: D E F
cat abc-file | parallel -a - -a def-file echo
parallel -a abc-file -a def-file echo
cat abc-file | parallel echo :::: - def-file # alt. file
parallel echo ::: A B C :::: def-file # mix cmd. and file
Output [line order may vary]:
A D
A E
A F
B D
B E
B F
C D
C E
C F
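The last input form (command-line values mixed with a file) can be tried without preparing files by hand. In this sketch, a temporary file created with mktemp stands in for def-file, and -k is added so the output order matches the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for def-file: one value per line.
def_file=$(mktemp)
printf '%s\n' D E F > "$def_file"

# ::: takes values from the command line, :::: takes them from a file.
# All combinations of the two sources are generated.
parallel -k echo ::: A B C :::: "$def_file"

rm -f "$def_file"
```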
BASIC ELEMENTS OF SYNTAX [FROM THE
TUTORIAL]
Matching input.
Input:
parallel --xapply echo ::: A B C ::: D E F
Output [line order may vary]:
A D
B E
C F
--xapply wraps around the shorter input source if the sources differ in length.
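A sketch of the wrapping behaviour with input sources of unequal length. It assumes a recent GNU Parallel release, where the option is spelled --link (--xapply is the older name used on these slides); -k is added for a stable output order.

```shell
#!/usr/bin/env bash
# Five values paired with two: the shorter source (F G) is reused
# from its start once it runs out.
parallel -k --link echo ::: A B C D E ::: F G
```

The pairs produced are A F, B G, C F, D G, E F.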
BASIC ELEMENTS OF SYNTAX [FROM THE
TUTORIAL]
Replacement strings: The 7 predefined replacement strings
Input:
parallel echo {} ::: A/B.C
parallel echo {.} ::: A/B.C
Output:
A/B.C
A/B
Rep. String   Result
{}            input line, unchanged
{.}           remove ext.
{/}           remove path
{//}          only path
{/.}          remove ext. and path
{#}           job number
{%}           job slot number
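All seven replacement strings can be compared on a single input. The path below is hypothetical, and the labels (input=, no_ext=, and so on) are only there to make the output readable.

```shell
#!/usr/bin/env bash
# One job, so both the job number {#} and the job slot {%} are 1.
parallel echo 'input={} no_ext={.} no_path={/} dir={//} stem={/.} job={#} slot={%}' \
    ::: dir/sub/file.tar.gz
```

Note that {.} and {/.} strip only the last extension, so file.tar.gz becomes file.tar.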
BASIC ELEMENTS OF SYNTAX [FROM THE
TUTORIAL]
Customizing replacement strings
--extensionreplace to change {.} etc.
Shorthand custom (PCRE+) replacement strings
GNU parallel’s 7 replacement strings are implemented as:
--rpl '{} '
--rpl '{#} $_=$job->seq()'
--rpl '{%} $_=$job->slot()'
--rpl '{/} s:.*/::'
--rpl '{//} $Global::use{"File::Basename"}
||= eval "use File::Basename; 1;"; $_ = dirname($_);'
--rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
--rpl '{.} s:\.[^/.]+$::'
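A custom replacement string is defined the same way: a tag followed by a Perl expression that modifies $_. In this sketch the tag name {sample}, the substitution, and the input path are all hypothetical, chosen to resemble the peak file names used later in the talk.

```shell
#!/usr/bin/env bash
# {sample}: drop the directory, then drop everything from "_peaks." onward.
parallel --rpl '{sample} s:.*/::; s:_peaks\..*::' \
    echo {sample} ::: data/MACS/female_1_peaks.narrowPeak.gz
```

This prints female_1, which can then be used to build output names or labels inside the command.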
BASIC ELEMENTS OF SYNTAX [FROM THE
TUTORIAL]
Multiple input sources and positional replacement:
parallel echo {1} and {2} ::: A B ::: C D
Always try to define replacements explicitly, with the {} syntax (e.g. {1}, {2}), rather than relying on implicit argument appending.
Test with --dry-run first.
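A sketch of the --dry-run check: the commands are printed instead of executed, which makes it easy to confirm that the positional replacements land where intended. The swapped order of {2} and {1} is only for illustration; -k keeps the listing in input order.

```shell
#!/usr/bin/env bash
# Print the four commands that would be run, without running them.
parallel -k --dry-run echo {2} then {1} ::: A B ::: C D
```

The first printed command is `echo C then A`; once the listing looks right, removing --dry-run runs the jobs.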
MUCH MORE SYNTAX FOR MANY OTHER TASKS
--pipe: instead of using STDIN lines as command arguments, send the data to the STDIN of the command
  Useful for command_A | command_B | command_C, where command_B is slow
Remote execution to directly parallelize over multiple machines
Working directly with a SQL database
Shebang: often cat input_file | parallel command, but can do #!/usr/bin/parallel --shebang -r echo
As a counting semaphore: parallel --semaphore or sem
  Default is one slot: a mutex
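A minimal sketch of --pipe, with wc -l standing in for the slow command_B. The input size and the 100k block size are arbitrary choices for illustration: the input is split into blocks on line boundaries, each block is fed to the STDIN of its own wc -l, and the partial counts are summed afterwards.

```shell
#!/usr/bin/env bash
# command_A: seq; command_B: wc -l (run once per block); command_C: awk.
seq 1 100000 |
    parallel --pipe --block 100k wc -l |
    awk '{ total += $1 } END { print total }'
```

The total equals the number of input lines, since every line ends up in exactly one block.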
ANOTHER [MAN PAGE] EXAMPLE:
“AGGREGATING CONTENT OF FILES”
parallel --header : echo x{X}y{Y}z{Z} \> \
  x{X}y{Y}z{Z} \
  ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
This runs: cat x1y*z1 > x1z1, ∀x∀z
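The same two steps can be tried on a smaller grid in a scratch directory. This is a sketch: the reduced value ranges and the mktemp directory are assumptions made to keep the example quick and self-cleaning.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    # Step 1: create one file per (X, Y, Z) combination; each file
    # contains its own name. --header : names the sources X, Y and Z.
    parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
        ::: X 1 2 ::: Y 01 02 03 ::: Z 1 2
    # Step 2: for each x and z, concatenate the files over all y values.
    parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
    cat x1z1    # lists x1y01z1, x1y02z1, x1y03z1
)
rm -r "$workdir"
```

eval is needed in the second step because Parallel quotes the replaced value; eval lets the shell expand the y* glob.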
POST-MEME2IMAGES INKSCAPE
CONVERSIONS FOR PUBLICATION-READY
CENTRIMO PLOTS AND SEQUENCE LOGOS
parallel inkscape --vacuum-defs --export-pdf={.}.pdf {} \
  ::: "$centrimo_eps_1" "$centrimo_eps_2"
# The +( ) pattern below requires: shopt -s extglob
parallel "inkscape --vacuum-defs --export-pdf={.}.pdf {};
  pdfcrop --hires --clip --margins '0 0 0 -12' {.}.pdf;
  mv -f {.}-crop.pdf {.}.pdf
  " ::: logo+([[:digit:]])$VECTOR_FILE_EXT
FIXING DIRECTORY STRUCTURES—SYMBOLIC
LINK ISSUES (FOR DATA PROVENANCE)
parallel --dry-run -j 1 --rpl \
  '{s} s@.*?((?:fe)?male_\d+-\d+).*@$1@' "{s}; \
  ln -s /\$(readlink {}) {}" \
  ::: $(find . -mindepth 3 -maxdepth 3 -xtype l)
parallel --rpl \
  '{s} s:.+?/(.+?)_peaks.narrowPeak.gz$:\1_summits.bed.gz:' \
  "ln -s ../../linked-2015-10-07-.../data/MACS/{s} \
  {//}/" ::: */*_peaks.narrowPeak.gz
parallel -j 1 --rpl '{...} s:/.*::;' \
  "dir=\$(readlink -f {} | \
  sed -r 's:/linked.+?/:/{...}/:'); \
  mkdir \$dir; rm -f {}; ln -s \$dir {//}/M-ChIP_runs" \
  ::: $(find linked-2016-01-31-* -type l -name 'M-ChIP_runs')
EXPLORING/COLLATING COMPLEX CENTRIMO
RESULTS
parallel --dry-run -j 1 \
  --rpl '{sex} s:.*?(\w*male)-\d+_\d+.*:\1:' \
  --rpl '{rep} s:.*?male-(\d+_\d+).*:\1:' \
  --rpl "{TFinfo} s:.*?([^/]+)-expandedTo500bpRegions-mod.*:\1:" \
  --rpl '{thresh} s:.*?\d+_\d+-(0.\d+).*:\1:' \
  "awk '\$0 !~ /^#/ {\$1=\"\"; \$2=\"\"; \
  print \"{TFinfo}\",\"{sex}\",\"{rep}\",\"{thresh}\"\$0;}' {} | \
  sed -r 's/[[:space:]]+/\t/g'" \
  ::: $(find ../MEME-ChIP_runs-initial_controls/ \
  -mindepth 5 -wholename \
  '*hypothesis_testing_selected_controlledVars/centrimo_out/centrimo.txt' | head
  )
PROCESSING CHIP-SEQ PEAK DATA WITH
MACS
parallel --rpl '{/..SRF} s:../\w+[-.](SRF\w*).*:$1:i;' \
  macs14 callpeak -t {} -n {/..SRF} -g 'mm' \
  -s 51 --bw 150 -S -p 0.0001 \
  ::: ../*.alignment.mm8.bed.gz
parallel "zcat {} | awk 'BEGIN{FS=OFS=\"\t\"} NR > 1 \
  {print \$2,\$3,\$4;}' | \
  pigz -9 > {/.}.bed.gz" ::: ../*MACS_peaks_annot.txt.gz
liftoverAll '.bed.gz'
PROCESSING CHIP-SEQ PEAK DATA WITH
MACS
function liftoverAll {
    parallel liftOver {} "$LIFTOVER_CHAIN_FILE_FULL_PATH" \
        ../$LIFTED_OVER_DIR_NAME/{/.}.liftedmm9 \
        ../$LIFTED_OVER_DIR_NAME/{/.}.unlifted \
        ::: *"$1"
    pigz -9 ../$LIFTED_OVER_DIR_NAME/*.liftedmm9 \
        ../$LIFTED_OVER_DIR_NAME/*.unlifted
}
parallel -j ${NSLOTS:=1} --xapply \
  --rpl '{r} s:.*RS(\d+).*:\1:' \
  "$MACS_CMD_AND_COMMON_PARAMS -f BAMPE -n 'M-r{1r}' \
  -t {1} -c {2} |& tee -a '$OUT_DIR/M-r{1r}.log'" \
  ::: $IN_DIR/1494*@(1|2|3).bam \
  ::: $IN_DIR/1494*@(4|5|6).bam
PIPELINE—PROCESSING BISULFITE
SEQUENCING DATA WITH METHPIPE
merge_methcount_cmds=$(
  parallel -j $NSLOTS --joblog "x.log" \
    --rpl '{-../} s:.*/::; s:(\.[^.]+)+$::; s:-\d+$::;' \
    --dry-run \
    "echo \"$MODULE_LOAD_CMD export LC_ALL=C; \
    cat $ALIGNED_DIR/{-../}*.tomr | \
    sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 | ... | \
    methcounts -v -c $BISMARK_REF \
    -o $COUNTS_DIR/{-../}_pool_ALL.meth /dev/stdin\" \
    | tee -a /dev/stderr | qsub ..." \
    ::: $IN_DIR/*.1.fastq.gz | sort -V | uniq
)
