Lab meeting—
technical talk
Coby Viner
Use cases
Basic examples
Basic syntax
Additional
syntax
Recent features
More examples
More examples
More examples
More examples
More examples
Real examples
Lab meeting—technical talk
GNU Parallel
Coby Viner
Hoffman Lab
Thursday December 7, 2023
Overview
Why use GNU Parallel?
Basic examples from the tutorial
Basic elements of syntax [from the tutorial]
Much more syntax for many other tasks
Selected recent features
More tutorial examples
More tutorial examples
More tutorial examples
More tutorial examples
More tutorial examples
Some examples of my GNU parallel usage
Why use GNU Parallel?
A shell tool for executing jobs in parallel using one or more computers.
- Easily parallelize perfectly parallel tasks
- For each chromosome…
- For each sex, for each technical replicate, for each hyper-parameter(s)
- Job submission scripts within a for loop
- Improved, cleaner syntax (for the programmer), even in serial
- Facile interleaving of tasks, in the order one is thinking about them
A basic [man page] example: “Working as xargs -n1. Argument appending”
find . -name '*.html' | parallel gzip --best
find . -type f -print0 |
  parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
Easy installation from source
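Another basic [man page] example: “Inserting multiple arguments”
mv *.log destdir   # with too many files, this fails with:
bash: /bin/mv: Argument list too long
ls | grep -E '\.log$' | parallel mv {} destdir      # one mv per file
ls | grep -E '\.log$' | parallel -m mv {} destdir   # as many files per mv as fit on the command line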
Basic elements of syntax [from the tutorial]
Input:
parallel echo ::: A B C # command line
cat abc-file | parallel echo # from STDIN
parallel -a abc-file echo # from a file
Output [line order may vary]:
A
B
C
Basic elements of syntax [from the tutorial]
Multiple inputs.
Input:
parallel echo ::: A B C ::: D E F
cat abc-file | parallel -a - -a def-file echo
parallel -a abc-file -a def-file echo
cat abc-file | parallel echo :::: - def-file # alt. file
parallel echo ::: A B C :::: def-file # mix cmd. and file
Output [line order may vary]:
A D
A E
A F
B D
B E
B F
C D
C E
C F
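With several input sources, the command runs once per combination of arguments, which corresponds to nested loops in plain shell. The following is only a serial sketch of that behaviour: the function name all_combinations is made up for illustration, and unlike parallel it always prints in a fixed order.

```shell
#!/usr/bin/env bash
# Serial sketch of: parallel echo ::: A B C ::: D E F
# all_combinations is a hypothetical helper, not part of GNU Parallel.
all_combinations() {
    local first second
    for first in A B C; do
        for second in D E F; do
            echo "$first $second"
        done
    done
}

all_combinations
```

The nested loops make it clear why three values in each source yield nine jobs.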
Basic elements of syntax [from the tutorial]
Matching input.
Input:
parallel --xapply echo ::: A B C ::: D E F
Output [line order may vary]:
A D
B E
C F
- --xapply wraps around if one input source provides too few values.
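The pairing and wrap-around described here can be pictured as an index loop in plain shell. This is a serial sketch under stated assumptions, not how GNU Parallel is implemented: link_inputs is a hypothetical helper, and the first source is assumed to be the longest one.

```shell
#!/usr/bin/env bash
# Serial sketch of: parallel --xapply echo ::: A B C ::: D E
# link_inputs is a hypothetical helper, not part of GNU Parallel.
link_inputs() {
    local -a first=(A B C) second=(D E)
    local i
    for ((i = 0; i < ${#first[@]}; i++)); do
        # The modulo makes the shorter source start over when it runs out.
        echo "${first[i]} ${second[i % ${#second[@]}]}"
    done
}

link_inputs
```

Here the third value of the first source is paired with the first value of the second source again.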
Basic elements of syntax [from the tutorial]
Replacement strings: The 7 predefined replacement strings
Input:
parallel echo {} ::: A/B.C
parallel echo {.} ::: A/B.C
Output:
A/B.C
A/B
Rep. string   Result
{}            input line, unchanged
{.}           remove ext.
{/}           remove path
{//}          only path
{/.}          remove ext. and path
{#}           job number
{%}           job slot number
Basic elements of syntax [from the tutorial]
Customizing replacement strings
--extensionreplace to change {.} etc.
Shorthand custom (PCRE+) replacement strings
GNU parallel’s 7 replacement strings:
--rpl '{} '
--rpl '{#} $_=$job->seq()'
--rpl '{%} $_=$job->slot()'
--rpl '{/} s:.*/::'
--rpl '{//} $Global::use{"File::Basename"}
  ||= eval "use File::Basename; 1;"; $_ = dirname($_);'
--rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
--rpl '{.} s:\.[^/.]+$::'
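The substitutions behind {.}, {/} and {/.} can be tried outside GNU Parallel, since sed -E accepts the same expressions. This is a rough sketch only: the function names are hypothetical, dirname stands in for the Perl File::Basename call, and GNU Parallel itself evaluates these as Perl code rather than through sed.

```shell
#!/usr/bin/env bash
# Approximations of four predefined replacement strings using sed and dirname.
# The function names are hypothetical; GNU Parallel runs Perl expressions instead.
strip_ext()      { printf '%s\n' "$1" | sed -E 's:\.[^/.]+$::'; }           # {.}
strip_path()     { printf '%s\n' "$1" | sed -E 's:.*/::'; }                 # {/}
only_path()      { dirname "$1"; }                                          # {//}
strip_path_ext() { printf '%s\n' "$1" | sed -E 's:.*/::; s:\.[^/.]+$::'; }  # {/.}

for f in strip_ext strip_path only_path strip_path_ext; do
    printf '%-15s %s\n' "$f" "$("$f" A/B.C)"
done
```

Because the character class excludes the slash, a dot in a directory name is never mistaken for the start of an extension.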
Basic elements of syntax [from the tutorial]
Multiple input sources and positional replacement:
parallel echo {1} and {2} ::: A B ::: C D
- Always try to define replacements, with {<>} syntax.
- Test with --dry-run first.
Basic elements of syntax [from the tutorial]
More replacement strings
--plus adds the replacement strings
{+/} {+.} {+..} {+...} {..} {...} {/..} {/...} {##}.
{+foo} matches the opposite of {foo}:
{} =
{+/}/{/} =
{.}.{+.} =
{+/}/{/.}.{+.} =
{..}.{+..} =
{+/}/{/..}.{+..} =
{...}.{+...} =
{+/}/{/...}.{+...}
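The first few identities above can be checked by hand with bash parameter expansion. This is a sketch for one example path only (it assumes a path containing a directory and two extensions), and the variable names are illustrative rather than anything GNU Parallel defines.

```shell
#!/usr/bin/env bash
# Split one path the way the --plus identities describe, using bash
# parameter expansion. The variable names are illustrative only.
path='dir/file.tar.gz'

dir=${path%/*}            # {+/}  -> dir
base=${path##*/}          # {/}   -> file.tar.gz
no_ext=${path%.*}         # {.}   -> dir/file.tar
ext=${path##*.}           # {+.}  -> gz
no_ext2=${no_ext%.*}      # {..}  -> dir/file
ext2=${path#"$no_ext2".}  # {+..} -> tar.gz

# Each recombination should give back the original path.
recombined=("$dir/$base" "$no_ext.$ext" "$no_ext2.$ext2")
for r in "${recombined[@]}"; do
    [ "$r" = "$path" ] && echo "ok: $r"
done
```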
Basic elements of syntax [from the tutorial]
--plus also adds:
- Since May 2021: {%%regexp} and {##regexp}.
- Since Dec. 2020: {hgrp}, which gives the intersection of the hostgroups of the job and the sshlogin that the job is run on.
- Since May 2020: the replacement strings {slot} = $PARALLEL_JOBSLOT, {sshlogin} = $PARALLEL_SSHLOGIN, and {host}.
Performance over time
[Figure: GNU Parallel overhead for different versions; 3000 trials each running 1000 jobs. Axis labels: "Command" and "milliseconds/job" (ticks from 5 to 12). Versions shown range from 20100424 to 20201022.]
Much more syntax for many other tasks
- --pipe: instead of reading STDIN as command arguments, send the data to the STDIN of the command
  - command_A | command_B | command_C, where command_B is slow
- Remote execution to directly parallelize over multiple machines
- Working directly with a SQL database
- Shebang: often cat input_file | parallel command, but can do
  #!/usr/bin/parallel --shebang -r echo
- As a counting semaphore: parallel --semaphore or sem
  - Default is one slot: a mutex
Selected recent features (post-2020)
- --latest-line shows only the latest line of running jobs.
- --color colors output in different colors per job (and additional related features).
- --sshlogin: now quite fully-featured
- --delay 123auto will auto-adjust --delay. If jobs fail due to being spawned too quickly, --delay will exponentially increase.
- --memsuspend
- {= =}: includes yyyy_mm_dd_hh_mm_ss(), yyyy_mm_dd_hh_mm(), etc.
- --filter, e.g., {1} < {2}+1.
- --template <text file>, with replacement strings. Fills in the replacement strings and saves the result under a new filename.
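To see which argument combinations a filter such as {1} < {2}+1 would keep, the same test can be written as a condition inside nested loops. This is a serial sketch only: it assumes two input sources holding 1 2 3 each, and filtered_pairs is a hypothetical helper rather than part of GNU Parallel.

```shell
#!/usr/bin/env bash
# Serial sketch of which pairs survive a filter like '{1} < {2}+1'
# over two input sources of 1 2 3. filtered_pairs is a hypothetical helper.
filtered_pairs() {
    local a b
    for a in 1 2 3; do
        for b in 1 2 3; do
            # Keep the pair only when the filter expression is true.
            if (( a < b + 1 )); then
                echo "$a $b"
            fi
        done
    done
}

filtered_pairs
```

Of the nine combinations, only those with the first value no larger than the second remain, which is a convenient way to skip redundant symmetric pairs.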
Another [man page] example: “Aggregating content of files”
parallel --header : echo x{X}y{Y}z{Z} \> \
  x{X}y{Y}z{Z} \
  ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
This runs: cat x1y*z1 > x1z1, ∀x∀z
Another [man page] example: directly call SLURM
#!/bin/bash
#SBATCH --time 00:02:00
#SBATCH --ntasks=4
#SBATCH --job-name GnuParallelDemo
#SBATCH --output gnuparallel.out
module purge
module load gnu_parallel
my_parallel="parallel --delay .2 -j $SLURM_NTASKS"
my_srun="srun --export=all --exclusive -n1"
my_srun="$my_srun --cpus-per-task=1 --cpu-bind=cores"
$my_parallel "$my_srun" echo This is job {} ::: {1..20}
Another [man page] example: myprog on FASTA input
cat file.fasta |
  parallel --pipe -N1 --recstart '>' --rrs \
    'read a; echo Name: "$a"; myprog $(tr -d "\n")'
Another [man page] example: fastq-reader on interleaved FASTQ input
parallel --pipe-part -a big.fq --block -1 --regexp \
  --recend '\n' --recstart '@.*(/1| 1:.*)\n[A-Za-z\n\.~]' \
  fastq-reader
Another [man page] example: simple scheduler
true >jobqueue;
while true; do
  tail -n+0 -f jobqueue |
    (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
     perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
     (seq 1000 >> jobqueue &);
     echo Done appending dummy data forcing tail to exit)
  mv j2 jobqueue
done
# Day time
echo 50% > jobfile
cp day_server_list ~/.parallel/sshloginfile
# Night time
echo 100% > jobfile
cp night_server_list ~/.parallel/sshloginfile
Post-meme2images inkscape conversions for publication-ready CentriMo plots and sequence logos
parallel inkscape --vacuum-defs --export-pdf={.}.pdf {} \
  ::: "$centrimo_eps_1" "$centrimo_eps_2"
parallel "inkscape --vacuum-defs --export-pdf={.}.pdf {};
  pdfcrop --hires --clip --margins '0 0 0 -12' {.}.pdf;
  mv -f {.}-crop.pdf {.}.pdf
  " ::: logo+([:digit:])$VECTOR_FILE_EXT
Fixing directory structures: symbolic link issues (for data provenance)
parallel --dry-run -j 1 --rpl \
  '{s} s@.*?((?:fe)?male_\d+-\d+).*@$1@' "{s}; \
  ln -s /\$(readlink {}) {}" \
  ::: $(find . -mindepth 3 -maxdepth 3 -xtype l)
parallel --rpl \
  '{s} s:.+?/(.+?)_peaks.narrowPeak.gz$:\1_summits.bed.gz:' \
  "ln -s ../../linked-2015-10-07-.../data/MACS/{s} {//}/" \
  ::: */*_peaks.narrowPeak.gz
parallel -j 1 --rpl '{...} s:/.*::;' \
  "dir=\$(readlink -f {} | \
  sed -r 's:/linked.+?/:/{...}/:'); \
  mkdir \$dir; rm -f {}; ln -s \$dir {//}/M-ChIP_runs" \
  ::: $(find linked-2016-01-31-* -type l -name 'M-ChIP_runs')
Exploring/collating complex CentriMo results
parallel --dry-run -j 1 \
  --rpl '{sex} s:.*?(\w*male)-\d+_\d+.*:\1:' \
  --rpl '{rep} s:.*?male-(\d+_\d+).*:\1:' \
  --rpl "{TFinfo} s:.*?([^/]+)-expandedTo500bpRegions-mod.*:\1:" \
  --rpl '{thresh} s:.*?\d+_\d+-(0.\d+).*:\1:' \
  "awk '\$0 !~ /^#/ {\$1=\"\"; \$2=\"\";
    print \"{TFinfo}\",\"{sex}\",\"{rep}\",\"{thresh}\"\$0;}' {} |
    sed -r 's/[[:space:]]+/\t/g'" \
  ::: $(find ../MEME-ChIP_runs-initial_controls/ \
    -mindepth 5 -wholename \
    '*hypothesis_testing_selected_controlledVars/centrimo_out/centrimo.txt' | head
  )
Processing ChIP-seq peak data with MACS
parallel --rpl '{/..SRF} s:../\w+[-.](SRF\w*).*:$1:i;' \
  macs14 callpeak -t {} -n {/..SRF} -g 'mm' \
  -s 51 --bw 150 -S -p 0.0001 \
  ::: ../*.alignment.mm8.bed.gz
parallel "zcat {} | awk 'BEGIN{FS=OFS=\"\t\"} NR > 1 {print \$2,\$3,\$4;}' |
  pigz -9 > {/.}.bed.gz" ::: ../*MACS_peaks_annot.txt.gz
liftoverAll '.bed.gz'
Processing ChIP-seq peak data with MACS
function liftoverAll {
  parallel liftOver {} "$LIFTOVER_CHAIN_FILE_FULL_PATH" \
    ../$LIFTED_OVER_DIR_NAME/{/.}.liftedmm9 \
    ../$LIFTED_OVER_DIR_NAME/{/.}.unlifted \
    ::: *"$1"
  pigz -9 ../$LIFTED_OVER_DIR_NAME/*.liftedmm9 \
    ../$LIFTED_OVER_DIR_NAME/*.unlifted
}
parallel -j ${NSLOTS:=1} --xapply \
  --rpl '{r} s:.*RS(\d+).*:\1:' \
  "$MACS_CMD_AND_COMMON_PARAMS -f BAMPE -n 'M-r{1r}' \
  -t {1} -c {2} |& tee -a '$OUT_DIR/M-r{1r}.log'" \
  ::: $IN_DIR/1494*@(1|2|3).bam \
  ::: $IN_DIR/1494*@(4|5|6).bam
Pipeline: processing bisulfite sequencing data with Methpipe
merge_methcount_cmds=$(
parallel -j $NSLOTS --joblog "x.log" \
--rpl '{-../} s:.*/::; s:(\.[^.]+)+$::; s:-\d+$::;' \
--dry-run \
"echo "$MODULE_LOAD_CMD export LC_ALL=C;
cat $ALIGNED_DIR/{-../}*.tomr |
sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 | ... |
methcounts -v -c $BISMARK_REF
-o $COUNTS_DIR/{-../}_pool_ALL.meth /dev/stdin"
| tee -a /dev/stderr | qsub ...
::: $IN_DIR/*.1.fastq.gz | sort -V | uniq
)
Run BedTools Coverage on many files
parallel -j $OMP_NUM_THREADS \
  --delay 1 --lb --resume --resume-failed --joblog \
  "$BASE_DATA_DIR/$(basename $0 .sh)-${output_job_file_suffix#-}.log" \
  --plus --rpl '{acc} s:.*(SRR\w+).*:$1:' --tag --tagstring '{1acc}' "
  ... [expanded below]
  " ::: $BASE_PEAK_DIR/*/peaks.bed* ::: '-hist' '-d' :::+ 'hist' 'pos'
temp_BED="$(mktemp).bed"
bedtools slop -g $assembly -i "{1}" -b $SLOP_LEN > $temp_BED
cat -- "$file" | bedtools coverage $BAM_IN_PARAM stdin $BED_IN_PARAM \
  $temp_BED {2} -iobuf $READ_BUF_SIZE
...
Run BedTools Coverage on many files
if [[ ! -s $"$output_basename-reverse$output_ext" ]]; then
parallel --env _ -j $(($OMP_NUM_THREADS>1 ? 2 : 1)) --lb -I@@ \
--rpl '@name@ s:\+:forward:; s:-:reverse:' "
$MODULE_LOAD_CMD
strand_spec_output_file="$output_basename-@name@$output_ext"
sambamba view -t $(($OMP_NUM_THREADS>8 ? 8 : 1)) -f bam -l 0 -F "strand
$BAM_input_file $CHR_SUBSET | run_bedtools_cov_cmd 'stdin' > $stra
if [[ @@ == '-' ]]; then
sed -i 's/+/-/' $strand_spec_output_file
fi
pigz -9 -p $(($OMP_NUM_THREADS>8 ? 4 : 1)) $strand_spec_output_file
" ::: '+' '-'

GNU Parallel: Lab meeting—technical talk

  • 1.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Lab meeting—technical talk GNU Parallel Coby Viner Hoffman Lab Thursday December 7, 2023
  • 2.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Overview Why use GNU Parallel? Basic examples from the tutorial Basic elements of syntax [from the tutorial] Much more syntax for many other tasks Selected recent features More tutorial examples More tutorial examples More tutorial examples More tutorial examples More tutorial examples Some examples of my GNU parallel usage
  • 3.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters.
  • 4.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters. I Easily parallelize perfectly parallel tasks
  • 5.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters. I Easily parallelize perfectly parallel tasks I For each chromosome…
  • 6.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters. I Easily parallelize perfectly parallel tasks I For each chromosome… I For each sex, for each technical replicate, for each hyper-parameter(s)
  • 7.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters. I Easily parallelize perfectly parallel tasks I For each chromosome… I For each sex, for each technical replicate, for each hyper-parameter(s) I Job submission scripts within a for loop
  • 8.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters. I Easily parallelize perfectly parallel tasks I For each chromosome… I For each sex, for each technical replicate, for each hyper-parameter(s) I Job submission scripts within a for loop I Improved, cleaner, syntax (for the programmer), even in serial
  • 9.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Why use GNU Parallel? a shell tool for executing jobs in parallel using one or more com- puters. I Easily parallelize perfectly parallel tasks I For each chromosome… I For each sex, for each technical replicate, for each hyper-parameter(s) I Job submission scripts within a for loop I Improved, cleaner, syntax (for the programmer), even in serial I Facile interleaving of tasks, in the order one is thinking about them
  • 10.
    A basic [manpage] example: “Working as xargs -n1. Argument appending” find . -name '*.html' | parallel gzip --best
  • 11.
    A basic [manpage] example: “Working as xargs -n1. Argument appending” find . -name '*.html' | parallel gzip --best find . -type f -print0 | parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
  • 12.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Easy installation from source
  • 13.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Easy installation from source
  • 14.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Easy installation from source
  • 15.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Easy installation from source
  • 16.
    Lab meeting— technical talk CobyViner Use cases Basic examples Basic syntax Additional syntax Recent features More examples More examples More examples More examples More examples Real examples Easy installation from source
  • 17.
    Another basic [manpage] example: “Inserting multiple arguments” bash: /bin/mv: Argument list too long ls | grep -E '.log$' | parallel mv {} destdir
  • 18.
    Another basic [manpage] example: “Inserting multiple arguments” bash: /bin/mv: Argument list too long ls | grep -E '.log$' | parallel mv {} destdir ls | grep -E '.log$' | parallel -m mv {} destdir
  • 19.
    Basic elements ofsyntax [from the tutorial] Input: parallel echo ::: A B C # command line cat abc-file | parallel echo # from STDIN parallel -a abc-file echo # from a file
  • 20.
    Basic elements ofsyntax [from the tutorial] Input: parallel echo ::: A B C # command line cat abc-file | parallel echo # from STDIN parallel -a abc-file echo # from a file Output [line order may vary]: A B C
  • 21.
    Basic elements ofsyntax [from the tutorial] Multiple inputs. Input: parallel echo ::: A B C ::: D E F cat abc-file | parallel -a - -a def-file echo parallel -a abc-file -a def-file echo cat abc-file | parallel echo :::: - def-file # alt. file parallel echo ::: A B C :::: def-file # mix cmd. and file
  • 22.
    Basic elements ofsyntax [from the tutorial] Multiple inputs. Input: parallel echo ::: A B C ::: D E F cat abc-file | parallel -a - -a def-file echo parallel -a abc-file -a def-file echo cat abc-file | parallel echo :::: - def-file # alt. file parallel echo ::: A B C :::: def-file # mix cmd. and file Output [line order may vary]: A D A E A F B D B E B F C D C E C F
  • 23.
    Basic elements ofsyntax [from the tutorial] Matching input. Input: parallel --xapply echo ::: A B C ::: D E F
  • 24.
    Basic elements ofsyntax [from the tutorial] Matching input. Input: parallel --xapply echo ::: A B C ::: D E F Output [line order may vary]: A D B E C F
  • 25.
    Basic elements ofsyntax [from the tutorial] Matching input. Input: parallel --xapply echo ::: A B C ::: D E F Output [line order may vary]: A D B E C F I –xapply will wrap, if insufficient input is provided.
  • 26.
    Basic elements ofsyntax [from the tutorial] Replacement strings: The 7 predefined replacement strings Input: parallel echo {} ::: A/B.C parallel echo {.} ::: A/B.C Output: A/B.C A/B
  • 27.
    Basic elements ofsyntax [from the tutorial] Replacement strings: The 7 predefined replacement strings Input: parallel echo {} ::: A/B.C parallel echo {.} ::: A/B.C Output: A/B.C A/B Rep. String Result . remove ext. / remove path // only path /. only ext. and path # job number % job slot number
  • 28.
    Basic elements ofsyntax [from the tutorial] Customizing replacement strings --extensionreplace to change {.} etc. Shorthand custom (PCRE+) replacement strings GNU parallel’s 7 replacement strings: --rpl '{} ' --rpl '{#} $_=$job->seq()' --rpl '{%} $_=$job->slot()' --rpl '{/} s:.*/::' --rpl '{//} $Global::use{”File::Basename”} ||= eval ”use File::Basename; 1;”; $_ = dirname($_);' --rpl '{/.} s:.*/::; s:.[^/.]+$::;' --rpl '{.} s:.[^/.]+$::'
  • 29.
    Basic elements ofsyntax [from the tutorial] Multiple input sources and positional replacement: parallel echo {1} and {2} ::: A B ::: C D
  • 30.
    Basic elements ofsyntax [from the tutorial] Multiple input sources and positional replacement: parallel echo {1} and {2} ::: A B ::: C D I Always try to define replacements, with {<>} syntax.
  • 31.
    Basic elements ofsyntax [from the tutorial] Multiple input sources and positional replacement: parallel echo {1} and {2} ::: A B ::: C D I Always try to define replacements, with {<>} syntax. I Test with --dry-run first.
  • 32.
    Basic elements ofsyntax [from the tutorial] More replacement strings --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...} {/..} {/...} {##}. {+foo} matches the opposite of {foo}: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
  • 35.
    Basic elements of syntax [from the tutorial]
    --plus also adds:
    - Since May 2021: {%%regexp} and {##regexp}.
    - Since Dec. 2020: {hgrp}, which gives the intersection of the hostgroups of the job and the sshlogin that the job is run on.
    - Since May 2020: the replacement strings {slot} = $PARALLEL_JOBSLOT, {sshlogin} = $PARALLEL_SSHLOGIN, {host}.
  • 36.
    Performance over time
    [Figure: GNU Parallel overhead for different versions (releases 20100424 to 20201022); 3000 trials, each running 1000 jobs; axis label: milliseconds/job, ticks from 5 to 12.]
  • 43.
    Much more syntax for many other tasks
    - --pipe: instead of passing lines from STDIN as command arguments, the data is sent to the STDIN of the command
      - command_A | command_B | command_C, where command_B is slow
    - Remote execution to directly parallelize over multiple machines
    - Working directly with a SQL database
    - Shebang: often cat input_file | parallel command, but can do #!/usr/bin/parallel --shebang -r echo
    - As a counting semaphore: parallel --semaphore or sem
      - Default is one slot: a mutex
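    The --pipe mode can be illustrated with a toy stream: the input is split into blocks of records, and each block is fed to the STDIN of a separate job. A minimal sketch, assuming GNU Parallel is installed; wc stands in for the slow command_B.

    ```shell
    # Four input lines, two records (lines) per job, so two wc jobs run.
    # tr removes the padding some wc implementations add.
    counts=$(printf 'a\nb\nc\nd\n' | parallel -k --pipe -N2 'wc -l | tr -d " "')
    echo "$counts"
    # 2
    # 2
    ```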
  • 51.
    Selected recent features (post-2020)
    - --latest-line shows only the latest line of running jobs.
    - --color colors output in different colors per job (and additional related features).
    - --sshlogin: now quite fully-featured
    - --delay 123auto will auto-adjust --delay. If jobs fail due to being spawned too quickly, --delay will exponentially increase.
    - --memsuspend
    - {= =}: includes yyyy_mm_dd_hh_mm_ss(), yyyy_mm_dd_hh_mm(), etc.
    - --filter, e.g., {1} < {2}+1.
    - --template <text file>: takes a text file containing replacement strings, substitutes them, and saves the result under a new filename.
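    As an illustration of --filter, the sketch below keeps only the combinations where the first argument is smaller than the second. It assumes a GNU Parallel release recent enough to provide --filter; older installations will reject the option.

    ```shell
    # Of the nine combinations of 1..3 with 1..3, run only those with {1} < {2}.
    pairs=$(parallel -k --filter '{1} < {2}' echo {1} {2} ::: 1 2 3 ::: 1 2 3)
    echo "$pairs"
    # 1 2
    # 1 3
    # 2 3
    ```

    This is a common way to enumerate unordered pairs, for example when comparing every sample against every other sample once.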
  • 53.
    Another [man page] example: “Aggregating content of files”
        parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
        parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
    This runs: cat x1y*z1 > x1z1, ∀x∀z
  • 54.
    Another [man page] example: directly call SLURM
        #!/bin/bash
        #SBATCH --time 00:02:00
        #SBATCH --ntasks=4
        #SBATCH --job-name GnuParallelDemo
        #SBATCH --output gnuparallel.out
        module purge
        module load gnu_parallel
        my_parallel="parallel --delay .2 -j $SLURM_NTASKS"
        my_srun="srun --export=all --exclusive -n1"
        my_srun="$my_srun --cpus-per-task=1 --cpu-bind=cores"
        $my_parallel "$my_srun" echo This is job {} ::: {1..20}
  • 55.
    Another [man page] example: myprog on FASTA input
        cat file.fasta | parallel --pipe -N1 --recstart '>' --rrs 'read a; echo Name: "$a"; myprog $(tr -d "\n")'
  • 56.
    Another [man page] example: fastq-reader on interleaved FASTQ input
        parallel --pipe-part -a big.fq --block -1 --regexp \
          --recend '\n' --recstart '@.*(/1| 1:.*)\n[A-Za-z\n\.~]' \
          fastq-reader
  • 58.
    Another [man page] example: simple scheduler
        true >jobqueue;
        while true; do
          tail -n+0 -f jobqueue |
            (parallel -E StOpHeRe -S ..; echo GNU Parallel is now done;
             perl -e 'while(<>){/StOpHeRe/ and last};print <>' jobqueue > j2;
             (seq 1000 >> jobqueue &);
             echo Done appending dummy data forcing tail to exit)
          mv j2 jobqueue
        done

        # Day time
        echo 50% > jobfile
        cp day_server_list ~/.parallel/sshloginfile
        # Night time
        echo 100% > jobfile
        cp night_server_list ~/.parallel/sshloginfile
  • 60.
    Post-meme2images inkscape conversions for publication-ready CentriMo plots and sequence logos
        parallel inkscape --vacuum-defs --export-pdf={.}.pdf {} ::: "$centrimo_eps_1" "$centrimo_eps_2"

        parallel "inkscape --vacuum-defs --export-pdf={.}.pdf {}; pdfcrop --hires --clip --margins '0 0 0 -12' {.}.pdf; mv -f {.}-crop.pdf {.}.pdf " ::: logo+([[:digit:]])$VECTOR_FILE_EXT
  • 63.
    Fixing directory structures: symbolic link issues (for data provenance)
        parallel --dry-run -j 1 --rpl '{s} s@.*?((?:fe)?male_\d+-\d+).*@$1@' \
          "{s}; ln -s /\$(readlink {}) {}" ::: $(find . -mindepth 3 -maxdepth 3 -xtype l)

        parallel --rpl '{s} s:.+?/(.+?)_peaks.narrowPeak.gz$:\1_summits.bed.gz:' \
          "ln -s ../../linked-2015-10-07-.../data/MACS/{s} {//}/" ::: */*_peaks.narrowPeak.gz

        parallel -j 1 --rpl '{...} s:/.*::;' \
          "dir=\$(readlink -f {} | sed -r 's:/linked.+?/:/{...}/:'); mkdir \$dir; rm -f {}; ln -s \$dir {//}/M-ChIP_runs" \
          ::: $(find linked-2016-01-31-* -type l -name 'M-ChIP_runs')
  • 64.
    Exploring/collating complex CentriMo results
        parallel --dry-run -j 1 \
          --rpl '{sex} s:.*?(\w*male)-\d+_\d+.*:\1:' \
          --rpl '{rep} s:.*?male-(\d+_\d+).*:\1:' \
          --rpl "{TFinfo} s:.*?([^/]+)-expandedTo500bpRegions-mod.*:\1:" \
          --rpl '{thresh} s:.*?\d+_\d+-(0.\d+).*:\1:' \
          "awk '\$0 !~ /^#/ {\$1=\"\"; \$2=\"\"; print \"{TFinfo}\",\"{sex}\",\"{rep}\",\"{thresh}\"\$0;}' {} | sed -r 's/[[:space:]]+/\t/g'" \
          ::: $(find ../MEME-ChIP_runs-initial_controls/ -mindepth 5 -wholename '*hypothesis_testing_selected_controlledVars/centrimo_out/centrimo.txt' | head )
  • 66.
    Processing ChIP-seq peak data with MACS
        parallel --rpl '{/..SRF} s:../\w+[-.](SRF\w*).*:$1:i;' \
          macs14 callpeak -t {} -n {/..SRF} -g 'mm' -s 51 --bw 150 -S -p 0.0001 ::: ../*.alignment.mm8.bed.gz

        parallel "zcat {} | awk 'BEGIN{FS=OFS=\"\t\"} NR > 1 {print \$2,\$3,\$4;}' | pigz -9 > {/.}.bed.gz" ::: ../*MACS_peaks_annot.txt.gz

        liftoverAll '.bed.gz'
  • 68.
    Processing ChIP-seq peak data with MACS
        function liftoverAll {
            parallel liftOver {} "$LIFTOVER_CHAIN_FILE_FULL_PATH" ../$LIFTED_OVER_DIR_NAME/{/.}.liftedmm9 ../$LIFTED_OVER_DIR_NAME/{/.}.unlifted ::: *"$1"
            pigz -9 ../$LIFTED_OVER_DIR_NAME/*.liftedmm9 ../$LIFTED_OVER_DIR_NAME/*.unlifted
        }

        parallel -j ${NSLOTS:=1} --xapply --rpl '{r} s:.*RS(\d+).*:\1:' \
          "$MACS_CMD_AND_COMMON_PARAMS -f BAMPE -n 'M-r{1r}' -t {1} -c {2} |& tee -a '$OUT_DIR/M-r{1r}.log'" \
          ::: $IN_DIR/1494*@(1|2|3).bam ::: $IN_DIR/1494*@(4|5|6).bam
  • 69.
    Pipeline: processing bisulfite sequencing data with Methpipe
        merge_methcount_cmds=$(
          parallel -j $NSLOTS --joblog "x.log" --rpl '{-../} s:.*/::; s:(\.[^.]+)+$::; s:-\d+$::;' --dry-run "echo "$MODULE_LOAD_CMD export LC_ALL=C; cat $ALIGNED_DIR/{-../}*.tomr | sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 | ... | methcounts -v -c $BISMARK_REF -o $COUNTS_DIR/{-../}_pool_ALL.meth /dev/stdin" | tee -a /dev/stderr | qsub ... ::: $IN_DIR/*.1.fastq.gz | sort -V | uniq
        )
  • 71.
    Run BedTools Coverage on many files
        parallel -j $OMP_NUM_THREADS --delay 1 --lb --resume --resume-failed \
          --joblog "$BASE_DATA_DIR/$(basename $0 .sh)-${output_job_file_suffix#-}.log" \
          --plus --rpl '{acc} s:.*(SRR\w+).*:$1:' --tag --tagstring '{1acc}' \
          " ... [expanded below] " \
          ::: $BASE_PEAK_DIR/*/peaks.bed* ::: '-hist' '-d' :::+ 'hist' 'pos'

    Expanded command:
        temp_BED="$(mktemp).bed"
        bedtools slop -g $assembly -i "{1}" -b $SLOP_LEN > $temp_BED
        cat -- "$file" | bedtools coverage $BAM_IN_PARAM stdin $BED_IN_PARAM $temp_BED {2} -iobuf $READ_BUF_SIZE ...
  • 72.
    Run BedTools Coverage on many files
        if [[ ! -s $"$output_basename-reverse$output_ext" ]]; then
          parallel --env _ -j $(($OMP_NUM_THREADS>1 ? 2 : 1)) --lb -I@@ --rpl '@name@ s:\+:forward:; s:-:reverse:' "
            $MODULE_LOAD_CMD
            strand_spec_output_file="$output_basename-@name@$output_ext"
            sambamba view -t $(($OMP_NUM_THREADS>8 ? 8 : 1)) -f bam -l 0 -F "strand $BAM_input_file $CHR_SUBSET | run_bedtools_cov_cmd 'stdin' > $stra
            if [[ @@ == '-' ]]; then
              sed -i 's/+/-/' $strand_spec_output_file
            fi
            pigz -9 -p $(($OMP_NUM_THREADS>8 ? 4 : 1)) $strand_spec_output_file
          " ::: '+' '-'