
GNU Parallel

Hoffman Lab Tech Talk

Published in: Technology

GNU Parallel

  1. LAB MEETING, TECHNICAL TALK: GNU PARALLEL
     Coby Viner, Hoffman Lab, Wednesday, April 13, 2016
     Reference: O. Tange, “GNU Parallel - The Command-Line Power Tool”, ;login: The USENIX Magazine, vol. 36, no. 1, pp. 42–47, Feb. 2011.
     Sections: Use Cases; Basic Examples; Basic Syntax; Additional Syntax; More Examples; Real Examples
  2. OVERVIEW
     • Why use GNU Parallel?
     • Basic examples from the tutorial
     • Basic elements of syntax [from the tutorial]
     • Much more syntax for many other tasks
     • More tutorial examples
     • Some examples of my GNU Parallel usage
  9. WHY USE GNU PARALLEL?
     GNU Parallel is "a shell tool for executing jobs in parallel using one or more computers."
     • Easily parallelize perfectly parallel tasks:
         • For each chromosome...
         • For each sex, for each technical replicate, for each hyper-parameter(s)
         • Job submission scripts within a for loop
     • Improved, cleaner syntax (for the programmer), even when running serially
     • Facile interleaving of tasks, in the order one is thinking about them
  11. A BASIC [MAN PAGE] EXAMPLE: “WORKING AS XARGS -N1. ARGUMENT APPENDING”
          find . -name '*.html' | parallel gzip --best
          find . -type f -print0 | parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'
  12–16. EASY INSTALLATION FROM SOURCE [slide images not captured in this transcript]
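  18. ANOTHER BASIC [MAN PAGE] EXAMPLE: “INSERTING MULTIPLE ARGUMENTS”
      Moving many files with mv *.log destdir can fail with:
          bash: /bin/mv: Argument list too long
      One mv per file:
          ls | grep -E '\.log$' | parallel mv {} destdir
      As many files per mv as fit on one command line (-m):
          ls | grep -E '\.log$' | parallel -m mv {} destdir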
  20. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
      Input:
          parallel echo ::: A B C          # command line
          cat abc-file | parallel echo     # from STDIN
          parallel -a abc-file echo        # from a file
      Output [line order may vary]:
          A
          B
          C
  22. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
      Multiple inputs. Input:
          parallel echo ::: A B C ::: D E F
          cat abc-file | parallel -a - -a def-file echo
          parallel -a abc-file -a def-file echo
          cat abc-file | parallel echo :::: - def-file    # alt. file
          parallel echo ::: A B C :::: def-file           # mix cmd. and file
      Output [line order may vary]:
          A D
          A E
          A F
          B D
          B E
          B F
          C D
          C E
          C F
  25. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
      Matching input. Input:
          parallel --xapply echo ::: A B C ::: D E F
      Output [line order may vary]:
          A D
          B E
          C F
      --xapply will wrap if insufficient input is provided: a shorter input source is reused from its start.
  27. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
      Replacement strings: the 7 predefined replacement strings
      Input:
          parallel echo {} ::: A/B.C
          parallel echo {.} ::: A/B.C
      Output:
          A/B.C
          A/B
      Rep. string    Result
      {}             input unchanged
      {.}            remove ext.
      {/}            remove path
      {//}           only path
      {/.}           remove ext. and path
      {#}            job number
      {%}            job slot number
  28. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
      Customizing replacement strings
      • --extensionreplace to change {.}, etc.
      • Shorthand custom (PCRE+) replacement strings
      GNU Parallel's 7 replacement strings:
          --rpl '{} '
          --rpl '{#} $_=$job->seq()'
          --rpl '{%} $_=$job->slot()'
          --rpl '{/} s:.*/::'
          --rpl '{//} $Global::use{"File::Basename"} ||= eval "use File::Basename; 1;"; $_ = dirname($_);'
          --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
          --rpl '{.} s:\.[^/.]+$::'
  31. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
      Multiple input sources and positional replacement:
          parallel echo {1} and {2} ::: A B ::: C D
      • Always try to define replacements with {<>} syntax.
      • Test with --dry-run first.
  38. MUCH MORE SYNTAX FOR MANY OTHER TASKS
      • --pipe: instead of STDIN supplying the command's arguments, the data is sent to the STDIN of the command
          • command_A | command_B | command_C, where command_B is slow
      • Remote execution to directly parallelize over multiple machines
      • Working directly with a SQL database
      • Shebang: often cat input_file | parallel command, but can do
            #!/usr/bin/parallel --shebang -r echo
      • As a counting semaphore: parallel --semaphore or sem
          • Default is one slot: a mutex
  40. ANOTHER [MAN PAGE] EXAMPLE: “AGGREGATING CONTENT OF FILES”
          parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
          parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
      This runs: cat x1y*z1 > x1z1, ∀x∀z
  42. POST-MEME2IMAGES INKSCAPE CONVERSIONS FOR PUBLICATION-READY CENTRIMO PLOTS AND SEQUENCE LOGOS
          parallel inkscape --vacuum-defs --export-pdf={.}.pdf {} ::: "$centrimo_eps_1" "$centrimo_eps_2"
          parallel "inkscape --vacuum-defs --export-pdf={.}.pdf {}; pdfcrop --hires --clip --margins '0 0 0 -12' {.}.pdf; mv -f {.}-crop.pdf {.}.pdf" ::: logo+([[:digit:]])$VECTOR_FILE_EXT
      The +( ) pattern is a bash extended glob and needs shopt -s extglob.
  45. FIXING DIRECTORY STRUCTURES: SYMBOLIC LINK ISSUES (FOR DATA PROVENANCE)
          parallel --dry-run -j 1 --rpl '{s} s@.*?((?:fe)?male_\d+-\d+).*@$1@' "{s}; ln -s /\$(readlink {}) {}" ::: $(find . -mindepth 3 -maxdepth 3 -xtype l)
          parallel --rpl '{s} s:.+?/(.+?)_peaks.narrowPeak.gz$:\1_summits.bed.gz:' "ln -s ../../linked-2015-10-07-.../data/MACS/{s} {//}/" ::: */*_peaks.narrowPeak.gz
          parallel -j 1 --rpl '{...} s:/.*::;' "dir=\$(readlink -f {} | sed -r 's:/linked.+?/:/{...}/:'); mkdir \$dir; rm -f {}; ln -s \$dir {//}/M-ChIP_runs" ::: $(find linked-2016-01-31-* -type l -name 'M-ChIP_runs')
  46. EXPLORING/COLLATING COMPLEX CENTRIMO RESULTS
          parallel --dry-run -j 1 --rpl '{sex} s:.*?(\w*male)-\d+_\d+.*:\1:' --rpl '{rep} s:.*?male-(\d+_\d+).*:\1:' --rpl "{TFinfo} s:.*?([^/]+)-expandedTo500bpRegions-mod.*:\1:" --rpl '{thresh} s:.*?\d+_\d+-(0.\d+).*:\1:' "awk '\$0 !~ /^#/ {\$1=\"\"; \$2=\"\"; print \"{TFinfo}\",\"{sex}\",\"{rep}\",\"{thresh}\"\$0;}' {} | sed -r 's/[[:space:]]+/\t/g'" ::: $(find ../MEME-ChIP_runs-initial_controls/ -mindepth 5 -wholename '*hypothesis_testing_selected_controlledVars/centrimo_out/centrimo.txt' | head)
  48. PROCESSING CHIP-SEQ PEAK DATA WITH MACS
          parallel --rpl '{/..SRF} s:../\w+[-.](SRF\w*).*:$1:i;' macs14 callpeak -t {} -n {/..SRF} -g 'mm' -s 51 --bw 150 -S -p 0.0001 ::: ../*.alignment.mm8.bed.gz
          parallel "zcat {} | awk 'BEGIN{FS=OFS=\"\t\"} NR > 1 {print \$2,\$3,\$4;}' | pigz -9 > {/.}.bed.gz" ::: ../*MACS_peaks_annot.txt.gz
          liftoverAll '.bed.gz'
  50. PROCESSING CHIP-SEQ PEAK DATA WITH MACS
          function liftoverAll {
              parallel liftOver {} "$LIFTOVER_CHAIN_FILE_FULL_PATH" ../$LIFTED_OVER_DIR_NAME/{/.}.liftedmm9 ../$LIFTED_OVER_DIR_NAME/{/.}.unlifted ::: *"$1"
              pigz -9 ../$LIFTED_OVER_DIR_NAME/*.liftedmm9 ../$LIFTED_OVER_DIR_NAME/*.unlifted
          }
          parallel -j ${NSLOTS:=1} --xapply --rpl '{r} s:.*RS(\d+).*:\1:' "$MACS_CMD_AND_COMMON_PARAMS -f BAMPE -n 'M-r{1r}' -t {1} -c {2} |& tee -a '$OUT_DIR/M-r{1r}.log'" ::: $IN_DIR/1494*@(1|2|3).bam ::: $IN_DIR/1494*@(4|5|6).bam
  51. PIPELINE: PROCESSING BISULFITE SEQUENCING DATA WITH METHPIPE
          merge_methcount_cmds=$(
              parallel -j $NSLOTS --joblog "x.log" --rpl '{-../} s:.*/::; s:(\.[^.]+)+$::; s:-\d+$::;' --dry-run "echo \"$MODULE_LOAD_CMD export LC_ALL=C; cat $ALIGNED_DIR/{-../}*.tomr | sort -k 1,1 -k 2,2n -k 3,3n -k 6,6 | … | methcounts -v -c $BISMARK_REF -o $COUNTS_DIR/{-../}_pool_ALL.meth /dev/stdin\" | tee -a /dev/stderr | qsub …" ::: $IN_DIR/*.1.fastq.gz | sort -V | uniq
          )
