Productivity tips - Introduction to linux for bioinformatics

Part 6 of the training "Introduction to linux for bioinformatics". Some useful tips to improve your bioinformatics scripts.

  1. Productivity. Joachim Jacob, 8 and 15 November 2013
  2. Multiple commands. In bash, several commands can be put on one line by separating them with ";". They run one after the other, whether or not the previous one succeeded:
       $ wget http://homepage.tudelft.nl/19j49/t-SNE_files/tSNE_linux.tar.gz ; tar xvfz tSNE_linux.tar.gz
  3. Multiple commands. Commands on a one-liner can also be separated by && or || (not a pipe!).
     && executes the next command only if the preceding one finished successfully:
       $ curl corz.org/ip && echo
     || is the inverse: it executes the next command only if the preceding one did not finish successfully.
  4. Piping a list of files with xargs. A pipe passes the output of one command to the input of the next:
       $ ls | less
     Some commands, however, require the file names as arguments instead of the content of the files. E.g. this doesn't work, file only prints its usage message:
       $ ls | file
       Usage: file [-bchikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type] [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles] file [--help]
  5. Piping a list of files with xargs. xargs passes the output of a command as a list of arguments to another program:
       $ ls | xargs file
       bin: directory
       buddy.sh: Bourne-Again shell script, ASCII text executable
       Compression_exercise: directory
       Desktop: directory
       Documents: directory
       Downloads: directory
       FastQValidator.0.1.1.tgz: gzip compressed data, from Unix, last modified: Fri Oct 19 16:44:23 2012
  6. .bashrc. ~/.bashrc is a hidden configuration file for bash in your home directory. It configures, among other things, the prompt in your terminal, and it contains aliases for commands.
  7. Alias example. When the first word you enter on the command line is an alias, bash replaces it with the text of that alias before running the command (aliases take precedence over commands with the same name). You can define aliases in .bashrc.
  8. Alias example. Some interesting aliases:
       alias ll='ls -lh'
       alias dirsize="du -sh */"
       alias uncom='grep -v -E "^#|^$"'
       alias hosts="cat /etc/hosts"
       alias dedup="awk '! x[\$0]++'"
     ($0 is escaped in dedup so that awk, not bash, receives it.) Aliases are perfectly suited for storing one-liners; you can find some at https://wikis.utexas.edu/display/bioiteam/Scott%27s+list+of+linux+one-liners
  9. Alias exercise → exercise link
  10. Finding stuff: locate. locate is extremely quick and convenient. However, it won't find the newest files you created, because it searches a database that first needs to be updated by running updatedb (usually as root: sudo updatedb). It accepts wildcards; quote the pattern so that bash does not expand it first. Example:
        $ locate '*.sam'
      Bonus: how can you filter the results on a certain location?
  11. Finding stuff: find. A more elaborate tool to find stuff:
        $ find -name alignment.sam
      Without tests, find lists everything below the starting directory. Narrow the search down with tests such as:
        -name : search on the name of the file
        -type : search on the type: (f)ile, (d)irectory, (l)ink
        -perm : search on the permissions (in octal, e.g. 111, or symbolic, e.g. u=rwx)
        …
      This is the power tool to find stuff.
  13. Finding stuff: find. The most powerful option of find is -exec: it executes a command on each found entity. {} is replaced by the name of the found file, and \; (escaped so that bash passes it on to find) ends the command. Quote the wildcard pattern so that find, not bash, expands it:
        $ find -name '*.gz'
        ./DRR000542_2.fastq.subset.gz
        ./DRR000542_1.fastq.subset.gz
        ./DRR000545_2.fastq.subset.gz
        ./DRR000545_1.fastq.subset.gz
        $ find -name '*.gz' -exec gunzip {} \;
        $ ls
        DRR000542_1.fastq.subset  DRR000545_1.fastq.subset
        DRR000542_2.fastq.subset  DRR000545_2.fastq.subset
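  14. Command substitution in bash. In bash, the output of a command can be stored directly in a variable. Put the command between back-ticks (or, equivalently, between $( and )):
        $ test=`ls -l`
        $ echo $test
        total 7929624 -rw-rw-r-- 1 joachim joachim 15326 May 10 2013 0538c2b.jpg -rw-rw-r-- 1 joachim joachim 4914797 Nov 8 16:15 18d7alY
      Because $test is not quoted, bash splits its content into words and echo prints them on a single line; echo "$test" keeps the line breaks.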
  15. Command substitution in bash. A variable can also contain a list of several entities (e.g. files). Extracting the first 1,000,000 lines from compressed text files:
        for file in `ls DRR00054*tar.gz`; do zcat "$file" | head -n 1000000 > "${file%.gz}.subset"; done
      The output of ls is put in a list. for assigns the name of each file, one after the other, to the variable file. This variable is used in the one-liner zcat | head, where ${file%.gz} is the file name with its .gz suffix removed.
  16. Keywords: .bashrc, ;, alias, prompt, locate, find, command substitution. Write in your own words what the terms mean.
  17. Break
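
A minimal sketch of the ";" separator from slide 2 (Multiple commands), assuming bash and a throw-away directory made with mktemp. Instead of downloading the t-SNE archive, it builds a small local archive (demo.tar.gz and demo/ are hypothetical names) and shows that a command after ";" runs even when the one before it fails.

```shell
#!/usr/bin/env bash
# Sketch: commands separated by ";" run one after the other.
# demo/ and demo.tar.gz are hypothetical stand-ins for a downloaded archive.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small archive, then remove the original directory.
mkdir demo ; echo "hello" > demo/readme.txt ; tar cfz demo.tar.gz demo ; rm -r demo

# Unpack it, then list what was extracted.
tar xvfz demo.tar.gz ; ls demo

# ";" does not look at the exit status: echo runs even though ls fails.
after_failure=$(ls no_such_file 2>/dev/null ; echo "still runs")
echo "$after_failure"
```

If the second command should only run when the first succeeds, && (slide 3) is the safer separator.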
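
The && and || operators from slide 3 decide on the exit status of the preceding command. This sketch assumes a small hypothetical file reads.txt and uses grep -q, which exits with status 0 only when the pattern is found.

```shell
#!/usr/bin/env bash
# Sketch: && and || react to the exit status of the preceding command.
workdir=$(mktemp -d)
printf 'ACGT\n' > "$workdir/reads.txt"   # hypothetical example file

# &&: echo runs only because grep succeeded (exit status 0).
found=$(grep -q ACGT "$workdir/reads.txt" && echo "found")

# ||: echo runs only because grep failed (non-zero exit status).
missing=$(grep -q TTTT "$workdir/reads.txt" || echo "not found")

# Both together: report the outcome either way.
grep -q TTTT "$workdir/reads.txt" && echo "TTTT present" || echo "TTTT absent"

echo "$found / $missing"
```

Note that a && b || c is not a full if/else: c also runs when a succeeds but b fails.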
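
To see the difference between a plain pipe and xargs (slides 4 and 5) without depending on the file utility, this sketch uses wc -l on two hypothetical text files: through a pipe, wc counts the file names it receives as text; through xargs, it counts the lines inside the files. The last command shows the NUL-separated variant (find -print0 with xargs -0), which also copes with file names containing spaces.

```shell
#!/usr/bin/env bash
# Sketch: a pipe passes content, xargs passes arguments.
# one.txt and two.txt are hypothetical example files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > one.txt
printf 'x\ny\n'   > two.txt

# Pipe: wc reads the text printed by ls, i.e. it counts the file names.
names_counted=$(ls | wc -l)

# xargs: the file names become arguments, so wc counts the lines in each file.
ls | xargs wc -l

# Safer for names with spaces: NUL-separated names from find.
lines_in_files=$(find . -name '*.txt' -print0 | xargs -0 cat | wc -l)
echo "$names_counted file names, $lines_in_files lines in total"
```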
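
Some of the aliases from slide 8 can be tried out in a script before adding them to ~/.bashrc. Interactive shells expand aliases by default, but a non-interactive bash script needs shopt -s expand_aliases first. The dedup alias is written with \$0 escaped so that awk, not bash, receives $0; the input file list.txt is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: aliases as they could appear in ~/.bashrc, tested in a script.
# Scripts do not expand aliases unless this option is switched on.
shopt -s expand_aliases

alias ll='ls -lh'
alias uncom='grep -v -E "^#|^$"'
# \$0 is escaped so that awk receives $0 (the current input line).
alias dedup="awk '! x[\$0]++'"

workdir=$(mktemp -d)
# Hypothetical input: a comment, an empty line and a duplicated entry.
printf '# chromosomes\n\nchr1\nchr2\nchr1\n' > "$workdir/list.txt"

# Aliases combine in a pipe like ordinary commands:
# uncom drops comments and empty lines, dedup keeps the first copy of each line.
uncom "$workdir/list.txt" | dedup > "$workdir/clean.txt"
cat "$workdir/clean.txt"
```

Without the backslash, bash would replace $0 with the name of the shell while defining the alias, and awk would never see the current line.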
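
One possible answer to the bonus question on slide 10 is to filter the output of locate with grep. This sketch assumes locate is installed and its database is up to date; search_root is an example value (here your home directory) that you would replace with the location of interest.

```shell
#!/usr/bin/env bash
# Sketch: keep only locate hits below one directory.
# Assumes locate is installed and "sudo updatedb" has been run;
# search_root is an example value.
search_root="$HOME"

if command -v locate >/dev/null 2>&1; then
  # Quote '*.sam' so bash passes the wildcard to locate unexpanded;
  # grep keeps only the paths that start with "$search_root/".
  sam_hits=$(locate '*.sam' 2>/dev/null | grep "^$search_root/")
  printf '%s\n' "$sam_hits"
else
  sam_hits=""
  echo "locate is not installed" >&2
fi
```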
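
The find tests from slide 11 can be combined in one command; all tests must then match. This sketch builds a small hypothetical directory tree in which old.sam is a directory, so -type f is what keeps it out of the results, and it uses the symbolic -perm -u=x (octal equivalent: -perm -100) to find files the owner can execute.

```shell
#!/usr/bin/env bash
# Sketch: combining find tests on a hypothetical directory tree.
workdir=$(mktemp -d)
mkdir -p "$workdir/run1" "$workdir/run2/old.sam"   # old.sam is a directory
touch "$workdir/run1/alignment.sam" "$workdir/run2/alignment.sam" "$workdir/run2/notes.txt"
chmod u+x "$workdir/run2/notes.txt"

# Regular files (-type f) whose name ends in .sam; the quotes keep
# bash from expanding the wildcard before find sees it.
sam_files=$(find "$workdir" -type f -name '*.sam')
printf '%s\n' "$sam_files"

# Regular files with at least the owner-execute bit set.
executables=$(find "$workdir" -type f -perm -u=x)
printf '%s\n' "$executables"
```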
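
The find -exec example of slide 13 can be reproduced on small stand-in files. This sketch assumes two hypothetical FASTQ files (sample_1, sample_2) that it compresses first; it also shows the "+" terminator, which passes many file names to a single command instead of running the command once per file as "\;" does.

```shell
#!/usr/bin/env bash
# Sketch: find -exec on hypothetical gzip-compressed FASTQ files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for name in sample_1 sample_2; do
  printf '@read1\nACGT\n+\nIIII\n' > "$name.fastq"
  gzip "$name.fastq"                  # replaces it by $name.fastq.gz
done

# The quotes keep bash from expanding *.gz in the current directory,
# which would hand find a list of file names instead of a pattern.
find . -name '*.gz'

# {} is replaced by each found file; \; ends the command given to -exec
# (escaped so that bash passes ";" on to find instead of using it itself).
find . -name '*.gz' -exec gunzip {} \;

# With "+" find collects the names and runs a single wc for all of them.
find . -name '*.fastq' -exec wc -l {} +

ls
```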
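
For the command substitution of slide 14, this sketch compares back-ticks with the $( ) form on a hypothetical two-line file, and shows why the unquoted echo on the slide prints everything on one line. The last assignment nests one substitution inside another, which $( ) allows without extra escaping.

```shell
#!/usr/bin/env bash
# Sketch: command substitution with back-ticks and with $( ).
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/example.txt"   # hypothetical file

old_style=`cat "$workdir/example.txt"`    # back-ticks, as on the slide
new_style=$(cat "$workdir/example.txt")   # $( ) stores the same text

echo $new_style      # unquoted: word splitting puts both lines on one line
echo "$new_style"    # quoted: the line break is kept

# Nested substitution: the inner command runs first.
n_words=$(echo $(cat "$workdir/example.txt") | wc -w)
echo "$n_words words"
```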
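
The subset loop of slide 15 can be tried on small hypothetical files (example_1.txt.gz, example_2.txt.gz) and with 4 lines instead of 1,000,000. This sketch lets a shell glob produce the list instead of `ls ...`, which gives the same names without splitting the output of ls, and uses gzip -dc, which decompresses to standard output like zcat does on Linux.

```shell
#!/usr/bin/env bash
# Sketch of the subset loop on hypothetical example files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in 1 2; do
  seq 1 10 | gzip > "example_$i.txt.gz"    # small compressed text files
done

# The glob expands to the matching file names, one per loop iteration.
for file in example_*.txt.gz; do
  # Keep the first 4 decompressed lines.
  # ${file%.gz} strips the trailing ".gz": example_1.txt.gz -> example_1.txt.subset
  gzip -dc "$file" | head -n 4 > "${file%.gz}.subset"
done

ls
```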
