Linux intro 5 extra: awk


Lecture for the "Programming for Evolutionary Biology" workshop in Leipzig 2012 (http://evop.bioinf.uni-leipzig.de/)


  1. Programming for Evolutionary Biology, March 17th - April 1st 2012, Leipzig, Germany
     Introduction to Unix systems. Extra: awk and gawk
     Giovanni Marco Dall'Olio, Universitat Pompeu Fabra, Barcelona (Spain)
  2. awk
     “awk” is a “Swiss army knife” command-line tool for manipulating tabular files.
     Things you can do with awk:
       • Extract all the lines of a file that match a pattern, and print only some of the columns (instead of “grep | cut”)
       • Add a prefix or suffix to all the elements of a column (instead of “cut | paste”)
       • Sum the values of different columns
  3. awk and gawk
     In these slides we will be talking about awk.
     In practice, the awk command on a modern Linux system is usually not the original awk. We will use gawk, a free version of awk developed by the GNU project.
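  4. Basic awk usage
       awk '<pattern to select lines> {instructions to be executed on each selected line}' <file>
     The whole awk program goes inside single quotes, so that the shell does not interpret $ and the braces.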
  5. Example awk usage
       awk '$0 ~ /AAC/ {print}' sample_vcf.vcf
     $0 ~ /AAC/ → select all the lines that contain AAC
     {print} → for each line that matches the previous expression, print it
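The pattern/action example above can be tried on a few made-up lines. This is a minimal sketch: the sample rows and the variable names (`vcf_like`, `matched`, `matched_short`) are invented for illustration and are not taken from sample_vcf.vcf.

```shell
# Hypothetical tab-separated sample rows (not the real sample_vcf.vcf).
vcf_like=$(printf 'chr1\t100\tAAC\tG\nchr1\t200\tTTG\tA\nchr2\t300\tGAACT\tC')

# Select every line that contains AAC anywhere; $0 is the whole line.
matched=$(printf '%s\n' "$vcf_like" | awk '$0 ~ /AAC/ {print}')
echo "$matched"

# Shorthand: a bare /regex/ is tested against $0, and the default action is {print}.
matched_short=$(printf '%s\n' "$vcf_like" | awk '/AAC/')
```

Both forms select the first and third rows, because the match is a substring match on the whole line (GAACT contains AAC).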
  6. Column names in awk
     awk assumes that you are working on tabular files.
     Each column of the file can be accessed with $<column-number>. For example, $2 is the second column of the file.
     $0 refers to the whole line, that is, all the columns of the file.
  7. Accessing columns in awk
       awk '{print $1, $2, $3}' sample_vcf.vcf → prints the first three columns
       awk '{print $0}' sample_vcf.vcf → prints all the columns
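A small sketch of column access on invented data (the rows and the variable names `table`, `first_three`, `tabbed`, `whole` are assumptions for illustration). It also shows one detail the slide does not mention: the commas in `print` join the fields with the output field separator, which is a single space unless `OFS` is set.

```shell
# Hypothetical tab-separated table with five columns.
table=$(printf 'chr1\t100\trs1\tA\tG\nchr2\t200\trs2\tC\tT')

# First three columns; by default awk splits on any whitespace and
# joins the printed fields with a single space.
first_three=$(printf '%s\n' "$table" | awk '{print $1, $2, $3}')
echo "$first_three"

# Same columns, but reading and writing tabs explicitly.
tabbed=$(printf '%s\n' "$table" | awk 'BEGIN {FS = OFS = "\t"} {print $1, $2, $3}')

# $0 reproduces each line unchanged.
whole=$(printf '%s\n' "$table" | awk '{print $0}')
```

Setting `FS` and `OFS` to a tab is worth considering for files such as VCF, where a column may itself contain spaces.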
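  8. Adding a prefix to a column with awk
     A common use of awk is to add a prefix or suffix to all the entries of a column.
     Example:
       awk '{print "my_prefix" $2}' myfile.txt
     In awk, writing a string next to a column concatenates them, so this prints the second column with my_prefix in front of each entry.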
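A sketch of the prefix/suffix idea on invented rows (the data and the variable names `rows`, `prefixed_only`, `prefixed_in_place`, `suffixed` are hypothetical). The second command is a variation not shown on the slide: assigning to `$2` changes the column in place, so the other columns are kept.

```shell
# Hypothetical three-column input.
rows=$(printf 'a 1 x\nb 2 y')

# Print only the second column, with a prefix in front of each entry.
prefixed_only=$(printf '%s\n' "$rows" | awk '{print "my_prefix" $2}')
echo "$prefixed_only"

# Variation: modify column 2 in place and print the whole line.
prefixed_in_place=$(printf '%s\n' "$rows" | awk '{$2 = "my_prefix" $2; print}')
echo "$prefixed_in_place"

# A suffix works the same way, with the string after the column.
suffixed=$(printf '%s\n' "$rows" | awk '{print $2 "_my_suffix"}')
```

Note that assigning to a field makes awk rebuild the line using the output field separator, so a tab-separated file would need `OFS` set to a tab to stay tab-separated.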
  9. Summing columns in awk
     If two columns contain numeric values, we can use awk to sum them.
     Usage:
       awk '{print $1 + $2}' myfile.txt
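A short sketch of summing on made-up numbers (the data and the names `numbers`, `row_sums`, `column_total` are assumptions). The first command is the per-line sum from the slide; the second is an additional, commonly used variant that totals one column over the whole file with an `END` block.

```shell
# Hypothetical two-column numeric input.
numbers=$(printf '1 2\n3 4\n10 20')

# Per-line sum of the first two columns.
row_sums=$(printf '%s\n' "$numbers" | awk '{print $1 + $2}')
echo "$row_sums"

# Total of the first column across all lines; END runs once after the last line.
column_total=$(printf '%s\n' "$numbers" | awk '{total += $1} END {print total}')
echo "$column_total"
```

awk variables such as `total` start out empty and are treated as 0 in arithmetic, so no explicit initialization is needed here.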
  10. Selecting lines by column with awk
     awk can be used to select lines according to the content of a specific column.
     It is like grep, but more powerful, because it lets you specify in which column the match must be.
     This example prints all the lines that have AAC in their first column:
       awk '$1 ~ /AAC/ {print}' myfile.txt
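The difference from grep can be seen on a few invented lines (the data and the names `lines`, `grep_count`, `col1_match`, `col1_exact` are hypothetical). The exact-match form with `==` is an extra variant, not from the slide.

```shell
# Hypothetical input: AAC appears in column 1 on some lines and in column 2 on another.
lines=$(printf 'AAC 1\nTTG AAC\nGAACT 3')

# grep matches AAC anywhere on the line, so all three lines count.
grep_count=$(printf '%s\n' "$lines" | grep -c 'AAC')

# awk restricted to column 1: the regex match still accepts substrings (GAACT).
col1_match=$(printf '%s\n' "$lines" | awk '$1 ~ /AAC/ {print}')
echo "$col1_match"

# Exact string comparison on column 1 keeps only the line whose first column is AAC.
col1_exact=$(printf '%s\n' "$lines" | awk '$1 == "AAC" {print}')
echo "$col1_exact"
```

Whether `~` or `==` is the right choice depends on whether a partial match in the column should count.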
  11. More on awk
     awk is a complete programming language.
     It is the equivalent of a spreadsheet for the command line.
     If you want to know more, check the book “GAWK: Effective AWK Programming” at http://www.gnu.org/software/gawk/manual
