Linux intro 5 extra: awk

Lecture for the "Programming for Evolutionary Biology" workshop in Leipzig 2012 (http://evop.bioinf.uni-leipzig.de/)


    • Programming for Evolutionary Biology, March 17th - April 1st 2012, Leipzig, Germany. Introduction to Unix systems. Extra: awk and gawk. Giovanni Marco Dall'Olio, Universitat Pompeu Fabra, Barcelona (Spain)
    • awk: “awk” is a “Swiss army knife” command-line tool for manipulating tabular files. Things you can do with awk: extract all the lines of a file that match a pattern and print only some of the columns (instead of “grep | cut”); add a prefix or suffix to all the elements of a column (instead of “cut | paste”); sum the values of different columns.
    • awk and gawk: In these slides we will talk about awk. In practice, the awk command on most Linux systems is not the original awk. We will use gawk, a free implementation of awk developed by the GNU project.
    • Basic awk usage: “awk '<pattern to select lines> {instructions to be executed on each line}' <file>”
    • Example awk usage: “awk '$0 ~ /AAC/ {print}' sample_vcf.vcf”. $0 ~ /AAC/ → select all the lines that contain AAC. {print} → print each line that matches the previous expression.
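The pattern-plus-action form from the slide above can be tried on a small file. This is a minimal sketch: the temporary file and its three tab-separated lines are made up for illustration and only stand in for sample_vcf.vcf.

```shell
# Hypothetical stand-in for sample_vcf.vcf: three tab-separated lines.
sample=$(mktemp)
printf 'chr1\t100\tAAC\nchr1\t200\tGGT\nchr2\t300\tTAACG\n' > "$sample"

# Select every line that contains AAC anywhere and print it unchanged.
# The single quotes stop the shell from expanding $0 before awk sees it.
matched=$(awk '$0 ~ /AAC/ {print}' "$sample")
printf '%s\n' "$matched"

rm -f "$sample"
```

The first and third lines are printed, because AAC also occurs inside TAACG; the match is on a substring, not on a whole field.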
    • Column names in awk: awk assumes that you are working on tabular files. Each column of the file can be accessed as $<column-number>. For example, $2 is the second column of the file. $0 refers to the whole line, that is, all the columns.
    • Accessing columns in awk: “awk '{print $1, $2, $3}' sample_vcf.vcf” → prints the first three columns. “awk '{print $0}' sample_vcf.vcf” → prints all the columns.
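A short sketch of the two commands from the slide above, run on a made-up five-column file (the file name and its contents are assumptions for illustration, not part of the lecture data):

```shell
# Hypothetical five-column table, columns separated by spaces.
table=$(mktemp)
printf 'chr1 100 rs1 A G\nchr2 200 rs2 C T\n' > "$table"

# Print only the first three columns; the commas make awk join them
# with its output field separator, which is a single space by default.
first_three=$(awk '{print $1, $2, $3}' "$table")
printf '%s\n' "$first_three"

# $0 is the whole line, so this prints every column as it was read.
whole_lines=$(awk '{print $0}' "$table")
printf '%s\n' "$whole_lines"

rm -f "$table"
```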
    • Adding a prefix to a column with awk: A common use of awk is to add a prefix or suffix to all the entries of a column. Example: awk '{print "my_prefix" $2}' myfile.txt
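In awk, writing a string next to a field with no comma between them concatenates the two, which is what makes the prefix trick work. A minimal sketch, assuming a made-up two-column file; the suffix variant is an illustrative addition in the same spirit as the slide:

```shell
# Hypothetical two-column file standing in for myfile.txt.
ids=$(mktemp)
printf 'gene1 12\ngene2 7\n' > "$ids"

# Prefix: the string and $2 are adjacent, so they are glued together.
prefixed=$(awk '{print "my_prefix" $2}' "$ids")
printf '%s\n' "$prefixed"

# Suffix on the first column, with the second column printed after it.
suffixed=$(awk '{print $1 "_my_suffix", $2}' "$ids")
printf '%s\n' "$suffixed"

rm -f "$ids"
```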
    • Summing columns in awk: If two columns contain numeric values, we can use awk to sum them. Usage: “awk '{print $1 + $2}' myfile.txt”
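The row-by-row sum can be checked on a tiny file. This is only a sketch; the numbers are invented for illustration.

```shell
# Hypothetical file with two numeric columns.
numbers=$(mktemp)
printf '1 2\n10 5\n7 8\n' > "$numbers"

# For each line, add the first and second column and print the result.
sums=$(awk '{print $1 + $2}' "$numbers")
printf '%s\n' "$sums"

rm -f "$numbers"
```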
    • Selecting lines by column with awk: awk can be used to select lines based on the content of a column. It is like grep, but more powerful, because it lets you specify which column the match must be in. This example prints all the lines that have AAC in their first column: “awk '$1 ~ /AAC/ {print}' myfile.txt”
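To see the difference between matching on one column and matching on the whole line, the two patterns can be run side by side. A minimal sketch on a made-up file, where the second line contains AAC only in its third column:

```shell
# Hypothetical three-column file; AAC appears in different columns.
seqs=$(mktemp)
printf 'AAC 1 x\nGGT 2 AAC\nTAACG 3 y\n' > "$seqs"

# Match restricted to the first column: the second line is skipped.
first_col=$(awk '$1 ~ /AAC/ {print}' "$seqs")
printf '%s\n' "$first_col"

# Match on the whole line, which is what a plain grep would also do.
any_col=$(awk '$0 ~ /AAC/ {print}' "$seqs")
printf '%s\n' "$any_col"

rm -f "$seqs"
```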
    • More on awk: awk is a complete programming language. It is the equivalent of a spreadsheet for the command line. If you want to know more, check the book “Gawk: Effective AWK Programming” at http://www.gnu.org/software/gawk/manual