Simple filters
Engineered for Tomorrow
Simple Filters
Filters are commands that accept data from standard input,
manipulate it, and write the results to standard output.
The Sample Database:
$ cat emp.lst
2233 | a.k. Shukla | g.m. |sales |12/12/52 |6000
9876 | jai sharma |director |production |12/03/50 |7000
…
….
pr: Paginating Files
The pr command prepares a file for printing by adding suitable headers,
footers and formatted text. Invoke this command with filename as its
argument.
$ pr dept.lst
May 06 10:40 2008 dept.lst Page 1
01:accounts:6213
02:admin:5423
03:marketing:6521
04:personnel:2365
05:production:9876
06:sales:1006
…….blank lines……
• pr adds 5 lines of margin at the top and 5 at the bottom. The header shows the
date and time of last modification of the file along with the filename and page
number.
pr options:
• pr’s -k option (where k is an integer) prints in k columns.
• pr’s -t option suppresses the headers and footers, as shown:
$ a.out | pr -t -5
0 4 8 12 16
1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
• Other options are:
-d : Double-spaces the input, which reduces clutter
-n : Numbers lines, which helps in debugging code
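The column layout above can be reproduced with a short sketch. The a.out program is not shown in the slides, so seq is assumed here as a stand-in that prints the numbers 0 to 19, one per line.

```shell
# seq is a stand-in (an assumption) for a.out: it prints 0..19, one per line.
cols=$(seq 0 19 | pr -t -5)
echo "$cols"

# pr pads the columns with whitespace; collapse the padding so the
# first row is easy to inspect.
first_row=$(echo "$cols" | sed -n '1p' | awk '{$1=$1; print}')
echo "$first_row"
```

The numbers run down each column rather than across, which is why the first row holds 0, 4, 8, 12 and 16.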
• We can combine these options to produce the output format we need.
example: $ pr -t -n -d -o 10 dept.lst
Here -o 10 offsets each line by 10 spaces, which leaves a left margin.
Few more options:
pr +10 chap01 //Starts printing from page 10
pr -l 54 chap01 //Page length set to 54 lines
head: Displaying the Beginning of a File
• The head command displays the top (beginning) of a file.
• When used without an option, it displays the first 10 lines of the file:
head emp.lst // shows first 10 lines
• We can use the -n option to specify a line count and display, say, the first 3 lines of a file, as shown below:
$ head -n 3 emp.lst
2233 | a.k. Shukla | g.m. |sales |12/12/52 |6000
9876 | jai sharma |director |production |12/03/50 |7000
5678 | sumit gupta |d.g.m. |marketing |19/04/54 |6000
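A minimal sketch of both forms of head, assuming a 25-line input generated with seq in place of emp.lst so that the line counts are easy to check:

```shell
# seq 1 25 is a hypothetical 25-line input standing in for emp.lst.
default_lines=$(seq 1 25 | head | wc -l | tr -d ' ')
echo "$default_lines"        # head without options keeps 10 lines

first3=$(seq 1 25 | head -n 3 | paste -s -d ' ' -)
echo "$first3"               # the first three lines, joined for display
```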
tail: Displaying the End of a File
• The tail command displays the end of a file.
• By default, it displays the last 10 lines.
• The last 3 lines are displayed in this way:
$ tail -n 3 emp.lst
• We can also address lines from the beginning of the file instead of the end, with the help of the +count option, where count is the line number from which the selection should begin:
$ tail -n +11 emp.lst // 11th line onwards (older systems also accept tail +11)
tail options:
• Monitoring File Growth (-f): tail keeps the file open and prints new lines as they are appended, until interrupted:
tail -f /oracle/app/oracle/product/8.1/oranist/install.log
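The two ways of addressing lines with tail can be sketched as follows, again assuming a 25-line input from seq rather than emp.lst:

```shell
# Count from the end: the last three lines of a 25-line input.
last3=$(seq 1 25 | tail -n 3 | paste -s -d ' ' -)
echo "$last3"

# Count from the beginning: everything from line 11 onwards.
from11=$(seq 1 25 | tail -n +11)
first_of_rest=$(echo "$from11" | sed -n '1p')
rest_count=$(echo "$from11" | wc -l | tr -d ' ')
echo "$first_of_rest $rest_count"
```

The -n +11 form is used here because it is the portable spelling; the bare +11 form is not accepted everywhere.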
cut: Slitting a File Vertically
• cut selects columns (-c) as well as fields (-f) from its input.
• Consider the example below:
$ head -n 5 emp.lst | tee shortlist
2233 | a.k. Shukla | g.m. |sales |12/12/52 |6000
9876 | jai sharma |director |production |12/03/50 |7000
5678 | sumit gupta |d.g.m. |marketing |19/04/54 |6000
2365 |barun sengupta |director |personnel |05/11/47 |7800
5423 |n.k. Gupta |chairman |admin |08/30/56 |5400
• Cutting Columns (-c): use cut with the -c option and a list of column numbers, delimited by commas. Ranges can also be specified with a hyphen. The list must not contain spaces.
$ cut -c 6-22,24-32 shortlist
a.k. Shukla g.m.
jai sharma director
sumit gupta d.g.m.
barun sengupta director
n.k. Gupta chairman
• To select columns from the beginning of a line and up to its end:
cut -c -3,6-22,28-34,55- shortlist //55- means column 55 to end of line; -3 means columns 1 to 3
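The column-list notation is easier to follow on a line whose column positions are obvious. A small sketch, assuming the lowercase alphabet as input (a = column 1, z = column 26) instead of shortlist:

```shell
# -3 selects columns 1-3, 6-8 selects a closed range, 24- runs to end of line.
picked=$(printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' | cut -c -3,6-8,24-)
echo "$picked"   # abc + fgh + xyz
```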
Cutting Fields (-f):
• Two options are needed here: -d for the field delimiter and -f for the field list.
• To cut the second and third fields of the file, the command is given below:
$ cut -d "|" -f 2,3 shortlist | tee cutlist1
a.k. Shukla |g.m.
jai sharma |director
sumit gupta |d.g.m.
barun sengupta |director
n.k. Gupta |chairman
• To cut out fields numbered 1, 4, 5 and 6 and save the output in the file cutlist2:
cut -d "|" -f 1,4- shortlist > cutlist2
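Field cutting can be tried on a single record. This sketch assumes one row of the sample database with the padding spaces removed, so the expected fields are unambiguous:

```shell
# One record from the sample database, with padding removed (an assumption).
rec='2233|a.k. Shukla|g.m.|sales|12/12/52|6000'

# The delimiter must be quoted, otherwise the shell treats | as a pipe.
name_desig=$(printf '%s\n' "$rec" | cut -d '|' -f 2,3)
echo "$name_desig"

# 4- means field 4 through the last field.
rest=$(printf '%s\n' "$rec" | cut -d '|' -f 1,4-)
echo "$rest"
```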
Extracting user list from who output:
• cut can be used to extract the first word of a line by specifying the space as the delimiter:
$ who | cut -d " " -f1 //space is the delimiter
root
kumar
sharma
project
sachin
paste: Pasting Files
• With the paste command, whatever has been cut can be pasted back, but vertically rather than horizontally: the files are placed side by side.
$ paste cutlist1 cutlist2
a.k. Shukla |g.m. 2233 |sales |12/12/52 |6000
jai sharma |director 9876 |production |12/03/50 |7000
sumit gupta |d.g.m. 5678 |marketing |19/04/54 |6000
barun sengupta |director 2365 |personnel |05/11/47 |7800
n.k. Gupta |chairman 5423 |admin |08/30/56 |5400
• The -d option can be used with the paste command to provide one or more delimiters:
$ paste -d "|" cutlist1 cutlist2
a.k. Shukla |g.m. |2233 |sales |12/12/52 |6000
jai sharma |director |9876 |production |12/03/50 |7000
sumit gupta |d.g.m. |5678 |marketing |19/04/54 |6000
barun sengupta |director |2365 |personnel |05/11/47 |7800
n.k. Gupta |chairman |5423 |admin |08/30/56 |5400
Even if the file cutlist2 doesn’t exist, paste can take one of its inputs from standard input (indicated by -), so the command can be written like this:
$ cut -d "|" -f 1,4- shortlist | paste -d "|" cutlist1 -
a.k. Shukla |g.m. |2233 |sales |12/12/52 |6000
jai sharma |director |9876 |production |12/03/50 |7000
sumit gupta |d.g.m. |5678 |marketing |19/04/54 |6000
barun sengupta |director |2365 |personnel |05/11/47 |7800
n.k. Gupta |chairman |5423 |admin |08/30/56 |5400
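Joining Lines (-s):
The -s option joins the lines of a single file; the characters in the delimiter list are used in rotation.
example: $ paste -s -d "||\n" addressbook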
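Both uses of paste can be sketched on small files. The file contents below are assumptions: two shortened rows for cutlist1 and cutlist2, and a hypothetical addressbook in which each entry spans three lines (name, mail address, number).

```shell
tmp=$(mktemp -d)

# Shortened stand-ins for cutlist1 and cutlist2 (two rows each).
printf 'a.k. Shukla|g.m.\njai sharma|director\n' > "$tmp/cutlist1"
printf '2233|sales\n9876|production\n' > "$tmp/cutlist2"

# Side-by-side pasting with | between the two files' lines.
side_by_side=$(paste -d '|' "$tmp/cutlist1" "$tmp/cutlist2")
echo "$side_by_side"

# Hypothetical addressbook: three lines per entry.
printf 'anup\nanup@mail\n2345\nbala\nbala@mail\n6789\n' > "$tmp/addressbook"

# -s joins lines; the delimiters |, | and newline are used in rotation,
# so every three input lines become one output line.
joined=$(paste -s -d '||\n' "$tmp/addressbook")
echo "$joined"

rm -r "$tmp"
```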
sort: Ordering a File
• Sorting is the ordering of data in ascending or descending sequence.
• The sort command orders a file.
$ sort shortlist
2233 | a.k. Shukla | g.m. |sales |12/12/52 |6000
2365 |barun sengupta |director |personnel |05/11/47 |7800
5423 |n.k. Gupta |chairman |admin |08/30/56 |5400
5678 | sumit gupta |d.g.m. |marketing |19/04/54 |6000
9876 | jai sharma |director |production |12/03/50 |7000
• By default, sort reorders lines in ASCII collating sequence: whitespace first, then numerals, uppercase letters and finally lowercase letters.
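The collating order can be checked with a four-line input. This sketch sets LC_ALL=C, on the assumption that the ASCII order described above is wanted; under other locales sort may order letters differently.

```shell
# One line each starting with a lowercase letter, an uppercase letter,
# a digit and a space; LC_ALL=C forces plain ASCII comparison.
order=$(printf 'b\nB\n1\n a\n' | LC_ALL=C sort | paste -s -d ',' -)
echo "$order"   # space first, then digit, uppercase, lowercase
```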
sort options:
-c Checks if file is sorted
-o flname Places output in file flname
Sorting on Primary key (-k): Use the -k option to sort on the second field (name). The option should be -k 2, with -t specifying the field delimiter:
$ sort -t "|" -k 2 shortlist
2233 | a.k. Shukla | g.m. |sales |12/12/52 |6000
2365 |barun sengupta |director |personnel |05/11/47 |7800
9876 | jai sharma |director |production |12/03/50 |7000
5423 | n.k. Gupta |chairman |admin |08/30/56 |5400
5678 | sumit gupta |d.g.m. |marketing |19/04/54 |6000
Here, the contents are sorted on the second field (i.e. the name field).
Sorting on Secondary key:
We can provide more than one key to sort, i.e. a secondary key. If the primary key is the 3rd field and the secondary key is the 2nd field, then we need to specify, for every -k option, where the key ends (-k 3,3 limits the key to the 3rd field). This is shown below:
$ sort -t "|" -k 3,3 -k 2,2 shortlist
5423 | n.k. Gupta |chairman |admin |08/30/56 |5400
5678 | sumit gupta |d.g.m. |marketing |19/04/54 |6000
2365 |barun sengupta |director |personnel |05/11/47 |7800
9876 | jai sharma |director |production |12/03/50 |7000
2233 | a.k. Shukla | g.m. |sales |12/12/52 |6000
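A sketch of the two-key sort, assuming the five shortlist rows with their padding removed and only the first four fields kept. LC_ALL=C is set so that the comparison is plain ASCII.

```shell
tmp=$(mktemp -d)
# shortlist rows without padding (an assumption made for this sketch).
cat > "$tmp/shortlist" <<'EOF'
2233|a.k. Shukla|g.m.|sales
9876|jai sharma|director|production
5678|sumit gupta|d.g.m.|marketing
2365|barun sengupta|director|personnel
5423|n.k. Gupta|chairman|admin
EOF

# Primary key: field 3 only. Secondary key: field 2 only.
# The two directors tie on field 3, so field 2 decides their order.
ids=$(LC_ALL=C sort -t '|' -k 3,3 -k 2,2 "$tmp/shortlist" \
      | cut -d '|' -f 1 | paste -s -d ' ' -)
echo "$ids"

rm -r "$tmp"
```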
Numeric Sort (-n option):
When sort acts on numerals without the -n option, it compares them as character strings, so 10 precedes 2:
$ sort numfile
10
2
27
4
With the -n option, the numbers are sorted by value:
$ sort -n numfile
2
4
10
27
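The difference between the two orderings can be reproduced directly from the four numbers in numfile, fed here through printf instead of a file:

```shell
# Character comparison: "10" sorts before "2" because '1' precedes '2'.
as_text=$(printf '10\n2\n27\n4\n' | LC_ALL=C sort | paste -s -d ' ' -)
echo "$as_text"

# Numeric comparison with -n.
as_numbers=$(printf '10\n2\n27\n4\n' | sort -n | paste -s -d ' ' -)
echo "$as_numbers"
```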
The -c option (check option):
To check whether the file has been sorted in the default order, use:
$ sort -c shortlist
$ _ //File is sorted (no message is printed)
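Because sort -c prints nothing for a sorted input, a script normally relies on its exit status instead. A minimal sketch, assuming small inputs supplied through printf:

```shell
# Exit status 0: the input is already in order.
if printf '1\n2\n3\n' | LC_ALL=C sort -c; then
    sorted_input=yes
else
    sorted_input=no
fi

# Non-zero exit status: the input is out of order.
# The diagnostic goes to standard error, which is discarded here.
if printf '2\n1\n' | LC_ALL=C sort -c 2>/dev/null; then
    unsorted_input=yes
else
    unsorted_input=no
fi

echo "$sorted_input $unsorted_input"
```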
uniq: Locating Repeated and Nonrepeated Lines
When we concatenate or merge files, duplicate entries can creep in. UNIX offers a special tool to handle these lines: the uniq command. Consider a sorted dept.lst that includes repeated lines:
cat dept.lst
displays all lines, including the duplicates, whereas
uniq dept.lst
fetches one copy of each line and writes it to the standard output. Since uniq requires a sorted file as input, the general procedure is to sort a file and pipe its output to uniq. The following pipeline produces the same output, except that the output is saved in the file uniqlist:
sort dept.lst | uniq - uniqlist
uniq Options
Different uniq options are:
Selecting the nonrepeated lines (-u)
cut -d "|" -f3 emp.lst | sort | uniq -u
Selecting the duplicate lines (-d)
cut -d "|" -f3 emp.lst | sort | uniq -d
Counting frequency of occurrence (-c)
cut -d "|" -f3 emp.lst | sort | uniq -c
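The three options can be compared on one small input. The designation list below is an assumption standing in for the third field of emp.lst, which the slides show only in part.

```shell
# Hypothetical designation column, sorted first because uniq only
# compares adjacent lines.
desigs=$(printf 'director\ng.m.\ndirector\nchairman\ng.m.\ndirector\n' | LC_ALL=C sort)

only_once=$(echo "$desigs" | uniq -u)                          # lines that occur once
repeated=$(echo "$desigs" | uniq -d | paste -s -d ' ' -)       # one copy of each repeated line
# uniq -c pads the count with spaces; awk reduces it to count:value.
counts=$(echo "$desigs" | uniq -c | awk '{print $1 ":" $2}' | paste -s -d ' ' -)

echo "$only_once"
echo "$repeated"
echo "$counts"
```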
tr: Translating Characters
• The tr filter manipulates the individual characters in a line. It translates characters using one or two compact expressions:
tr options expression1 expression2 standard input
• It takes input only from standard input; it doesn’t take a filename as argument. By default, it translates each character in expression1 to its mapped counterpart in expression2. The first character in the first expression is replaced with the first character in the second expression, and similarly for the other characters.
$ tr '|/' '~-' < emp.lst | head -n 3
The expressions can also be held in variables:
exp1='|/' ; exp2='~-'
tr "$exp1" "$exp2" < emp.lst
• Changing the case of text is also possible, for example from lower to upper for the first three lines of the file.
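The slides do not show the command for the case change, so the following is only a sketch of one common way to do it, using the POSIX character classes [:lower:] and [:upper:]. The input record is an assumption based on the sample database.

```shell
# Character-by-character mapping: | becomes ~ and / becomes -.
mapped=$(printf '2233|sales|12/12/52\n' | tr '|/' '~-')
echo "$mapped"

# Lower to upper case; characters outside [:lower:] pass through unchanged.
upper=$(printf 'jai sharma|director\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"
```

To restrict the change to the first three lines of a file, the same tr command can be placed after head -n 3 in a pipeline.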
tr Options
Different tr options are:
Deleting characters (-d)
tr -d '|/' < emp.lst | head -n 3
Compressing multiple consecutive characters (-s)
tr -s ' ' < emp.lst | head -n 3
Complementing values of expression (-c)
tr -cd '|/' < emp.lst //deletes everything except | and /
Using ASCII octal values and escape sequences
tr '|' '\012' < emp.lst | head -n 6 //\012 is the octal value of the newline character
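Each option can be tried on a one-line input. The strings below are assumptions chosen to make the effect visible; they are not rows of emp.lst.

```shell
deleted=$(printf 'a|b/c\n' | tr -d '|/')       # remove every | and /
squeezed=$(printf 'a    b\n' | tr -s ' ')      # runs of spaces become one space
# -c complements the set, so -cd deletes everything except | and /
# (the trailing newline is deleted as well).
kept=$(printf 'a|b/c\n' | tr -cd '|/')
# \012 is octal for newline: each field lands on its own line.
field_lines=$(printf 'a|b|c\n' | tr '|' '\012' | wc -l | tr -d ' ')

echo "$deleted"
echo "$squeezed"
echo "$kept"
echo "$field_lines"
```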
