yourvirtualclass.com
Unix for Beginners
Lesson 5 - The Unix Pipe and Filters
Section 1: The Unix Pipe
The Unix pipe is a programming interface by which the output from one program becomes the
input to the next. The vertical bar (|) is used to denote the pipe.
A typical Unix pipe looks like the following:
starter | filter1 | filter2 | …. | filterN | terminator
Additional files can be inserted as input to the pipe as command line arguments if a dash (-) is
used to represent the standard input:
starter | filter1 - filename | terminator
(“starter” and “terminator” are not terms used in the industry but they are useful for this
discussion)
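As a small sketch of the dash convention, the script below pipes one line into cat and names a file as a second argument. The scratch directory and the file name extra.txt are made up for this illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical file to be merged into the pipe.
printf 'from the file\n' > extra.txt

# cat reads standard input where the dash appears, then reads extra.txt.
merged=$(printf 'from the pipe\n' | cat - extra.txt)
printf '%s\n' "$merged"
```

The line that arrived through the pipe is printed first because the dash comes before the filename.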
Section 2: The Start of the Pipe
In order to begin the pipe, we need a Unix program which can generate lines of output without
relying on input from another program. The cat, head, tail, and echo programs are ideal for this
purpose.
Section 2a: Using cat to start the Pipe
The cat program (short for concatenate) is normally used to display the contents of a file (or
several files) to the Unix terminal. One or more filenames are specified as command line
arguments:
$ cat f1 # prints contents of single file (f1)
Mary
Had
A
Little
Lamb
$ cat f1 h1 # prints contents of multiple files (f1 followed by h1)
Mary
Had
A
Little
Lamb
Its
Fleece
Was
White
As
Snow
Note that all lines in f1 are displayed completely before the first line of h1 is displayed.
LAB1: What is the output of the following command?
$ cat h1 f1
Section 2b: Using head to start the Pipe
The head program is normally used to display lines starting from the beginning of the file (or
several files). Like cat, the filenames are specified as command line arguments. With no
switches, the first ten lines are displayed:
$ head f1 # displays entire file (since less than 10 lines)
Mary
Had
A
Little
Lamb
Use the -n switch to display the first n lines of the file:
$ head -n 2 f1 # displays first 2 lines of f1
Mary
Had
$ head -n 2 f1 h1 # displays first 2 lines of f1 followed by the first 2 lines of h1
==> f1 <==
Mary
Had
==> h1 <==
Its
Fleece
Note how the contributions from each file are labelled.
Using the -c switch, head can display bytes from the start of the file:
$ head -c 3 f1 # displays the first 3 bytes of f1
Mar
Section 2c: Using tail to start the Pipe
The tail program is normally used to display lines at the end of the file (or several files). With no
switches, the last ten lines are displayed. Its usage is very similar to head:
$ tail f1 # displays entire file (since less than 10 lines)
Mary
Had
A
Little
Lamb
Use the -n switch to display the last n lines of the file:
$ tail -n 3 f1 # displays last 3 lines of f1
A
Little
Lamb
$ tail -c 4 f1 h1 # displays the last 4 bytes of f1 followed by the last 4 bytes of h1
==> f1 <==
amb
==> h1 <==
now
Since each line is terminated by the newline character (\n), only three letters are displayed.
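Because head and tail also read standard input, they can be chained to pick out a single line. The following is a minimal sketch that assumes a file with the same five lines as f1, recreated in a scratch directory.

```shell
# Recreate f1 in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Mary\nHad\nA\nLittle\nLamb\n' > f1

# Keep the first three lines, then keep only the last of those three.
third_line=$(head -n 3 f1 | tail -n 1)
printf '%s\n' "$third_line"
```

Here head starts the pipe and tail ends it, so the third line of f1 (A) is the only line printed.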
Section 2d: Using echo to start the Pipe
The echo program writes its arguments to standard output. It can start a pipe without the use of
a file.
$ echo hello world
hello world
Use the -e option and \n to create several lines of output:
$ echo -e "Mary\nHad\nA\nLittle\nLamb" # identical to f1
Mary
Had
A
Little
Lamb
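Not every echo implementation honors the -e option. One hedged alternative is printf, which interprets \n in its format string. The sketch below produces the same five lines and counts them; the variable names are only for illustration.

```shell
# printf interprets \n in the format string, so no -e option is needed.
lines=$(printf 'Mary\nHad\nA\nLittle\nLamb\n')
printf '%s\n' "$lines"

# The same output can start a pipe; here wc -l counts the lines.
line_count=$(printf 'Mary\nHad\nA\nLittle\nLamb\n' | wc -l)
echo "$line_count"
```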
Section 3: The Middle of the Pipe - using Filters
In the middle of the pipe, we need to read data from standard input and write data to standard
output. A Unix program which does so is called a filter, since we modify the data with each
successive program. A filter does not have to generate data on its own, and it must not require
user interaction. The standard output from the previous program in the pipe becomes the standard
input to the filter. The filter then produces standard output, which becomes the standard input to
the next filter or terminator.
Section 3a: Using cut as a filter
The cut program prints parts of each line from files or from the standard input. Use the -c option
to parse lines based on character position. A comma-separated list specifies which characters
to display. Use a dash (-) to specify a range. Precede a list element with dash to include all
previous characters. End a list element with dash to include all following characters.
$ cat -A g1 # ^I is the Tab character
Mary^Ihad^Ia little lamb$
Its^Ifleece was^Iwhite as snow$
$ cat g1 | cut -c3 # display the third character on each line
r
s
$ cat g1 | cut -c-3 # display all characters up to and including the third
Mar
Its
$ cat g1 | cut -c-3,6-8 # also display the sixth through eighth characters
Marhad
Itslee
$ cat g1 | cut -c-3,6-8,20- # also display from the twentieth character to the end of the line
Marhadamb
Itsleee as snow
You can use the cut program’s -f option to parse the lines based on delimited fields. As with the
-c option, a comma-separated list specifies which fields to display. The TAB character is the
default delimiter. Use the -d option to specify a different delimiter :
$ cat g1 | cut -f1 # display the first field ( using TAB as a delimiter )
Mary
Its
$ cat g1 | cut -f2 # display the second field ( using TAB as a delimiter )
had
fleece was
$ cat g1 | cut -da -f2 # display the second field (using ‘a’ as a delimiter )
ry h
s white
$ cat g1 | cut -da -f1,2 # display the first and second field (using ‘a’ as a delimiter )
Mary h
Its fleece was white
$ cat g1 | cut -d" " -f3 # surround white space by double quotes to use as a delimiter
lamb
as
Use consecutive cut -f commands on the pipe for complex parsing :
$ cat e1
1234567:George Washington, Mount Vernon
89075:Martha Washington, Hoboken
$ cat e1 | cut -d: -f2 | cut -d, -f1 # display all characters after the colon but before the comma
George Washington
Martha Washington
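One detail worth checking by hand is that cut always prints the selected fields in the order they appear in the input, regardless of the order of the list. The sketch below uses a made-up colon-separated record to show this.

```shell
# A hypothetical record with colon-separated fields.
record='Mary:Had:A:Little:Lamb'

# Ask for fields 1 and 2, first in ascending order and then in reverse.
in_order=$(printf '%s\n' "$record" | cut -d: -f1,2)
reversed_list=$(printf '%s\n' "$record" | cut -d: -f2,1)

# Both commands print the fields in input order.
printf '%s\n%s\n' "$in_order" "$reversed_list"
```

Both lines read Mary:Had, so a different tool is needed when fields must be reordered.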
LAB2: Given file z1:
ROW:1 DATA:Mary had a little lamb
ROW:2 DATA:Its fleece was white as snow
ROW:3 DATA:And every where that Mary went
ROW:4 DATA:The lamb was sure to go
Use the Unix pipe to extract the sixth column data and each column after
(data begins at the character after the colon).
Section 3b: Using tr as a filter
The tr program translates each character in its input that appears in SET1 to the corresponding character in SET2:
$ cat f1 | tr a b # translate all occurrences of a to b
Mbry
Hbd
A
Little
Lbmb
The translation sets are ordered:
$ echo ach | tr ac bd # translate a to b and c to d
bdh
Use the -d option to delete the characters in SET1:
$ echo aca | tr -d a # delete a’s
c
Use the -s option to “squeeze” repeated characters:
$ echo aac | tr -s a b # translate a to b, then squeeze the repeated b’s
bc
Use -c to specify the complement of SET1:
$ echo bad | tr -c a z # translate any character that is not a to z (last z is for the newline)
zazz
A set can be any of a class of characters:
$ echo howdy | tr '[:lower:]' '[:upper:]' # translate lower to upper case (quotes keep the shell from expanding the brackets)
HOWDY
$ echo Virtual123Class456 | tr -d '[:digit:]' # remove digits
VirtualClass
$ cat g1 | tr -d '[:blank:]' # remove blanks (spaces or tabs)
Maryhadalittlelamb
Itsfleecewaswhiteassnow
$ echo "Today.I;learned:pipes" | tr -d '[:punct:]' # remove punctuation
TodayIlearnedpipes
$ cat g1 | tr -cd '[:print:]' # delete nonprintable characters (tabs and newlines removed)
Maryhada little lambItsfleece waswhite as snow
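Translation and squeezing combine well when a line must be split into one word per line. The following sketch assumes an input sentence with uneven spacing, chosen only for illustration.

```shell
# Translate each space to a newline, then squeeze runs of newlines into one.
words=$(printf 'Mary had  a little   lamb\n' | tr -s ' ' '\n')
printf '%s\n' "$words"

# Count the resulting lines, which is the number of words.
word_count=$(printf '%s\n' "$words" | wc -l)
echo "$word_count"
```

Without -s, the double and triple spaces would leave empty lines in the output.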
LAB3: Using the Unix pipe, translate the string
She Sells Sea Shells by the Sea Shore
To : Tj Tjs Tja Tjs tj Tja Torj
Section 3c: Using sort as a filter
The sort program sorts its input lines and writes the sorted lines to standard output:
$ cat f2 | sort > v2
$ cat v2
A
Had
Little
Martin
Pig
Use the -u option to remove duplicate lines:
$ cat f1 f2|sort -u
A
Had
Lamb
Little
Martin
Mary
Pig
Use the -k option followed by a field number to sort on that field instead of the whole line
(older versions of sort used +N to skip the first N fields instead).
Use the -n option to sort numerically ( 9 < 10 ) :
$ ls -l
-rw-r--r-- 1 YourVirtualClass Administrators 629 Sep 29 06:55 VeryLargeFile
-rw-r--r-- 1 YourVirtualClass Administrators 73 Sep 29 06:55 e1
-rw-r--r-- 1 YourVirtualClass Administrators 23 Sep 29 06:55 f1
-rw-r--r-- 1 YourVirtualClass Administrators 24 Sep 29 06:55 f2
-rw-r--r-- 1 YourVirtualClass Administrators 52 Sep 29 06:55 g1
-rw-r--r-- 1 YourVirtualClass Administrators 29 Sep 29 06:55 h1
-rw-r--r-- 1 YourVirtualClass Administrators 24 Sep 29 06:55 v2
$ ls -l | sort -k5,5n # sort numerically on the 5th field, the file size (older syntax: sort +4n)
-rw-r--r-- 1 YourVirtualClass Administrators 23 Sep 29 06:55 f1
-rw-r--r-- 1 YourVirtualClass Administrators 24 Sep 29 06:55 f2
-rw-r--r-- 1 YourVirtualClass Administrators 24 Sep 29 06:55 v2
-rw-r--r-- 1 YourVirtualClass Administrators 29 Sep 29 06:55 h1
-rw-r--r-- 1 YourVirtualClass Administrators 52 Sep 29 06:55 g1
-rw-r--r-- 1 YourVirtualClass Administrators 73 Sep 29 06:55 e1
-rw-r--r-- 1 YourVirtualClass Administrators 629 Sep 29 06:55 VeryLargeFile
Use the -r option, or the r modifier on a key, to reverse the order of sorting:
$ ls -l | sort -k9r # reverse sort on the 9th field, the filename (older syntax: sort +8r)
-rw-r--r-- 1 YourVirtualClass Administrators 24 Sep 29 06:55 v2
-rw-r--r-- 1 YourVirtualClass Administrators 29 Sep 29 06:55 h1
-rw-r--r-- 1 YourVirtualClass Administrators 52 Sep 29 06:55 g1
-rw-r--r-- 1 YourVirtualClass Administrators 24 Sep 29 06:55 f2
-rw-r--r-- 1 YourVirtualClass Administrators 23 Sep 29 06:55 f1
-rw-r--r-- 1 YourVirtualClass Administrators 73 Sep 29 06:55 e1
-rw-r--r-- 1 YourVirtualClass Administrators 629 Sep 29 06:55 VeryLargeFile
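When fields are separated by something other than blanks, the -t option names the separator. The sketch below recreates e1 from Section 3a in a scratch directory and sorts it on the number before the colon.

```shell
# Recreate e1 in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1234567:George Washington, Mount Vernon\n89075:Martha Washington, Hoboken\n' > e1

# -t: makes the colon the field separator; -k1,1n sorts numerically on field 1 only.
by_id=$(sort -t: -k1,1n e1)
printf '%s\n' "$by_id"

# Keep just the first line of the sorted output.
first=$(printf '%s\n' "$by_id" | head -n 1)
```

A plain sort would leave the 1234567 line first because it compares characters, whereas the numeric sort puts 89075 first.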
Section 3d: Using paste as a filter
The paste program writes lines from files or the standard input in series or in parallel. With no
options, paste arranges lines from its input side by side (in parallel). Use dash (-) to treat the
standard input as one of the files :
$ cat f2
Martin
Had
A
Little
Pig
$ cat f1 | paste - f2 # paste lines from f1 and f2 in parallel
Mary Martin
Had Had
A A
Little Little
Lamb Pig
Use the -s option to arrange lines from input files in series. The default delimiter is TAB :
$ cat f1 | paste -s | cat -A
Mary^IHad^IA^ILittle^ILamb$
Use the -d option to change the delimiter using a recycled list of characters:
$ cat f1|paste - f2 -d:,. -s # delimit by colon, then comma, then period, then back to colon
Mary:Had,A.Little:Lamb
Martin:Had,A.Little:Pig
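Two common uses of paste on standard input alone are joining every line into one delimited line and grouping lines in pairs. The sketch below illustrates both with made-up input.

```shell
# -s with a comma delimiter joins all input lines into one line.
csv=$(printf '1\n2\n3\n' | paste -s -d, -)
echo "$csv"

# Naming the standard input twice makes paste take two lines per output row.
pairs=$(printf 'a\nb\nc\nd\n' | paste -d: - -)
printf '%s\n' "$pairs"
```

The first command prints 1,2,3 and the second prints a:b followed by c:d.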
Section 3e: Using comm as a filter
The comm program compares sorted files line by line. The first column in its output contains
lines unique to the first file. The second column in its output contains lines unique to the second
file. The third column in its output contains lines common to both files.
$ cat f1 | sort | comm - v2 # compare two sorted files
                A
                Had
Lamb
                Little
        Martin
Mary
        Pig
Use the -1 option to suppress lines unique to the first file:
$ cat f1 | sort | comm - v2 -1
        A
        Had
        Little
Martin
Pig
Use the -2 option to suppress lines unique to the second file:
$ cat f1 | sort | comm - v2 -2
        A
        Had
Lamb
        Little
Mary
Use the -3 option to suppress lines common to both files:
$ cat f1 | sort | comm - v2 -3
Lamb
        Martin
Mary
        Pig
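The suppression options can be combined. As a sketch, -12 hides both columns of unique lines and leaves only the lines common to the two files. The file name sorted_f1 is made up here and holds the sorted contents of f1.

```shell
# Recreate the two sorted inputs in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'A\nHad\nLamb\nLittle\nMary\n' > sorted_f1
printf 'A\nHad\nLittle\nMartin\nPig\n' > v2

# Suppress columns 1 and 2; only the common lines remain, without indentation.
common=$(LC_ALL=C comm -12 sorted_f1 v2)
printf '%s\n' "$common"
```

The output is A, Had, and Little, one per line.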
Section 3f: Using join as a filter
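The join program joins the lines of two sorted files that share a common join field (by default, the
first field). Since each line here holds a single field, the output is the lines common to both files: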
$ cat f1 | sort | join - v2
A
Had
Little
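The join field matters more when lines carry additional fields. The sketch below uses two made-up files, names and notes, that share a number in their first field.

```shell
# Two hypothetical files, each sorted on the first field.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 Mary\n2 George\n3 Martha\n' > names
printf '1 lamb\n3 Hoboken\n' > notes

# Lines are paired where the first fields match; unmatched lines are dropped.
joined=$(join names notes)
printf '%s\n' "$joined"
```

Only keys 1 and 3 appear in both files, so the output has two lines, each with the key followed by the remaining fields from both files.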
Section 3g: Using expand and unexpand as filters
The expand program converts tabs to spaces. The unexpand program converts
spaces to tabs. By default, unexpand converts only the blanks at the start of each line. Use the -a option to
convert all blanks:
$ cat f1 | sort | paste -s | cat -A
A^IHad^ILamb^ILittle^IMary$
$ cat f1 | sort | paste -s | expand | cat -A
A       Had     Lamb    Little  Mary$
$ cat f1 | sort | paste -s | expand | unexpand | cat -A
A       Had     Lamb    Little  Mary$
$ cat f1 | sort | paste -s | expand | unexpand -a | cat -A
A^IHad^ILamb^ILittle^IMary$
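By default, expand assumes a tab stop every eight columns. The -t option changes that spacing, as in the following sketch with a made-up two-field line.

```shell
# Default tab stops: the tab after "a" is padded out to column 8.
wide=$(printf 'a\tb\n' | expand)

# With -t 4 the tab is padded out to column 4 instead.
narrow=$(printf 'a\tb\n' | expand -t 4)

printf '[%s]\n[%s]\n' "$wide" "$narrow"
```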
Section 3h: Using fold as a filter
The fold program wraps lines so that each line is at most 80 characters wide. Use the
-w option to specify a different width:
$ cat f1 | fold -w2 # each line has at most two characters
Ma
ry
Ha
d
A
Li
tt
le
La
mb
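As the example above shows, fold breaks a line wherever the width runs out, even in the middle of a word. The -s option asks fold to break at a blank where possible. The sketch below is a hedged illustration; the sentence and the width of 10 are arbitrary choices.

```shell
sentence='Mary had a little lamb'

# -s breaks each line after the last blank that fits within the width.
wrapped=$(printf '%s\n' "$sentence" | fold -s -w 10)
printf '%s\n' "$wrapped"
```

No line in the result is wider than ten characters, and no word is split across lines.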
Section 4: The End of the Pipe
At the end of the Pipe, we need to capture the Pipe’s output and prevent it from scrolling
endlessly to the terminal. The less program is ideal for this purpose. Not only does it capture
the pipe’s output, but it also allows us to navigate anywhere in the output, and even perform
searches.
The less program requires user interaction, so it is not suitable for the start or middle of the pipe.
Other useful pipe terminators are wc, cat, tail, and head.
Section 4a: Using less to end the Pipe
The less program displays and navigates through a file specified as its command line argument:
$ less VeryLargeFile # less initially displays the first few lines in VeryLargeFile
VeryLargeFile_Line_1
VeryLargeFile_Line_2
VeryLargeFile_Line_3
VeryLargeFile_Line_4
VeryLargeFile_Line_5
(VeryLargeFile)
Enter shift-G to navigate to the bottom of the file:
VeryLargeFile_Line_996
VeryLargeFile_Line_997
VeryLargeFile_Line_998
VeryLargeFile_Line_999
VeryLargeFile_Line_1000
(END)
Enter g to return to the top of the file:
VeryLargeFile_Line_1
VeryLargeFile_Line_2
VeryLargeFile_Line_3
VeryLargeFile_Line_4
VeryLargeFile_Line_5
:
Enter /10 to search for the pattern 10:
VeryLargeFile_Line_10
VeryLargeFile_Line_11
VeryLargeFile_Line_12
VeryLargeFile_Line_13
VeryLargeFile_Line_14
:
Enter q to exit less.
To use less at the end of the pipe, precede it by a vertical bar (|) and omit the filename. Using
the pipe starters described in Section 2 and less, we can now write our first pipe:
$ cat f1 | less # output of cat piped to less
Mary
Had
A
Little
Lamb
(END)
Section 4b: Using wc to end the Pipe
The wc program prints the number of lines, words, and bytes in the input. Use the -l, -w, or -m
option to print only the number of lines, words, or characters:
$ cat f1 | wc -l # f1 contains five lines
5
$ cat f1 | wc -w # f1 contains five words
5
$ cat f1 | wc -m # f1 contains 23 characters
23
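Putting a starter, a filter, and a terminator together, the sketch below counts the distinct lines across f1 and f2. Both files are recreated in a scratch directory with the contents shown earlier in this lesson.

```shell
# Recreate f1 and f2 in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Mary\nHad\nA\nLittle\nLamb\n' > f1
printf 'Martin\nHad\nA\nLittle\nPig\n' > f2

# cat starts the pipe, sort -u filters out duplicates, wc -l ends it with a count.
unique_lines=$(cat f1 f2 | sort -u | wc -l)
echo "$unique_lines"
```

The count is 7, matching the seven lines printed by sort -u in Section 3c.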
LAB4: Using only a pencil and paper determine the output of the following command.
No copy pasting ☺
echo -e "My Dog Has Fleas\nMy Cat Eats Mice" | cut -c 4-8,12- |
tr -d '[:upper:]' | paste -s -d" " | fold -w4
Copyright 2016 YourVirtualClass.com. All rights reserved.