This document discusses Linux text stream filters and provides examples of common Unix commands used to process and modify text streams. These commands include cat, head, tail, cut, and split. Cat prints the contents of files, head prints the first few lines, tail prints the last few lines, cut extracts parts of each line, and split divides files into smaller parts. The document also covers input/output redirection and how it can be used with filters to modify command output and send it to files.
Core Linux for Red Hat and Fedora learning under GNU Free Documentation License - Copyleft (c) Acácio Oliveira 2012
Everyone is permitted to copy and distribute verbatim copies of this license document, changing is allowed
Process text streams using filters
Commands overview
Rundown of commands for processing text streams using filters:
• cat – concatenate files (or just show a single file without alteration)
• cut – cut chosen text out of each line of a file and display it.
• expand – expand tabs into spaces
• fmt – reformat a text file with a consistent right margin
• head – show the first few lines of a file (10 by default)
• join – join lines of two files on a common field
• nl – print the file with numbered lines
• od – octal dump of a file (or hexadecimal).
• paste – print a number of files side by side
• pr – format for printing (split into pages or columns and add headers)
• sed – stream editor (search and replace, append, cut, delete and more)
• sort – sort in alphabetical order (and numerical order too)
• split – split a single input into multiple files
• tac – print the lines of a file from back to front (backwards cat)
• tail – print the last few lines of a file
• tr – character translation (e.g. upper case to lower case).
• unexpand – convert spaces to tabs (the reverse of expand).
• uniq – remove duplicate lines from a sorted file
• wc – word count (and line count, and byte count)
Input and output redirection
Bash makes it possible to redirect the input and output of a command. By default:
• Input comes from the keyboard (ended by pressing Ctrl+D).
• Output and any errors are displayed on the screen.
Redirection can change where a process reads its input from, where its output goes, and where its errors go.
Redirection         Effect of redirection
cmd < file          Command reads input from a file
cmd > file          Output of command goes to file
cmd 2> file         Errors from the command go to a file
cmd >> file         Output of a command is added to a file
cmd > file 2>&1     Output and errors go to a file
cmd >& file         Output and errors go to a file
cmd &> file         Output and errors go to a file
cmd1 | cmd2         Output from command1 is input for command2
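The redirection forms can be tried safely in a scratch directory. The following is a minimal sketch, not part of the original slides: the directory from mktemp, the file names out.txt, err.txt and both.txt, and the deliberately missing path are all made up for illustration.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

# > sends standard output to a file; >> appends to the same file.
echo "first line" > "$workdir/out.txt"
echo "second line" >> "$workdir/out.txt"

# 2> sends only the error messages to a file.
# ls fails here on purpose, so the failure status is ignored.
ls /no/such/path-for-demo 2> "$workdir/err.txt" || true

# > file 2>&1 sends both normal output and errors to one file.
ls "$workdir" /no/such/path-for-demo > "$workdir/both.txt" 2>&1 || true

# < makes a command read its input from a file,
# and | feeds the output of one command into the next.
line_count=$(wc -l < "$workdir/out.txt" | tr -d ' ')
echo "out.txt has $line_count lines"
```

After running it, err.txt holds only the error message, while both.txt holds the directory listing and the error message together.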
head – print out the first lines
Selecting parts of a file (Filters that print out various parts of the input they receive)
foo:~ # head /var/log/boot.log
Apr 7 08:28:22 foo allrc: syslogd startup succeeded
Apr 7 08:28:22 foo allrc: klogd startup succeeded
Apr 7 08:28:23 foo allrc: portmap startup succeeded
Apr 7 08:27:56 foo rc.sysinit: Mounting proc filesystem: succeeded
Apr 7 08:27:56 foo rc.sysinit: Unmounting initrd: succeeded
Apr 7 08:27:56 foo sysctl: net.ipv4.ip_forward = 0
Apr 7 08:27:56 foo sysctl: net.ipv4.conf.default.rp_filter = 1
Apr 7 08:27:56 foo sysctl: kernel.sysrq = 0
Apr 7 08:28:26 foo lpd: execvp: No such file or directory
Apr 7 08:27:56 foo sysctl: kernel.core_uses_pid = 1
By default, head prints the first 10 lines of a file.
foo:~ $ ls -l / | head -n 6
total 232
drwxr-xr-x 2 root root 4096 Feb 21 15:49 bin
drwxr-xr-x 3 root root 4096 Jan 7 10:25 boot
drwxr-xr-x 5 root root 20480 Jan 10 11:35 data
drwxr-xr-x 21 root root 118784 Apr 7 08:28 dev
drwxr-xr-x 64 root root 8192 Apr 7 08:28 etc
head can print a specific number of lines from a file or a stream.
With -c, head extracts an exact number of bytes (rather than lines) from an input stream.
Here's how to copy the first sector of a disk, which holds the master boot record and the partition table (be careful with that redirection: the disk device must only ever be on the input side).
foo:~ # head -c 512 < /dev/hda > mbr
foo:~ # ls -la mbr
-rw-r--r-- 1 root root 512 Apr 7 10:27 mbr
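The same options can be tried without touching a disk device. This is a small sketch under assumed names: the scratch directory and the files numbers.txt and first5.bin are illustrative only.

```shell
# Build a 20-line sample file in a scratch directory.
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# With no options, head prints the first 10 lines.
default_lines=$(head "$workdir/numbers.txt" | wc -l | tr -d ' ')

# -n selects a line count; head reads from a pipe just as well as from a file.
first_three=$(seq 1 20 | head -n 3 | paste -sd ' ' -)

# -c selects a byte count instead of a line count.
head -c 5 "$workdir/numbers.txt" > "$workdir/first5.bin"
byte_count=$(wc -c < "$workdir/first5.bin" | tr -d ' ')

echo "default: $default_lines lines, first three: $first_three, bytes copied: $byte_count"
```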
tail – show the end of a file
root@foo:root # tail /var/log/messages
Apr 7 11:19:34 foo dhcpd: Wrote 9 leases to leases file.
Apr 7 11:19:34 foo dhcpd: DHCPREQUEST for 10.0.0.169 from 00:80:ad:02:65:7c via eth0
Apr 7 11:19:35 foo dhcpd: DHCPACK on 10.0.0.169 to 00:80:ad:02:65:7c via eth0
Apr 7 11:20:01 foo kdm[1151]: Cannot convert Internet address 10.0.0.168 to host name
Apr 7 11:26:46 foo ipop3d[22026]: connect from 10.0.0.10 (10.0.0.10)
Apr 7 11:26:55 foo ipop3d[22028]: connect from 10.0.0.10 (10.0.0.10)
Apr 7 11:26:58 foo ipop3d[22035]: connect from 10.0.0.3 (10.0.0.3)
Apr 7 11:27:01 foo ipop3d[22036]: connect from 10.0.0.3 (10.0.0.3)
Apr 7 11:29:31 foo kdm[21954]: pam_unix2: session started for user joe, service xdm
Apr 7 11:32:41 foo sshd[22316]: Accepted publickey for root from 10.0.0.143 port 1250 ssh2
tail is just like head, but it shows the end of the file.
tail can also be used to watch a file as it grows.
Run the command tail -f /var/log/messages on one console and then log in on another virtual console.
tail -n 20 file or tail -20 file will show the last 20 lines of file. tail -c 20 file will show the last 20 bytes of file.
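A short sketch of the line and byte options of tail, using a generated sample file; the scratch directory and the name numbers.txt are assumptions made for the example. tail -f is left out because it keeps running until interrupted.

```shell
# Build a 20-line sample file in a scratch directory.
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# With no options, tail prints the last 10 lines.
default_lines=$(tail "$workdir/numbers.txt" | wc -l | tr -d ' ')

# -n selects how many lines to show from the end.
last_two=$(tail -n 2 "$workdir/numbers.txt" | paste -sd ' ' -)

# -c counts bytes from the end: here "20" plus the final newline,
# which the command substitution then strips.
last_bytes=$(tail -c 3 "$workdir/numbers.txt")

echo "default: $default_lines lines, last two: $last_two, last bytes: $last_bytes"
```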
cut – pull out columns
cut selects certain columns from each line of the input stream.
Columns are defined either by their character position (-c) or as fields separated by a delimiter (-d to set the delimiter, -f to choose the fields).
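Both ways of selecting columns can be sketched on a single line of text. The sample line below is a made-up entry in /etc/passwd format, not taken from a real system.

```shell
# A hypothetical passwd-style line with colon-separated fields.
passwd_line='joe:x:500:100:Joe User:/home/joe:/bin/bash'

# Fields: -d sets the separator, -f picks field numbers.
login=$(printf '%s\n' "$passwd_line" | cut -d : -f 1)
uid_gid=$(printf '%s\n' "$passwd_line" | cut -d : -f 3,4)

# Positions: -c picks a range of characters, counted from 1.
first_three=$(printf '%s\n' "$passwd_line" | cut -c 1-3)

echo "login=$login uid:gid=$uid_gid first three characters=$first_three"
```

When several fields are selected, cut keeps the original delimiter between them in its output.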
split – divide a file into more manageable parts, e.g. for FTP uploads. The parts can be recombined with cat.
foo:~/download $ ls -la anomy-sanitizer-1.56.tar.gz
-rw-rw-r-- 1 georgem georgem 124356 Oct 22 18:37 anomy-sanitizer-1.56.tar.gz
foo:~/download $ split -b 32k anomy-sanitizer-1.56.tar.gz
foo:~/download $ ls -la x*
-rw-rw-r-- 1 georgem georgem 32768 Apr 7 11:48 xaa
-rw-rw-r-- 1 georgem georgem 32768 Apr 7 11:48 xab
-rw-rw-r-- 1 georgem georgem 32768 Apr 7 11:48 xac
-rw-rw-r-- 1 georgem georgem 26052 Apr 7 11:48 xad
Here's how to use cat to recombine the parts, using md5sum to check that the recombined file is identical to the original.
foo:~/download $ split -b 32k anomy-sanitizer-1.56.tar.gz part-
foo:~/download $ ls -la part-*
-rw-rw-r-- 1 georgem georgem 32768 Apr 7 11:49 part-aa
-rw-rw-r-- 1 georgem georgem 32768 Apr 7 11:49 part-ab
-rw-rw-r-- 1 georgem georgem 32768 Apr 7 11:49 part-ac
-rw-rw-r-- 1 georgem georgem 26052 Apr 7 11:49 part-ad
foo:~/download $ cat part-* > newfile
foo:~/download $ md5sum newfile anomy-sanitizer-1.56.tar.gz
1a977bad964b0ede863272114bfc2482 newfile
1a977bad964b0ede863272114bfc2482 anomy-sanitizer-1.56.tar.gz
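The split-and-recombine cycle can be rehearsed on generated data. In this sketch the file names original.txt and rebuilt.txt and the part- prefix are illustrative, and cmp is used in place of md5sum as a byte-for-byte comparison.

```shell
# Generate a sample file of roughly 106 KB in a scratch directory.
workdir=$(mktemp -d)
seq 1 20000 > "$workdir/original.txt"

# Cut it into 32 KB pieces named part-aa, part-ab, ...
split -b 32k "$workdir/original.txt" "$workdir/part-"
part_count=$(ls "$workdir"/part-* | wc -l | tr -d ' ')

# The shell expands part-* in alphabetical order, so cat restores the original order.
cat "$workdir"/part-* > "$workdir/rebuilt.txt"

# cmp -s is silent and reports through its exit status only.
if cmp -s "$workdir/original.txt" "$workdir/rebuilt.txt"; then
  result="identical"
else
  result="different"
fi
echo "$part_count parts, rebuilt file is $result"
```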
Sorting
uniq – discard duplicate lines. uniq only removes adjacent duplicates, so it is usually used after sort.
foo:~ $ cat /etc/passwd | cut -d : -f 4 | sort -n | fmt
0 0 0 0 0 0 1 2 4 7 12 13 14 25 26 28 29 30 32 37 38 42 43 47 48 50 51
69 74 77 80 89 99 100 500 501 503 504 505 506 507 509 511 512 65534
The pipeline above cuts the fourth field (the group ID) out of the password file and sorts it in numerical order.
fmt is used to join the results into as few lines as possible.
The same command pipeline, but removing duplicates with uniq before formatting:
foo:~ $ cat /etc/passwd | cut -d : -f 4 | sort -n | uniq | fmt
0 1 2 4 7 12 13 14 25 26 28 29 30 32 37 38 42 43 47 48 50 51 69 74 77
80 89 99 100 500 501 503 504 505 506 507 509 511 512 65534
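Why uniq is normally paired with sort can be shown on a short list of numbers. The values below are invented sample group IDs, not output from a real /etc/passwd.

```shell
# A hypothetical unsorted list of group IDs, one per line.
gids=$(printf '%s\n' 100 0 12 0 100 7)

# Without sorting, no two equal lines are adjacent, so uniq removes nothing.
unsorted_count=$(printf '%s\n' "$gids" | uniq | wc -l | tr -d ' ')

# After a numeric sort the duplicates sit next to each other and uniq drops them.
sorted_unique=$(printf '%s\n' "$gids" | sort -n | uniq | paste -sd ' ' -)

# uniq -c additionally prefixes each line with the number of occurrences.
printf '%s\n' "$gids" | sort -n | uniq -c

echo "unsorted: $unsorted_count lines kept, sorted: $sorted_unique"
```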
Manipulation
tr – character set translation - usually used for converting upper case to lower case.
Can also do other character translations. tr -d can remove specific characters from a stream.
foo:~ $ man -P cat man | tr A-Z a-z | less
Translating from UPPER CASE to lower case. The -P cat option asks man to use cat as its pager instead of less, and the page opened is the man page for man itself.
Translating from lower case to UPPER CASE:
foo:~ $ man -P cat man | tr a-z A-Z | less
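Converting file names to lowercase (quoting the new name keeps names that contain spaces intact):
foo:/windows/C $ for FILE in * ; do
mv "$FILE" "$( echo "$FILE" | tr A-Z a-z )" ; done
Using tr -d to delete the carriage returns (\r) from a file created with Windows Notepad. tr reads only standard input, so the file is supplied with < redirection:
foo:/windows/C $ tr -d '\r' < notepad.dos.txt > notepad.unix.txt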
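The translations and the carriage-return cleanup can be checked on generated input. This is a sketch with assumed file names (dos.txt and unix.txt in a scratch directory); the byte counts confirm that exactly one \r per line was removed.

```shell
# Case translation in both directions.
upper=$(printf 'Hello World\n' | tr 'a-z' 'A-Z')
lower=$(printf 'Hello World\n' | tr 'A-Z' 'a-z')

# Create a two-line file with DOS line endings (\r\n) in a scratch directory.
workdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$workdir/dos.txt"

# tr reads standard input only, so the file is supplied with <.
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix.txt"

dos_size=$(wc -c < "$workdir/dos.txt" | tr -d ' ')
unix_size=$(wc -c < "$workdir/unix.txt" | tr -d ' ')
echo "$upper / $lower / $dos_size bytes before, $unix_size bytes after"
```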
Joining /etc/passwd and /etc/shadow based on their first field (user name).
Since /etc/passwd and /etc/shadow use a colon to separate fields, it is necessary to use the -t : option.
foo:~ # join -t : /etc/passwd /etc/shadow | head
root:x:0:0:root:/root:/bin/bash:$1$LHNUbu7U$oiuhqwd1oiuhqhAdiuHvA0:12146:0:99999:7:::
bin:x:1:1:bin:/bin:/sbin/nologin:*:11974:0:99999:7:::
daemon:x:2:2:daemon:/sbin:/sbin/nologin:*:11974:0:99999:7:::
adm:x:3:4:adm:/var/adm:/sbin/nologin:*:11974:0:99999:7:::
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin:*:11974:0:99999:7:::
sync:x:5:0:sync:/sbin:/bin/sync:*:11974:0:99999:7:::
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown:*:11974:0:99999:7:::
halt:x:7:0:halt:/sbin:/sbin/halt:*:11974:0:99999:7:::
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin:*:11974:0:99999:7:::
join lets you specify which field to join on in each file (-1 and -2) and which fields should appear in the output (-o), similar to cut.
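A small sketch of join on two made-up files, ids.txt and shells.txt, so that no root access to /etc/shadow is needed. Both files are kept sorted on the first field, since join expects its inputs in sorted order. Lines whose key appears in only one file (carol, dave) are left out of the result.

```shell
# Two hypothetical colon-separated files sharing a user name in field 1.
workdir=$(mktemp -d)
printf 'alice:1001\nbob:1002\ncarol:1003\n' > "$workdir/ids.txt"
printf 'alice:/bin/bash\nbob:/bin/zsh\ndave:/bin/sh\n' > "$workdir/shells.txt"

# Default: join on field 1 of both files, print the key and then the remaining fields.
joined=$(join -t : "$workdir/ids.txt" "$workdir/shells.txt")

# -o chooses the output fields as FILE.FIELD pairs.
selected=$(join -t : -o 1.1,2.2 "$workdir/ids.txt" "$workdir/shells.txt")

printf '%s\n' "$joined"
printf '%s\n' "$selected"
```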
sed – stream editor - sed does text transformations on an input stream.
foo:~ $ echo "Hi Fred, how are you" | sed 's/Fred/Joe/'
Hi Joe, how are you
foo:~ $ echo "Hi Fred, how is Fred?" | sed 's/Fred/Joe/'
Hi Joe, how is Fred?
foo:~ $ echo "Hi Fred, how is Fred?" | sed 's/Fred/Joe/g'
Hi Joe, how is Joe?
• sed works by making only one pass over the input.
• A sed program consists of one or more sed commands, which are applied to each line of the input.
• A command may be prefixed by an address or address range selecting the lines on which the command is performed.
sed commands: s/PATTERN/REPLACEMENT/g
Search and replace. Without g, only the first match on each line is replaced; with g at the end, every match on the line is replaced.
In GNU sed you can also add i (or I) at the end of the s command to make the search case-insensitive.
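The effect of the g flag and of an address prefix on the s command can be compared side by side. This is a minimal sketch on literal input; the three-line "Fred" sample is invented for the example.

```shell
# Without g, only the first match on the line is replaced.
first_only=$(echo "Hi Fred, how is Fred?" | sed 's/Fred/Joe/')

# With g, every match on the line is replaced.
every_match=$(echo "Hi Fred, how is Fred?" | sed 's/Fred/Joe/g')

# An address in front of the command limits it to the selected lines (here line 2).
second_line=$(printf 'Fred\nFred\nFred\n' | sed '2s/Fred/Joe/' | paste -sd ' ' -)

echo "$first_only"
echo "$every_match"
echo "$second_line"
```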
foo:~ $ ls /bin/ | fmt -30 | nl | sed '4,15d'
1 arch ash ash.static awk
2 basename bash cat chgrp
3 chmod chown chvt cp cpio csh
16 true umount uname usleep vi
17 vim zcat zsh
foo:~ $ ls /bin | fmt -40 | nl | sed '/e/ d'
2 bash cat chgrp chmod chown chvt cp
13 vi vim zcat zsh
sed commands: d – delete the line.
The d command is most useful when you specify the line or range of lines to which it applies.
Here's how to select lines for the d command:
• /PATTERN/d – delete all lines that contain the pattern
• 4d – delete line 4
• 4,10d – delete lines 4 to 10
• 6,$d – delete from line 6 to the last line
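Each way of selecting lines for d can be tried on the numbers 1 to 10, where the surviving lines make the selection easy to see. A minimal sketch:

```shell
# Ten input lines: 1, 2, ..., 10.
numbers=$(seq 1 10)

# A single line number deletes just that line.
no_fourth=$(printf '%s\n' "$numbers" | sed '4d' | paste -sd ' ' -)

# A range deletes every line from the first number to the second.
no_middle=$(printf '%s\n' "$numbers" | sed '4,8d' | paste -sd ' ' -)

# $ stands for the last line of the input.
head_only=$(printf '%s\n' "$numbers" | sed '6,$d' | paste -sd ' ' -)

# A pattern deletes every line that contains a match (here 1 and 10).
no_ones=$(printf '%s\n' "$numbers" | sed '/1/d' | paste -sd ' ' -)

echo "$no_fourth | $no_middle | $head_only | $no_ones"
```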
paste – paste two files together
foo:~/tmp $ cat file2
cabbage green leaves
cat white cat-hair
piano brown wood
foo:~/tmp $ cat file1
cat animal
cabbage vegetable
piano mineral
coal mineral
foo:~/tmp $ paste file2 file1 | expand -t 22
cabbage green leaves  cat animal
cat white cat-hair    cabbage vegetable
piano brown wood      piano mineral
                      coal mineral
Using paste is like taking printouts of two files and sticking the right margin of one to the left margin of the other. The glue between the files is a tab character.
You can specify a delimiter other than a tab with the -d option.
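The default tab glue and the -d option can be compared on two tiny files. The names names.txt and kinds.txt and their contents are assumptions made for this sketch.

```shell
# Two hypothetical three-line files in a scratch directory.
workdir=$(mktemp -d)
printf 'cat\ncabbage\npiano\n' > "$workdir/names.txt"
printf 'animal\nvegetable\nmineral\n' > "$workdir/kinds.txt"

# Default: matching lines are glued together with a tab character.
tabbed=$(paste "$workdir/names.txt" "$workdir/kinds.txt")

# -d replaces the tab with another delimiter, here a comma.
comma=$(paste -d , "$workdir/names.txt" "$workdir/kinds.txt")

printf '%s\n' "$tabbed"
printf '%s\n' "$comma"
```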
Formatting
fmt – format nicely. fmt reflows text into neatly word-wrapped lines. The default maximum line width is 75 characters.
foo:~ $ ls /bin | fmt
arch ash ash.static aumix-minimal awk basename bash bash2 bsh cat chgrp
chmod chown cp cpio csh cut date dd df dmesg dnsdomainname doexec
domainname dumpkeys echo ed egrep env ex false fgrep gawk gettext
grep gtar gunzip gzip hostname igawk ipcalc kbd_mode kill link ln
loadkeys login ls mail mkdir mknod mktemp more mount mt mv netstat nice
nisdomainname pgawk ping ps pwd red rm rmdir rpm rvi rview sed setfont
setserial sh sleep sort stty su sync tar tcsh touch true umount uname
unicode_start unicode_stop unlink usleep vi vi.minimal view ypdomainname
foo:~ $ ls /bin | fmt -40 | head
arch ash ash.static aumix-minimal
awk basename bash bash2 bsh cat chgrp
chmod chown cp cpio csh cut date dd df
dmesg dnsdomainname doexec domainname
dumpkeys echo ed egrep env ex false
fgrep gawk gettext grep gtar gunzip
gzip hostname igawk ipcalc kbd_mode
kill link ln loadkeys login ls mail
mkdir mknod mktemp more mount mt mv
netstat nice nisdomainname pgawk ping
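The width limit can be verified rather than eyeballed. In this sketch the input is the numbers 1 to 30, one per line; -w 20 is the long form of the -20 style used above, and awk is used only to measure the longest output line. The exact line breaks depend on the fmt implementation, so only the width and the word count are checked.

```shell
# Thirty short "words", one per line.
words=$(seq 1 30)

# Reflow them into lines of at most 20 characters.
wrapped=$(printf '%s\n' "$words" | fmt -w 20)

# Measure the longest line and confirm that no word was lost.
longest=$(printf '%s\n' "$wrapped" | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
word_count=$(printf '%s\n' "$wrapped" | wc -w | tr -d ' ')

printf '%s\n' "$wrapped"
echo "longest line: $longest characters, words: $word_count"
```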
od – octal dump (and other formats)
foo:/tmp $ echo "Hello World" > hello
foo:/tmp $ cat hello
Hello World
od, oddly enough, prints not just in octal but in other formats too.
foo:/tmp $ od hello
0000000 062510 066154 020157 067527 066162 005144
0000014
od's behaviour is rather odd when it is used without any arguments. It prints the file as a series of two-byte words, each shown in octal. Sometimes this is useful, but usually it is not.
The -t switch tells od to use a specific format type (the default is od -t o2).
od -t c means character format. You will notice that the file ends in a newline character (\n).
foo:/tmp $ od -t c hello
0000000   H   e   l   l   o       W   o   r   l   d  \n
0000014
od -t d1 specifies decimal format, one byte per value. The values shown are the ASCII codes of the characters.
foo:/tmp $ od -t d1 hello
0000000 72 101 108 108 111 32 87 111 114 108 100 10
0000014
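The format types can also be applied to a pipe instead of a file. In this sketch -An suppresses the offset column, and awk is used only to collapse the column padding so the values are easy to compare; the x1 (hexadecimal, one byte per value) type is an extra illustration beyond the formats shown above.

```shell
# Decimal value of every byte in "Hello World" plus the trailing newline.
decimals=$(printf 'Hello World\n' | od -An -t d1 | awk '{ $1 = $1; print }')

# The same bytes in hexadecimal.
hexbytes=$(printf 'Hello World\n' | od -An -t x1 | awk '{ $1 = $1; print }')

echo "$decimals"
echo "$hexbytes"
```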