Exercise: Print out a list of users and their home directories
Exercise: Sort users by Group ID. Again, see ‘man sort’
Exercise: Use two terminals to examine the ‘tail -f’ option
Text Processing Tools
• ‘grep’ is used to search for strings and/or regular expressions
(regex), either in the output of other commands or as a search tool on its own.
• To search for a string within a file, we can use:
# grep -i 'user1' /etc/passwd
• ‘grep’ will output the entire line in which the string we
searched for was found, as seen in the example above.
• grep has many options we can use; some of the common ones are:
-i : case-insensitive; do not mind upper or lower case.
-v : invert the match; return only the lines that do NOT contain the string we’ve searched for.
-r / -R : recursive; search through sub-directories as well.
-q : suppress all normal output; useful when checking and evaluating
• For the full list of options, run: “grep --help” or “man grep”.
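The options above can be tried on a small throwaway file. This is a minimal sketch, not output from a real system: the temporary directory, the file name sample.txt and its three lines are made up for illustration.

```shell
# Build a small sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf '%s\n' 'user1:x:1001' 'USER1:x:1002' 'admin:x:1003' > "$workdir/sample.txt"

# -i: match 'user1' regardless of letter case.
insensitive=$(grep -i 'user1' "$workdir/sample.txt")

# -v: print only the lines that do NOT match.
inverted=$(grep -v -i 'user1' "$workdir/sample.txt")

# -r: search every file below the given directory.
recursive=$(grep -r 'admin' "$workdir")

# -q: print nothing; only the exit status tells us whether a match exists.
if grep -q 'admin' "$workdir/sample.txt"; then
  found=yes
else
  found=no
fi

echo "case-insensitive: $insensitive"
echo "inverted:         $inverted"
echo "recursive:        $recursive"
echo "admin present:    $found"

rm -rf "$workdir"
```

Because -q produces no output, it is normally used inside a test such as the ‘if’ statement shown here.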
• There are two more variants of grep:
fgrep – suited for string searches only; the searches are performed
egrep – suited for extended regex searches.
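The difference between the variants can be sketched with grep's own -F and -E options, which the GNU grep documentation describes as the equivalents of fgrep and egrep. The four sample words are made up for illustration.

```shell
# Hypothetical sample input: four short lines.
sample=$(printf '%s\n' 'a.c' 'abc' 'cat' 'dog')

# Plain grep treats '.' as "any single character", so both a.c and abc match.
basic=$(printf '%s\n' "$sample" | grep 'a.c')

# -F (fgrep behaviour): the pattern is a fixed string, so '.' is a literal dot.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.c')

# -E (egrep behaviour): extended regex, here using '|' for alternation.
extended=$(printf '%s\n' "$sample" | grep -E 'cat|dog')

echo "basic:    $basic"
echo "fixed:    $fixed"
echo "extended: $extended"
```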
• grep can also be used as a filter, on the right side of pipes in
order to display only specific outputs:
# ls -l | grep "kf"
-rw-rw-r-- 1 nir test 0 Jul 19 15:11 kfile9
• The “cut” command is used to extract selected fields or
character columns from each line of text.
cut [options] [filename(s)]
-f'[n]' : [n] refers to the field number(s) to print; the fields must be separated by a delimiter.
-d'[delimiter]' : this option defines which character in our string is
the delimiter; if this option is not supplied by the user, the default
will be used (TAB).
# cut -d':' -f6,7 /etc/passwd | grep user1
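A minimal sketch of field extraction with cut, run against a made-up passwd-style file rather than the real /etc/passwd. The three accounts are hypothetical. The -c option (select by character position) is not in the list above and is shown only to illustrate the "columns" mode.

```shell
# Hypothetical passwd-style data: name:password:UID:GID:comment:home:shell
passwd_sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'user1:x:1001:1001:User One:/home/user1:/bin/bash' \
  'user2:x:1002:100:User Two:/home/user2:/bin/sh' > "$passwd_sample"

# Fields 1 and 6 are the user name and the home directory.
homes=$(cut -d':' -f1,6 "$passwd_sample")

# -c selects character columns instead of delimited fields.
first_four=$(cut -c1-4 "$passwd_sample")

echo "$homes"
echo "$first_four"

rm -f "$passwd_sample"
```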
• The “sort” command enables sorting of data in numerical or alphabetical order.
sort [options] [filename(s)]
-m – merge already sorted files
-r – reverse sort order
-M – month name sort
-n – numeric sort
-u – unique sort; of the lines that compare equal, output only
the first one.
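The effect of the sort options is easiest to see on a few numbers. The sketch below uses made-up input. The last command sorts passwd-style lines by Group ID using -t (field separator) and -k (sort key), two options that are not in the list above but are described in ‘man sort’.

```shell
# Hypothetical sample data: four numbers, one of them repeated.
numbers=$(printf '%s\n' 10 9 100 9)

# Default sort compares lines as text, so "100" comes before "9".
# LC_ALL=C pins the collation order so the result is predictable.
as_text=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares by numeric value.
as_numbers=$(printf '%s\n' "$numbers" | sort -n)

# -r reverses the order; combined with -n it lists the largest value first.
largest_first=$(printf '%s\n' "$numbers" | sort -n -r)

# -u keeps a single copy of lines that compare equal.
unique_numbers=$(printf '%s\n' "$numbers" | sort -n -u)

# Sort made-up passwd-style lines numerically by Group ID (field 4).
by_gid=$(printf '%s\n' \
  'user1:x:1001:1001::/home/user1:/bin/bash' \
  'root:x:0:0:root:/root:/bin/bash' \
  'user2:x:1002:100::/home/user2:/bin/sh' | sort -t ':' -k 4,4n)

echo "$as_text"
echo "$as_numbers"
echo "$largest_first"
echo "$unique_numbers"
echo "$by_gid"
```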
• The “uniq” command filters duplicate lines of data; it only compares adjacent lines, so the input is usually sorted first.
uniq [options] [filename(s)]
-u – show only lines that are not repeated
-d – show only one copy of the duplicate line
-c – output each line with the count of occurrences
-i – case-insensitive comparison
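A short sketch of the uniq options on made-up input. Note that uniq only compares neighbouring lines, so the data is passed through sort first, and that the case-insensitive option is the lowercase -i. The leading spaces that uniq -c prints before each count are stripped with sed to keep the output compact.

```shell
# Hypothetical sample data with repeats that are NOT next to each other.
lines=$(printf '%s\n' apple banana apple Apple banana cherry)

# Without sorting, no two neighbouring lines are equal, so nothing is removed.
unsorted_result=$(printf '%s\n' "$lines" | uniq)

# Sort first so that equal lines become adjacent (LC_ALL=C: 'Apple' sorts first).
sorted=$(printf '%s\n' "$lines" | LC_ALL=C sort)

only_unique=$(printf '%s\n' "$sorted" | uniq -u)    # lines that occur once
only_dups=$(printf '%s\n' "$sorted" | uniq -d)      # one copy of each repeated line
counts=$(printf '%s\n' "$sorted" | uniq -c | sed 's/^ *//')  # count, then line
ignore_case=$(printf '%s\n' "$sorted" | uniq -i)    # 'Apple' and 'apple' are equal

echo "$unsorted_result"
echo "$only_unique"
echo "$only_dups"
echo "$counts"
echo "$ignore_case"
```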
• The “tr” command is used to translate characters.
It uses two sets of characters, given as command arguments, and converts
them on a char-to-char basis. Key points:
Converts letter cases; upper to lower and vice-versa.
Recognizes special characters, such as \n (newline).
Cannot open files; can only use data from pipes or redirections from files.
tr [options] char-list1 char-list2 < [file]
-d – delete all characters appearing in “char-list1”
-s – replace instances of repeated characters with a single character.
-cd – delete all characters that are NOT in “char-list1”
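The tr behaviours described above can be sketched as follows. All input strings and the temporary file are made up for illustration. Since tr cannot open files itself, the last example feeds it a file through input redirection.

```shell
# Char-to-char translation: every lowercase letter becomes its uppercase twin.
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')

# Special characters: turn each space into a newline (\n).
one_per_line=$(echo 'a b c' | tr ' ' '\n')

# -d: delete every character that appears in the set.
no_digits=$(echo 'a1b2c3' | tr -d '0-9')

# -s: squeeze runs of a repeated character into a single one.
squeezed=$(echo 'a    b   c' | tr -s ' ')

# -cd: delete everything that is NOT in the set (keeps only the digits).
only_digits=$(echo 'tel: 555-0100' | tr -cd '0-9')

# tr reads standard input only, so a file must be redirected into it.
notes=$(mktemp)
echo 'read from a file' > "$notes"
from_file=$(tr 'a-z' 'A-Z' < "$notes")
rm -f "$notes"

echo "$upper"
echo "$one_per_line"
echo "$no_digits"
echo "$squeezed"
echo "$only_digits"
echo "$from_file"
```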
• The “tail” command prints the end of a file
tail [options] [filename(s)]
-n N print the last N lines (default is 10)
-n +N print the entire file starting from line N
-f follow mode. tail will stay active and print each new line appended to the file.
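A minimal sketch of tail on a made-up 12-line file, assuming the seq utility is installed. With tail, ‘-n N’ prints the last N lines, while ‘-n +N’ starts the output at line N. Follow mode needs two terminals, so it is shown only as comments.

```shell
# Hypothetical sample file containing the numbers 1 to 12, one per line.
log=$(mktemp)
seq 1 12 > "$log"

# No options: the last 10 lines (3 to 12).
default_tail=$(tail "$log")

# -n 3: the last three lines (10, 11, 12).
last3=$(tail -n 3 "$log")

# -n +11: everything starting from line 11 (11, 12).
from11=$(tail -n +11 "$log")

echo "$default_tail"
echo "$last3"
echo "$from11"

# Follow mode, to be tried by hand with two terminals:
#   terminal 1:  tail -f /path/to/file
#   terminal 2:  echo 'new line' >> /path/to/file
# Terminal 1 prints each appended line until it is stopped with Ctrl-C.

rm -f "$log"
```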
• The “head” command prints the start of a file
head [options] [filename(s)]
-n N print the first N lines (default is 10)
-n -N print the entire file except the last N lines (GNU head)
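A matching sketch for head on the same kind of made-up 12-line file, again assuming seq is installed. The negative count in the last command is a GNU head feature and may not work with other implementations.

```shell
# Hypothetical sample file containing the numbers 1 to 12, one per line.
data=$(mktemp)
seq 1 12 > "$data"

# No options: the first 10 lines (1 to 10).
default_head=$(head "$data")

# -n 3: the first three lines (1, 2, 3).
first3=$(head -n 3 "$data")

# -n -2 (GNU head): everything except the last two lines (1 to 10).
all_but_last2=$(head -n -2 "$data")

echo "$default_head"
echo "$first3"
echo "$all_but_last2"

rm -f "$data"
```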