Text Processing Tools
grep
• ‘grep’ searches for strings and/or regular expressions
(regex), either directly in files or in the output of other
commands.
• To search for a string within a file, we can use:
    # grep -i 'user1' /etc/passwd
    user1:x:500:500::/home/user1:/bin/bash
• ‘grep’ prints the entire line in which the search string
was found, as seen in the example above.
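A minimal sketch of the same search, run against a temporary file instead of the real /etc/passwd. The sample entries (including USER2) are made up for illustration.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
user1:x:500:500::/home/user1:/bin/bash
USER2:x:501:501::/home/USER2:/bin/bash
EOF

# Case-sensitive search: only the lowercase "user1" line matches
exact=$(grep 'user' "$sample")

# Case-insensitive search: the "USER2" line matches as well
nocase=$(grep -i 'user' "$sample")

echo "$exact"
echo "---"
echo "$nocase"

rm -f "$sample"
```

In both cases grep prints whole lines, not just the matching part.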
grep
• grep has many options; some of the most common are:
  ◦ -i : case-insensitive; ignore differences between upper and lower case.
  ◦ -v : invert the match; print only the lines that do NOT match the pattern.
  ◦ -r / -R : recursive; search through sub-directories as well.
  ◦ -q : quiet; suppress all normal output and report the result only
through the exit status, which is useful for tests in scripts.
• For the full list of options, run: “grep --help” or “man grep”.
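The options above could be tried out roughly as follows. The directory layout and file contents are invented for the example.

```shell
# Hypothetical working directory with one nested file
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
printf 'alpha\nbeta\ngamma\n' > "$workdir/top.txt"
printf 'beta again\n' > "$workdir/sub/nested.txt"

# -v: print the lines that do NOT contain "beta"
not_beta=$(grep -v 'beta' "$workdir/top.txt")

# -r: search the whole tree; one output line per matching line
recursive_hits=$(grep -r 'beta' "$workdir" | wc -l)

# -q: print nothing; the exit status alone tells us whether a match exists
if grep -q 'gamma' "$workdir/top.txt"; then
    found="yes"
else
    found="no"
fi

echo "$not_beta"
echo "recursive hits: $recursive_hits"
echo "gamma found: $found"

rm -rf "$workdir"
```

The -q form is the usual choice inside an if statement, because the script only cares whether a match exists.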
grep
• There are two more variants of grep:
  ◦ fgrep – equivalent to “grep -F”; treats the pattern as a fixed string
rather than a regex, which usually makes searches faster.
  ◦ egrep – equivalent to “grep -E”; suited for extended regex searches.
• grep can also be used as a filter, on the right side of a pipe, in
order to display only specific lines of output:
    # ls -l | grep 'kf'
    -rw-rw-r-- 1 nir test 0 Jul 19 15:11 kfile9
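A short sketch of the difference between the variants, using the -F and -E option forms and grep as a filter on piped input. The input strings are made up.

```shell
# Fixed-string search (fgrep / grep -F): "." is a literal dot
fixed=$(printf 'a.c\nabc\n' | grep -F 'a.c')

# Plain grep: "." is a regex wildcard, so "abc" matches too
regex=$(printf 'a.c\nabc\n' | grep 'a.c')

# Extended regex (egrep / grep -E): alternation with ( | )
either=$(printf 'kfile9\nnotes\nmfile2\n' | grep -E '^(k|m)file[0-9]$')

echo "$fixed"
echo "---"
echo "$regex"
echo "---"
echo "$either"
```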
cut
• The “cut” command extracts selected fields or columns from
each line of text.
• Syntax:
  ◦ cut [options] [filename(s)]
• Options:
  ◦ -f[n] : [n] is a field number or a comma-separated list of field
numbers; the fields must be separated by a delimiter.
  ◦ -d'[delimiter]' : defines which character is the field delimiter;
if this option is not supplied, the default (TAB) is used.
    # cut -d':' -f6,7 /etc/passwd | grep user1
    /home/user1:/bin/bash
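A minimal sketch of field extraction on passwd-style data. The sample file is hypothetical, and the -c option (select by character position) is not listed on the slide; it is included only to illustrate "columns".

```shell
# Hypothetical sample data standing in for /etc/passwd
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
user1:x:500:500::/home/user1:/bin/bash
EOF

# Fields 1 and 6 (user name and home directory), ":" as delimiter
homes=$(cut -d: -f1,6 "$sample")

# Character columns 1 to 4 of every line (-c is an extra, see lead-in)
cols=$(cut -c1-4 "$sample")

echo "$homes"
echo "---"
echo "$cols"

rm -f "$sample"
```

Note that cut keeps the delimiter between the selected fields in its output.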
sort
• The “sort” command sorts lines of data in numerical or
alphabetical order.
• Syntax:
  ◦ sort [options] [filename(s)]
• Options:
  ◦ -m – merge already sorted files
  ◦ -r – reverse the sort order
  ◦ -M – month name sort
  ◦ -n – numeric sort
  ◦ -u – unique sort; when several lines compare equal, output only the
first of them.
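The difference between the default and the numeric sort is easiest to see on a small example. The numbers below are arbitrary, and LC_ALL=C is set only to make the alphabetical order predictable.

```shell
# Hypothetical input with a repeated value
nums=$(mktemp)
printf '10\n9\n100\n9\n' > "$nums"

# Default (alphabetical) order compares character by character
alpha=$(LC_ALL=C sort "$nums")

# -n compares the values as numbers
numeric=$(sort -n "$nums")

# Options can be combined: numeric, reversed, duplicates removed
reverse_unique=$(sort -n -r -u "$nums")

echo "$alpha"
echo "---"
echo "$numeric"
echo "---"
echo "$reverse_unique"

rm -f "$nums"
```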
uniq
• The “uniq” command filters duplicate lines of data. It compares
adjacent lines only, so the input is usually sorted first.
• Syntax:
  ◦ uniq [options] [filename(s)]
• Options:
  ◦ -u – show only lines that are not repeated
  ◦ -d – show one copy of each repeated line
  ◦ -c – prefix each output line with its number of occurrences
  ◦ -i – case-insensitive comparison
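A rough sketch of the usual sort-then-uniq pattern. The fruit names are sample data, and -i is assumed to be available as in GNU and BSD uniq.

```shell
# Sort first, because uniq only compares neighbouring lines
sorted=$(printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' | sort)

# -c: each distinct line, prefixed with its count
counts=$(printf '%s\n' "$sorted" | uniq -c)

# -d: one copy of every line that occurs more than once
dups=$(printf '%s\n' "$sorted" | uniq -d)

# -u: only the lines that occur exactly once
singles=$(printf '%s\n' "$sorted" | uniq -u)

# -i: ignore case when comparing; the first line of each group is kept
folded=$(printf 'Apple\napple\nAPPLE\n' | uniq -i)

echo "$counts"
echo "--- repeated:"
echo "$dups"
echo "--- not repeated:"
echo "$singles"
echo "--- case-insensitive:"
echo "$folded"
```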
tr
• The “tr” command is used to translate characters.
It takes two sets of characters as arguments and converts the first set
to the second on a character-by-character basis. It also:
  ◦ Converts letter case; upper to lower and vice versa.
  ◦ Recognizes special characters, such as \n (newline).
  ◦ Cannot open files itself; it only reads standard input, supplied
through a pipe or a redirection from a file.
• Syntax:
  ◦ tr [options] character-list1 character-list2 < [file]
• Options:
  ◦ -d – delete all characters appearing in character-list1
  ◦ -s – replace each run of a repeated character with a single occurrence.
  ◦ -cd – delete all characters that are NOT in character-list1
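A few illustrative one-liners for the options above. All input comes from a pipe, since tr does not take file names, and the input strings are made up.

```shell
# Translate lower case to upper case, character by character
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')

# -d: delete every vowel
no_vowels=$(echo 'hello world' | tr -d 'aeiou')

# -s: squeeze runs of spaces into a single space
squeezed=$(echo 'a    b   c' | tr -s ' ')

# -cd: delete everything that is NOT a digit (the newline goes too)
digits=$(echo 'tel: 555-1234' | tr -cd '0-9')

# Special characters: turn spaces into newlines
one_per_line=$(echo 'a b c' | tr ' ' '\n')

echo "$upper"
echo "$no_vowels"
echo "$squeezed"
echo "$digits"
echo "$one_per_line"
```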
tail
• The “tail” command prints the end of a file.
• Syntax:
  ◦ tail [options] [filename(s)]
• Options:
  ◦ -n N : print the last N lines (default is 10)
  ◦ -n +N : print the rest of the file, starting from line N
  ◦ -f : follow mode; tail stays active and prints each new line as it is
appended to the file
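A minimal sketch of the two forms of -n on a five-line temporary file. The -f option is left out because it keeps running until interrupted.

```shell
# Hypothetical five-line input file
f=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$f"

# Last two lines of the file
last_two=$(tail -n 2 "$f")

# Everything from line 3 to the end of the file
from_third=$(tail -n +3 "$f")

echo "$last_two"
echo "---"
echo "$from_third"

rm -f "$f"
```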
head
• The “head” command prints the start of a file.
• Syntax:
  ◦ head [options] [filename(s)]
• Options:
  ◦ -n N : print the first N lines (default is 10)
  ◦ -n -N : print the entire file except the last N lines (GNU head)
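A short sketch of both forms on a five-line temporary file. The negative count is assumed to be available as in GNU head; the last command combines head and tail to pick lines from the middle of a file.

```shell
# Hypothetical five-line input file
f=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$f"

# First two lines of the file
first_two=$(head -n 2 "$f")

# Everything except the last two lines (GNU head assumed)
all_but_last_two=$(head -n -2 "$f")

# Lines 3 and 4: take the first four lines, then the last two of those
middle=$(head -n 4 "$f" | tail -n 2)

echo "$first_two"
echo "---"
echo "$all_but_last_two"
echo "---"
echo "$middle"

rm -f "$f"
```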

Editor's Notes

  • #6 Exercise: Print out a list of users and their home directories
  • #7 Exercise: Sort users by group ID. Again, see “man sort”
  • #11 Exercise: Use two terminals to examine the ‘tail -f’ option