  1. Chapter 5: Understanding Text Processing (The Complete Guide to Linux System Administration)
  2. Objectives <ul><li>Use regular expressions in a variety of circumstances </li></ul><ul><li>Manipulate text files in complex ways using multiple command-line utilities </li></ul><ul><li>Use advanced features of the vi editor </li></ul><ul><li>Use the sed and awk text processing utilities </li></ul>
  3. Regular Expressions <ul><li>Flexible way to encode many types of complex patterns </li></ul><ul><li>Used to define patterns in many situations </li></ul><ul><ul><li>As parameters to many Linux commands </li></ul></ul><ul><ul><li>Within the vi editor </li></ul></ul><ul><ul><li>Within programming languages </li></ul></ul><ul><ul><ul><li>Including shell scripts </li></ul></ul></ul><ul><li>Used to match text </li></ul>
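  4. Regular Expressions (continued)
  5. Regular Expressions (continued)
  6. Regular Expressions (continued) <ul><li>Acceptable syntax varies in small but important ways </li></ul><ul><ul><li>Depending on where the expression is used </li></ul></ul><ul><li>Examples: </li></ul><ul><ul><li>[Rr]eunion[0-9][0-9].jpg </li></ul></ul><ul><ul><li>[Rr]eunion[0-9]{2}.jpg </li></ul></ul><ul><ul><li>Reunion-[^d].jpg </li></ul></ul><ul><li>In a regular expression, . matches any single character; write \. to match only a literal period </li></ul><ul><li>{2} (exactly two of the preceding item) is extended syntax, as in grep -E; basic syntax writes it as \{2\} </li></ul>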
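A minimal sketch of these patterns with grep, assuming a POSIX shell; the file names are hypothetical sample data. The ^ and $ anchors are added so each pattern must match a whole name, which makes it visible that an unescaped . also accepts a name with no period before jpg.

```shell
#!/bin/sh
# Hypothetical file names, one per line.
names='reunion07.jpg
Reunion12.jpg
reunion7.jpg
Reunion12xjpg
Reunion-a.jpg
Reunion-d.jpg'

# Basic regular expression: an unescaped . matches ANY character,
# so Reunion12xjpg is matched as well.
loose=$(printf '%s\n' "$names" | grep '^[Rr]eunion[0-9][0-9].jpg$')

# \. matches only a literal period.
strict=$(printf '%s\n' "$names" | grep '^[Rr]eunion[0-9][0-9]\.jpg$')

# Extended regular expression (grep -E): {2} means exactly two of [0-9].
# In a basic regular expression the interval is written \{2\}.
interval=$(printf '%s\n' "$names" | grep -E '^[Rr]eunion[0-9]{2}\.jpg$')

# [^d] matches any single character except d.
negated=$(printf '%s\n' "$names" | grep '^Reunion-[^d]\.jpg$')

printf 'loose:\n%s\nstrict:\n%s\ninterval:\n%s\nnegated:\n%s\n' \
  "$loose" "$strict" "$interval" "$negated"
```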
  7. Manipulating Files <ul><li>Command-line utilities useful for: </li></ul><ul><ul><li>Searching </li></ul></ul><ul><ul><li>Sorting </li></ul></ul><ul><ul><li>Reorganizing </li></ul></ul><ul><ul><li>Otherwise working with text files </li></ul></ul>
  8. Searching for Patterns with grep <ul><li>grep </li></ul><ul><ul><li>Rapidly scans files for a specified pattern </li></ul></ul><ul><ul><li>Prints the lines of text that contain a match for the pattern </li></ul></ul><ul><ul><li>Further action can be taken on the matching lines </li></ul></ul><ul><ul><ul><li>Use a pipe to connect grep with other filtering commands </li></ul></ul></ul>
  9. Searching for Patterns with grep (continued) <ul><li>Examples: </li></ul><ul><ul><li>grep wilson /etc/passwd </li></ul></ul><ul><ul><li>grep 'thomas[Cc]orp' *txt </li></ul></ul><ul><ul><ul><li>Quote patterns that contain [ ], * or ? so the shell does not expand them as file names </li></ul></ul></ul><ul><li>Often used at the end of a pipe </li></ul><ul><ul><li>locate tif | grep frame </li></ul></ul>
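A small sketch of the grep examples above, assuming a POSIX shell with mktemp; the passwd and notes files are hypothetical stand-ins, and printf replaces locate so the pipeline's input is predictable.

```shell
#!/bin/sh
# Hypothetical sample files in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'root:x:0:0:root:/root:/bin/bash\nwilson:x:1001:1001::/home/wilson:/bin/sh\n' > "$dir/passwd"
printf 'Call thomasCorp today\nthomascorp invoice\nThomasCorp memo\n' > "$dir/notes.txt"

# Lines containing the literal text "wilson".
wilson=$(grep wilson "$dir/passwd")

# Quoted so the shell passes [Cc] to grep instead of treating it as a glob.
# ThomasCorp (capital T) does not match.
corp=$(grep 'thomas[Cc]orp' "$dir"/*txt)

# grep at the end of a pipe; printf stands in for locate's output here.
frames=$(printf '%s\n' /img/frame01.tif /img/photo.tif /img/frame02.tif | grep frame)

printf '%s\n---\n%s\n---\n%s\n' "$wilson" "$corp" "$frames"
```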
  10. Examining File Contents <ul><li>head and tail commands: </li></ul><ul><ul><li>Display the first few or last few lines of a file </li></ul></ul><ul><ul><li>By default, display 10 lines </li></ul></ul><ul><ul><li>-n option </li></ul></ul><ul><ul><ul><li>Specifies the number of lines </li></ul></ul></ul><ul><ul><li>Print output to STDOUT </li></ul></ul><ul><ul><ul><li>Redirect as needed </li></ul></ul></ul>
  11. Examining File Contents (continued) <ul><li>tail -f option </li></ul><ul><ul><li>“Follows” the file, printing new lines as other programs add them </li></ul></ul><ul><ul><li>Very useful for tracking log files </li></ul></ul><ul><li>wc command </li></ul><ul><ul><li>Counts lines, words, and bytes (characters, for plain ASCII text); -l, -w, and -c select a single count </li></ul></ul>
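A short sketch of head, tail, and wc on a generated file, assuming a Linux system with GNU coreutils (seq and mktemp are assumptions, not mentioned on the slides). tail -f is described in a comment rather than run, because it only stops when interrupted.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sample "log": the numbers 1 to 20, one per line.
seq 1 20 > "$dir/app.log"

default_head=$(head "$dir/app.log" | wc -l | tr -d ' ')   # 10 lines by default
first3=$(head -n 3 "$dir/app.log")
last2=$(tail -n 2 "$dir/app.log")

# Reading from stdin (<) keeps the file name out of wc's output;
# tr removes any padding spaces.
lines=$(wc -l < "$dir/app.log" | tr -d ' ')
words=$(wc -w < "$dir/app.log" | tr -d ' ')
bytes=$(wc -c < "$dir/app.log" | tr -d ' ')

# tail -f "$dir/app.log" would keep running and print lines as they are
# appended by another program; press Ctrl+C to stop it.

printf 'head default: %s lines\nfirst 3:\n%s\nlast 2:\n%s\n' "$default_head" "$first3" "$last2"
printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
```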
  12. Examining File Contents (continued)
  13. Examining File Contents (continued) <ul><li>strings command </li></ul><ul><ul><li>Extracts printable text strings from a file that contains binary or other non-text data </li></ul></ul><ul><ul><li>Provides a convenient way to find information that is not otherwise readable </li></ul></ul>
  14. Examining File Contents (continued)
  15. Manipulating Text Files <ul><li>Filtering </li></ul><ul><ul><li>Modifies part of a text file by adding, removing, or altering data </li></ul></ul><ul><ul><li>Based on complex rules or patterns </li></ul></ul><ul><ul><li>Use command-line programs to filter text files </li></ul></ul><ul><li>sort command </li></ul><ul><ul><li>Sorts all lines in a text file </li></ul></ul><ul><li>uniq command </li></ul><ul><ul><li>Removes duplicate lines only when they are adjacent; sort the file first to remove all duplicates </li></ul></ul>
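A quick sketch of why uniq is usually paired with sort, assuming a POSIX shell; the word list is hypothetical. LC_ALL=C is set only to make the sort order independent of the user's locale.

```shell
#!/bin/sh
# Hypothetical word list with repeats, some adjacent and some not.
words='pear
pear
apple
pear
banana'

# uniq alone collapses only the adjacent "pear pear" pair.
uniq_only=$(printf '%s\n' "$words" | uniq)

# Sorting first brings all duplicates together, so uniq removes every repeat.
sorted_unique=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq)

# sort -u does the same in one step.
sort_u=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

printf 'uniq only:\n%s\nsort | uniq:\n%s\n' "$uniq_only" "$sorted_unique"
```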
  16. Manipulating Text Files (continued) <ul><li>diff command </li></ul><ul><ul><li>Displays differences between two files </li></ul></ul><ul><ul><li>Output format: </li></ul></ul><ul><ul><ul><li>< marks lines from the first file that are not in the second file </li></ul></ul></ul><ul><ul><ul><li>> marks lines from the second file that are not in the first file </li></ul></ul></ul><ul><li>cmp command </li></ul><ul><ul><li>Gives a quick check of whether two files are identical </li></ul></ul>
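A minimal sketch of diff's normal output and of cmp used as a yes/no check, assuming a POSIX shell with mktemp; the three small files are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'apple\nbanana\ncherry\n' > "$dir/first"
printf 'apple\nblueberry\ncherry\n' > "$dir/second"
cp "$dir/first" "$dir/copy"

# Normal diff output: "2c2" means line 2 of the first file changes into
# line 2 of the second; < lines come from the first file, > from the second.
changes=$(diff "$dir/first" "$dir/second")

# cmp -s prints nothing and reports only through its exit status
# (0 = identical, nonzero = different or an error).
if cmp -s "$dir/first" "$dir/second"; then differ=no; else differ=yes; fi
if cmp -s "$dir/first" "$dir/copy"; then copy_differs=no; else copy_differs=yes; fi

printf '%s\n' "$changes"
printf 'first vs second differ: %s; first vs copy differ: %s\n' "$differ" "$copy_differs"
```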
  17. Manipulating Text Files (continued) <ul><li>comm command </li></ul><ul><ul><li>Compares two sorted files line by line </li></ul></ul><ul><ul><li>Prints three columns: lines only in the first file, lines only in the second file, and lines in both </li></ul></ul><ul><li>ispell spell checker </li></ul><ul><ul><li>Uses a large dictionary to examine a text file </li></ul></ul><ul><ul><li>Prompts with suggestions </li></ul></ul>
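A small sketch of comm's three columns, assuming a POSIX shell; the two lists are hypothetical and already sorted, and LC_ALL=C is exported so comm's order check agrees with that sort order.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
export LC_ALL=C

# comm requires sorted input; these sample lists are already sorted.
printf 'apple\nbanana\ncherry\n' > "$dir/list1"
printf 'banana\ncherry\ndate\n' > "$dir/list2"

# -1, -2, -3 suppress column 1 (only in list1), 2 (only in list2), 3 (in both).
only_in_1=$(comm -23 "$dir/list1" "$dir/list2")
only_in_2=$(comm -13 "$dir/list1" "$dir/list2")
in_both=$(comm -12 "$dir/list1" "$dir/list2")

printf 'only in list1: %s\nonly in list2: %s\nin both:\n%s\n' \
  "$only_in_1" "$only_in_2" "$in_both"
```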
  18. Manipulating Text Files (continued)
  19. Manipulating Text Files (continued)
  20. Manipulating Text Files (continued)
  21. Using sed and awk <ul><li>sed </li></ul><ul><ul><li>Stream editor; a complex filtering program </li></ul></ul><ul><li>awk command </li></ul><ul><ul><li>Generally used for formatting output </li></ul></ul>
  22. Filtering and Editing Text with sed <ul><li>sed command </li></ul><ul><ul><li>Processes each line in a text file according to a series of editing commands given on the command line or in a script file </li></ul></ul><ul><ul><li>Example: </li></ul></ul><ul><ul><ul><li>sed -n '/lincoln/p' /tmp/names </li></ul></ul></ul><ul><ul><ul><li>Prints to the screen all lines of the /tmp/names file that contain the text “lincoln” </li></ul></ul></ul><ul><ul><li>By default, prints each line to STDOUT; the -n option suppresses this, so only lines printed with p appear </li></ul></ul>
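A sketch of the slide's sed example against a hypothetical stand-in for /tmp/names, assuming a POSIX shell; it also shows what happens when -n is left out.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical stand-in for /tmp/names.
printf 'abraham lincoln\ngeorge washington\nlincoln memorial\n' > "$dir/names"

# -n turns off automatic printing; p prints only the matching lines.
matches=$(sed -n '/lincoln/p' "$dir/names")

# Without -n, every line is printed once and matching lines a second time:
# 3 input lines + 2 extra copies = 5 output lines.
without_n=$(sed '/lincoln/p' "$dir/names" | wc -l | tr -d ' ')

printf '%s\n(without -n: %s lines)\n' "$matches" "$without_n"
```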
  23. Filtering and Editing Text with sed (continued) <ul><li>Substitution command syntax: </li></ul><ul><ul><li>/pattern1/s/pattern2/pattern3/g </li></ul></ul><ul><ul><li>Watches for lines containing pattern1 </li></ul></ul><ul><ul><li>On those lines, replaces occurrences of pattern2 with pattern3 (the replacement text) </li></ul></ul><ul><ul><li>g option at end of command </li></ul></ul><ul><ul><ul><li>Causes sed to replace all occurrences on each line, not just the first </li></ul></ul></ul><ul><ul><ul><li>Stands for global </li></ul></ul></ul>
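A minimal sketch of an addressed substitution, assuming a POSIX shell; the log lines are hypothetical, with "error" as pattern1, "disk" as pattern2, and "drive" as the replacement text.

```shell
#!/bin/sh
# Hypothetical log lines.
log='error: disk full, disk 2
info: disk ok
error: disk read failed'

# Only lines containing "error" are edited; g replaces every "disk" on them.
global=$(printf '%s\n' "$log" | sed '/error/s/disk/drive/g')

# Without g, only the first "disk" on each selected line is replaced.
first_only=$(printf '%s\n' "$log" | sed '/error/s/disk/drive/')

printf 'with g:\n%s\nwithout g:\n%s\n' "$global" "$first_only"
```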
  24. Filtering and Editing Text with sed (continued) <ul><li>Can place operations in a file and pass the file name to sed with -f </li></ul><ul><ul><li>sed -f nolatin news-article > new_news-article </li></ul></ul><ul><li>Ampersand (&) in the replacement text </li></ul><ul><ul><li>Refers to the text that matched pattern2 </li></ul></ul><ul><ul><li>s/[0-9]*[0-9][0-9]/$&/g </li></ul></ul><ul><ul><ul><li>Places a dollar sign before every number of two or more digits </li></ul></ul></ul><ul><li>sed is often useful as part of a pipeline of Linux commands </li></ul>
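A sketch of both ideas on this slide, assuming a POSIX shell. The slide does not show what the nolatin script contains, so the two commands written to it here (expanding "e.g." and "i.e.") are hypothetical, as is the article text.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical contents for the nolatin script: one sed command per line.
cat > "$dir/nolatin" <<'EOF'
s/e\.g\./for example/g
s/i\.e\./that is/g
EOF
printf 'Use tools, e.g. sed, i.e. a stream editor.\n' > "$dir/news-article"

sed -f "$dir/nolatin" "$dir/news-article" > "$dir/new_news-article"
rewritten=$(cat "$dir/new_news-article")

# & in the replacement stands for the matched text, so every number of two
# or more digits gets a $ in front. Single quotes stop the shell from
# interpreting $& itself.
prices=$(echo 'Items: 25, 7, and 100' | sed 's/[0-9]*[0-9][0-9]/$&/g')

printf '%s\n%s\n' "$rewritten" "$prices"
```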
  25. Formatting with awk <ul><li>Processes text </li></ul><ul><ul><li>Extracts parts of a file </li></ul></ul><ul><ul><li>Formats text according to information you provide on the command line or in a script file </li></ul></ul><ul><li>Formats output based on fields within a line of text </li></ul><ul><li>The same task can often be done with either sed or awk </li></ul>
  26. Formatting with awk (continued) <ul><li>Fields on a line are normally separated by whitespace </li></ul><ul><ul><li>The -F option changes the character awk uses to separate fields </li></ul></ul><ul><li>First field is referred to by $1, second by $2, etc.; $0 is the whole line </li></ul><ul><li>Basic format: /pattern/ { actions } </li></ul><ul><li>Example: ls -l | awk '{ print $3, $9 }' </li></ul><ul><ul><li>The comma separates the fields with a space; without it, awk joins them with no separator </li></ul></ul>
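A sketch of field handling in awk, assuming a POSIX shell and awk. Real ls -l output varies from system to system, so fixed ls -l style lines (hypothetical) are used instead; the passwd-style line shows -F.

```shell
#!/bin/sh
# Fixed lines in ls -l format (hypothetical), so the output is predictable.
listing='-rw-r--r-- 1 alice staff 1200 Jan 5 10:00 report.txt
drwxr-xr-x 4 bob staff 4096 Jan 6 11:30 projects'

# No comma: $3 and $9 are concatenated with nothing between them.
joined=$(printf '%s\n' "$listing" | awk '{ print $3 $9 }')

# Comma: awk separates the values with the output field separator (a space).
spaced=$(printf '%s\n' "$listing" | awk '{ print $3, $9 }')

# -F: sets ":" as the input field separator, as in /etc/passwd.
account=$(printf 'wilson:x:1001:1001::/home/wilson:/bin/sh\n' | awk -F: '{ print $1, $6 }')

printf '%s\n%s\n%s\n' "$joined" "$spaced" "$account"
```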
  27. Formatting with awk (continued) <ul><li>Can include a regular expression to select which lines awk includes in output: </li></ul><ul><ul><li>ls -l | awk '/^l/ { print $3, $9 }' </li></ul></ul><ul><ul><li>Selects only symbolic links, whose permission string starts with l </li></ul></ul><ul><li>Use a variable or comparison in an awk command </li></ul><ul><ul><li>Put it at the beginning of the command instead of a pattern </li></ul></ul><ul><ul><li>ls -l | awk '$2 > 3 { print $0 }' </li></ul></ul><ul><ul><li>Prints lines whose second field (the link count) is greater than 3 </li></ul></ul><ul><li>Using an awk script file: </li></ul><ul><ul><li>awk -f awk_command_list text_file </li></ul></ul>
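A sketch of pattern selection, a comparison, and a script file, assuming a POSIX shell and awk. The listing lines are hypothetical, and the awk_command_list contents (summing the sizes of regular files) are an illustrative assumption, since the slide does not show the file.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical ls -l style lines; field 2 is the link count, field 5 the size.
cat > "$dir/listing" <<'EOF'
lrwxrwxrwx 1 root root 7 Jan 5 10:00 bin -> usr/bin
-rw-r--r-- 5 alice staff 1200 Jan 5 10:00 report.txt
drwxr-xr-x 2 bob staff 4096 Jan 6 11:30 projects
-rw-r--r-- 1 alice staff 300 Jan 7 09:15 notes.txt
EOF

# A regular expression selects lines: /^l/ matches symbolic links only.
links=$(awk '/^l/ { print $3, $9 }' "$dir/listing")

# A comparison selects lines: link count greater than 3.
# (Real ls -l output begins with a "total N" line, which can also pass this test.)
many_links=$(awk '$2 > 3 { print $0 }' "$dir/listing")

# Script file: total size of regular files (lines starting with -).
cat > "$dir/awk_command_list" <<'EOF'
/^-/ { total += $5 }
END  { print total }
EOF
regular_bytes=$(awk -f "$dir/awk_command_list" "$dir/listing")

printf '%s\n%s\n%s\n' "$links" "$many_links" "$regular_bytes"
```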
  28. More Advanced Text Editing <ul><li>vi editor provides advanced text editing features </li></ul>
  29. File Operations in vi <ul><li>:w command </li></ul><ul><ul><li>Writes (saves) the file you are editing </li></ul></ul><ul><li>:r filename </li></ul><ul><ul><li>Inserts another file into the file you are editing, below the cursor line </li></ul></ul><ul><li>:q command </li></ul><ul><ul><li>Exits vi (use :q! to discard unsaved changes) </li></ul></ul><ul><li>:wq </li></ul><ul><ul><li>Saves and exits </li></ul></ul>
  30. Screen Repositioning <ul><li>Line number and cursor position on the line </li></ul><ul><ul><li>Shown at bottom right </li></ul></ul><ul><li>Parentheses ( ) and curly braces { } </li></ul><ul><ul><li>Move backward or forward one sentence or paragraph at a time </li></ul></ul><ul><li>Ctrl+f and Ctrl+b key combinations </li></ul><ul><ul><li>Move one screen forward and backward </li></ul></ul>
  31. Screen Repositioning (continued) <ul><li>Shift+G </li></ul><ul><ul><li>Takes you to any line in the file </li></ul></ul><ul><ul><li>Enter the line number first, then Shift+G; Shift+G alone goes to the last line </li></ul></ul><ul><li>Mark </li></ul><ul><ul><li>Like a bookmark </li></ul></ul><ul><ul><li>m command followed by a mark name (a lowercase letter, a-z) </li></ul></ul><ul><ul><ul><li>Places the mark </li></ul></ul></ul><ul><ul><li>' (apostrophe) followed by the mark name </li></ul></ul><ul><ul><ul><li>Returns to the line containing the mark </li></ul></ul></ul>
  32. Screen Repositioning (continued) <ul><li>% </li></ul><ul><ul><li>Moves between matching braces, parentheses, etc. in program source code </li></ul></ul><ul><li>Shift+J </li></ul><ul><ul><li>Joins the next line to the end of the current line </li></ul></ul>
  33. More Line-Editing Commands <ul><li>:h </li></ul><ul><ul><li>Views the vi help file </li></ul></ul><ul><li>Ctrl+] </li></ul><ul><ul><li>Follows the hyperlink under the cursor in help files </li></ul></ul><ul><li>Ctrl+t </li></ul><ul><ul><li>Navigates back from links in help files </li></ul></ul>
  34. More Line-Editing Commands (continued) <ul><li>Forward slash (/) </li></ul><ul><ul><li>Searches forward from the current cursor position </li></ul></ul><ul><ul><li>Can use a regular expression as the search pattern </li></ul></ul><ul><li>n key </li></ul><ul><ul><li>Moves to the next occurrence of the search pattern, in the same direction as the search </li></ul></ul><ul><li>? </li></ul><ul><ul><li>Searches backward </li></ul></ul><ul><li>N key </li></ul><ul><ul><li>Moves to the previous occurrence of the pattern (the opposite direction) </li></ul></ul>
  35. More Line-Editing Commands (continued) <ul><li>Search-and-replace operations </li></ul><ul><ul><li>Format </li></ul></ul><ul><ul><ul><li>:line-number-range s/search-pattern/replacement-text/flags </li></ul></ul></ul><ul><ul><li>Example </li></ul></ul><ul><ul><ul><li>:1,$ s/^configure/Configure/ </li></ul></ul></ul><ul><ul><ul><li>From line 1 to the last line ($), replaces configure at the start of a line with Configure </li></ul></ul></ul>
  36. More Line-Editing Commands (continued) <ul><li>Shelling out </li></ul><ul><ul><li>Executes another Linux command </li></ul></ul><ul><ul><li>As if you were at a shell prompt </li></ul></ul><ul><ul><li>Type :! followed by the command </li></ul></ul><ul><ul><li>Example: :!ls /etc/samba </li></ul></ul>
  37. Setting vi Options <ul><li>:set all </li></ul><ul><ul><li>Views all options currently set in vi </li></ul></ul><ul><ul><li>Press the spacebar repeatedly to see all screens of settings </li></ul></ul><ul><li>:set without the word all </li></ul><ul><ul><li>Displays only the options that have been changed from their default values </li></ul></ul><ul><li>:set followed by an option name </li></ul><ul><ul><li>Sets that option (prefix a Boolean option with no to turn it off) </li></ul></ul>
  38. Setting vi Options (continued)
  39. Setting vi Options (continued) <ul><li>Can automate settings </li></ul><ul><ul><li>Define and export an environment variable called EXINIT that contains a set command </li></ul></ul><ul><ul><li>Executed each time vi is started </li></ul></ul><ul><ul><ul><li>export EXINIT='set nu nosmartindent' </li></ul></ul></ul><ul><ul><li>Or place the settings in a file called .exrc </li></ul></ul><ul><ul><ul><li>$HOME/.exrc is read only when EXINIT is not set; a .exrc in the current directory (read if the exrc option is on) is processed afterward and can override EXINIT </li></ul></ul></ul>
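A configuration sketch, assuming a Bourne-style login shell that reads ~/.profile. The options number and autoindent are standard vi options chosen here only as examples; use one method or the other, since vi skips $HOME/.exrc when EXINIT is set.

```shell
# Method 1: add this line to a shell startup file such as ~/.profile.
# vi runs the ex commands in EXINIT each time it starts.
export EXINIT='set number autoindent'

# Method 2: store the same set command in ~/.exrc (no leading colon).
# Appending avoids overwriting settings already in the file.
printf 'set number autoindent\n' >> "$HOME/.exrc"
```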
  40. Summary <ul><li>Regular expressions are used in many places to define patterns of information </li></ul><ul><li>grep command searches for lines of text containing a pattern defined using a regular expression </li></ul><ul><li>sed and awk commands support complex scripting languages that include regular expressions </li></ul>
  41. Summary (continued) <ul><li>vi </li></ul><ul><ul><li>Uses complex combinations of commands to reposition the cursor within text </li></ul></ul><ul><ul><li>Supports search-and-replace operations </li></ul></ul><ul><ul><li>set command defines editor settings </li></ul></ul>