Published in: Education

  1. 1. Chapter 5: Understanding Text Processing (The Complete Guide to Linux System Administration)
  2. 2. Objectives <ul><li>Use regular expressions in a variety of circumstances </li></ul><ul><li>Manipulate text files in complex ways using multiple command-line utilities </li></ul><ul><li>Use advanced features of the vi editor </li></ul><ul><li>Use the sed and awk text processing utilities </li></ul>
  3. 3. Regular Expressions <ul><li>Flexible way to encode many types of complex patterns </li></ul><ul><li>Use to define pattern in many situations </li></ul><ul><ul><li>Parameter to most Linux commands </li></ul></ul><ul><ul><li>Within vi editor </li></ul></ul><ul><ul><li>Within programming languages </li></ul></ul><ul><ul><ul><li>Including shell scripts </li></ul></ul></ul><ul><li>Used for text </li></ul>
  4. 4. Regular Expressions (continued)
  5. 5. Regular Expressions (continued)
  6. 6. Regular Expressions (continued) <ul><li>Acceptable syntax varies in small but important ways </li></ul><ul><ul><li>Depending on where expression used </li></ul></ul><ul><li>Examples: </li></ul><ul><ul><li>[Rr]eunion[0-9][0-9].jpg </li></ul></ul><ul><ul><li>[Rr]eunion[0-9]{2}.jpg </li></ul></ul><ul><ul><li>Reunion-[^d].jpg </li></ul></ul>
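The bracket and repetition syntax in the examples above can be tried out with grep. This is a minimal sketch, assuming made-up file names fed in as text; `-E` is used because plain grep does not treat `{2}` as a repeat count, and the dot is escaped because an unescaped `.` in a regular expression matches any single character.

```shell
# Hypothetical file names, one per line, to test the pattern against.
names='Reunion07.jpg
reunion12.jpg
reunion123.jpg
Reunion7.jpg
reunionAB.jpg'

# -E enables extended regular expressions, which grep needs for {2}.
# -x requires the whole line to match; \. matches a literal dot.
matches=$(printf '%s\n' "$names" | grep -E -x '[Rr]eunion[0-9]{2}\.jpg')
printf '%s\n' "$matches"
```

Only the two names with exactly two digits before `.jpg` are printed.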
  7. 7. Manipulating Files <ul><li>Command-line utilities useful for: </li></ul><ul><ul><li>Searching </li></ul></ul><ul><ul><li>Sorting </li></ul></ul><ul><ul><li>Reorganizing </li></ul></ul><ul><ul><li>Otherwise working with text files </li></ul></ul>
  8. 8. Searching for Patterns with grep <ul><li>grep </li></ul><ul><ul><li>Rapidly scan files for specified pattern </li></ul></ul><ul><ul><li>Print out lines of text that contain text matching pattern </li></ul></ul><ul><ul><li>Take further action on matching lines of text </li></ul></ul><ul><ul><ul><li>Using pipe to connect grep with other filtering commands </li></ul></ul></ul>
  9. 9. Searching for Patterns with grep (continued) <ul><li>Examples: </li></ul><ul><ul><li>grep wilson /etc/passwd </li></ul></ul><ul><ul><li>grep thomas[Cc]orp *txt </li></ul></ul><ul><li>Often used at end of pipe </li></ul><ul><ul><li>locate tif | grep frame </li></ul></ul>
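The two uses of grep above, searching files directly and filtering another command's output, might look like this in practice. The scratch directory and the file `notes.txt` are assumptions made for the sketch; the pattern is quoted so that the shell does not try to expand the brackets as a file name pattern.

```shell
# Work in a scratch directory with a small, made-up sample file.
workdir=$(mktemp -d)
printf '%s\n' 'contact: thomasCorp sales' 'contact: thomascorp support' 'contact: acme' > "$workdir/notes.txt"

# Search every file whose name ends in "txt" for either spelling.
hits=$(grep 'thomas[Cc]orp' "$workdir"/*txt)
printf '%s\n' "$hits"

# grep at the end of a pipe filters the output of another command.
piped=$(ls "$workdir" | grep notes)
printf '%s\n' "$piped"

rm -r "$workdir"
```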
  10. 10. Examining File Contents <ul><li>head and tail commands: </li></ul><ul><ul><li>Display first few lines and last few lines of file </li></ul></ul><ul><ul><li>By default include 10 lines </li></ul></ul><ul><ul><li>-n option </li></ul></ul><ul><ul><ul><li>Specify number of lines </li></ul></ul></ul><ul><ul><li>Print output to STDOUT </li></ul></ul><ul><ul><ul><li>Redirect as needed </li></ul></ul></ul>
  11. 11. Examining File Contents (continued) <ul><li>tail -f option </li></ul><ul><ul><li>“Follows” file, printing new lines as other programs add them to file </li></ul></ul><ul><ul><li>Very useful for tracking log files </li></ul></ul><ul><li>wc command </li></ul><ul><ul><li>Count number of characters, words, and lines </li></ul></ul>
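A short sketch of head, tail, and wc on a generated file. The file name `sample.log` and its 25 lines are assumptions for illustration. The follow option of tail is left out because it keeps running until interrupted.

```shell
workdir=$(mktemp -d)
# Build a 25-line sample file: "line 1" through "line 25".
i=1
while [ "$i" -le 25 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$workdir/sample.log"

# head prints the first 10 lines when no count is given.
first_default=$(head "$workdir/sample.log")
# -n sets the number of lines to print.
last_three=$(tail -n 3 "$workdir/sample.log")
# Reading from standard input keeps the file name out of wc's output.
line_count=$(wc -l < "$workdir/sample.log" | tr -d ' ')

printf '%s\n' "$last_three"
echo "lines: $line_count"
rm -r "$workdir"
```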
  12. 12. Examining File Contents (continued)
  13. 13. Examining File Contents (continued) <ul><li>strings command </li></ul><ul><ul><li>Extracts text strings from file that includes binary and other non-text data </li></ul></ul><ul><ul><li>Provides convenient way to check for information that may not be otherwise available </li></ul></ul>
  14. 14. Examining File Contents (continued)
  15. 15. Manipulating Text Files <ul><li>Filtering </li></ul><ul><ul><li>Modify part of text file by adding, removing, or altering data in file </li></ul></ul><ul><ul><li>Based on complex rules or patterns </li></ul></ul><ul><ul><li>Use command-line programs to filter text files </li></ul></ul><ul><li>sort command </li></ul><ul><ul><li>Sort all lines in text file </li></ul></ul><ul><li>uniq command </li></ul><ul><ul><li>Remove adjacent duplicate lines in file (sort first to catch all duplicates) </li></ul></ul>
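sort and uniq are usually combined in a pipeline, because uniq compares each line only with the line before it. A minimal sketch with made-up input:

```shell
# Made-up input with repeated lines.
input='pear
apple
pear
banana
apple'

# Sorting first puts identical lines next to each other so uniq can drop them.
deduped=$(printf '%s\n' "$input" | sort | uniq)
printf '%s\n' "$deduped"

# uniq -c prefixes each line with the number of times it occurred.
printf '%s\n' "$input" | sort | uniq -c
```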
  16. 16. Manipulating Text Files (continued) <ul><li>diff command </li></ul><ul><ul><li>Displays differences between two files </li></ul></ul><ul><ul><li>Output format: </li></ul></ul><ul><ul><ul><li>< indicates lines that were not found in second file </li></ul></ul></ul><ul><ul><ul><li>> indicates lines that were not found in first file </li></ul></ul></ul><ul><li>cmp command </li></ul><ul><ul><li>Gives quick check of whether two files are identical </li></ul></ul>
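The `<` and `>` markers and the quick check with cmp can be seen on two small files. The file names and contents below are assumptions for the sketch. Note that diff returns a nonzero exit status when the files differ, which matters in scripts that stop on errors.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$workdir/first.txt"
printf '%s\n' 'alpha' 'gamma' 'delta' > "$workdir/second.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps going.
changes=$(diff "$workdir/first.txt" "$workdir/second.txt" || true)
printf '%s\n' "$changes"

# cmp -s prints nothing; its exit status says whether the files are identical.
if cmp -s "$workdir/first.txt" "$workdir/second.txt"; then
  same=yes
else
  same=no
fi
echo "identical: $same"
rm -r "$workdir"
```

In the diff output, `< beta` is the line missing from the second file and `> delta` is the line missing from the first.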
  17. 17. Manipulating Text Files (continued) <ul><li>comm command </li></ul><ul><ul><li>Compares two sorted files line by line, showing lines unique to each file and lines common to both </li></ul></ul><ul><li>ispell spell checker </li></ul><ul><ul><li>Uses large dictionary to examine text file </li></ul></ul><ul><ul><li>Prompts with suggestions </li></ul></ul>
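comm prints three columns: lines only in the first file, lines only in the second, and lines in both. The options `-1`, `-2`, and `-3` suppress the corresponding column. A sketch with two made-up, already sorted files:

```shell
workdir=$(mktemp -d)
# comm expects both inputs to be sorted already.
printf '%s\n' 'apple' 'banana' 'cherry' > "$workdir/a.txt"
printf '%s\n' 'banana' 'cherry' 'date' > "$workdir/b.txt"

only_a=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")   # lines only in the first file
only_b=$(comm -13 "$workdir/a.txt" "$workdir/b.txt")   # lines only in the second file
both=$(comm -12 "$workdir/a.txt" "$workdir/b.txt")     # lines present in both files

echo "only in a: $only_a"
echo "only in b: $only_b"
printf 'in both:\n%s\n' "$both"
rm -r "$workdir"
```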
  18. 18. Manipulating Text Files (continued)
  19. 19. Manipulating Text Files (continued)
  20. 20. Manipulating Text Files (continued)
  21. 21. Using sed and awk <ul><li>sed </li></ul><ul><ul><li>Complex filtering program </li></ul></ul><ul><li>awk command </li></ul><ul><ul><li>Generally used for formatting output </li></ul></ul>
  22. 22. Filtering and Editing Text with sed <ul><li>sed command </li></ul><ul><ul><li>Processes each line in text file according to series of command-line options </li></ul></ul><ul><ul><li>Example: </li></ul></ul><ul><ul><ul><li>sed -n '/lincoln/p' /tmp/names </li></ul></ul></ul><ul><ul><ul><li>Prints to screen all lines of /tmp/names file that contain text “lincoln” </li></ul></ul></ul><ul><ul><li>By default, prints each line to STDOUT </li></ul></ul>
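The effect of `-n` together with the `p` command is easiest to see by running the same command with and without it. The sketch uses a scratch file with invented names rather than /tmp/names.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'abraham lincoln' 'george washington' 'mary todd lincoln' > "$workdir/names"

# -n turns off the default printing; the p command prints only matching lines.
printed=$(sed -n '/lincoln/p' "$workdir/names")
printf '%s\n' "$printed"

# Without -n, every line is printed once and each matching line a second time.
total=$(sed '/lincoln/p' "$workdir/names" | wc -l | tr -d ' ')
echo "lines without -n: $total"
rm -r "$workdir"
```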
  23. 23. Filtering and Editing Text with sed (continued) <ul><li>Substitution command syntax: </li></ul><ul><ul><li>/pattern1/s/pattern2/pattern3/g </li></ul></ul><ul><ul><li>Watches for lines containing pattern1 </li></ul></ul><ul><ul><li>Replaces occurrences of pattern2 with pattern3 </li></ul></ul><ul><ul><li>g option at end of command </li></ul></ul><ul><ul><ul><li>Causes sed to replace all occurrences on each line </li></ul></ul></ul><ul><ul><ul><li>Means global </li></ul></ul></ul>
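A small sketch of the substitution syntax above, with and without the `g` option. The input lines are made up; `order` plays the role of pattern1, `red` of pattern2, and `blue` of pattern3.

```shell
input='order: red red shirt
note: red red hat'

# Only lines containing "order" are edited; g replaces every "red" on them.
global=$(printf '%s\n' "$input" | sed '/order/s/red/blue/g')
# Without g, only the first "red" on each selected line is replaced.
first_only=$(printf '%s\n' "$input" | sed '/order/s/red/blue/')

printf '%s\n' "$global"
printf '%s\n' "$first_only"
```

The `note:` line passes through unchanged in both cases because it does not contain pattern1.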
  24. 24. Filtering and Editing Text with sed (continued) <ul><li>Can place operations in file and pass file name to sed command </li></ul><ul><ul><li>sed -f nolatin news-article > new_news-article </li></ul></ul><ul><li>( & ) Operator within sed command </li></ul><ul><ul><li>Refers to text that matches pattern2 </li></ul></ul><ul><ul><li>s/[0-9]*[0-9][0-9]/$&/g </li></ul></ul><ul><li>sed often useful as part of pipeline of Linux commands </li></ul>
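A sketch of a sed script file and the `&` operator. The script name `edits.sed`, its two commands, and the sample text are assumptions for illustration; they are not the contents of the `nolatin` file mentioned above.

```shell
workdir=$(mktemp -d)
# A hypothetical script file with one sed command per line.
cat > "$workdir/edits.sed" <<'EOF'
s/[0-9][0-9]*/<&>/g
s/colour/color/g
EOF
printf '%s\n' 'colour code 42 of 7' > "$workdir/article.txt"

# -f reads the editing commands from the script file.
sed -f "$workdir/edits.sed" "$workdir/article.txt" > "$workdir/article.new"
result=$(cat "$workdir/article.new")
printf '%s\n' "$result"
rm -r "$workdir"
```

In the first command, `&` stands for whatever the pattern matched, so each number is wrapped in angle brackets.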
  25. 25. Formatting with awk <ul><li>Processes text </li></ul><ul><ul><li>Extracts parts of file </li></ul></ul><ul><ul><li>Formats text according to information you provide on command line or in script file </li></ul></ul><ul><li>Format output based on fields within line of text </li></ul><ul><li>Often can perform same functions with sed or awk </li></ul>
  26. 26. Formatting with awk (continued) <ul><li>Each field on line is normally separated by whitespace </li></ul><ul><ul><li>Can change which character awk uses to separate fields </li></ul></ul><ul><li>First field is referred to by $1, second by $2, etc. </li></ul><ul><li>Basic format: /pattern/ { actions } </li></ul><ul><li>Example: ls -l | awk '{ print $3 $9 }' </li></ul>
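Field references can be tried on fixed input, which is more predictable than the output of `ls -l`. The records below are made up. One detail worth seeing: `print $1 $3` joins the two fields with nothing between them, while a comma inserts a space.

```shell
# Made-up, whitespace-separated records: user, shell, file name.
records='alice bash notes.txt
bob zsh todo.txt'

# $1 and $3 are the first and third fields; the comma puts a space between them.
spaced=$(printf '%s\n' "$records" | awk '{ print $1, $3 }')
# Without the comma, awk concatenates the two fields directly.
joined=$(printf '%s\n' "$records" | awk '{ print $1 $3 }')
# -F changes the field separator, here to a colon.
first_field=$(printf 'root:x:0:0\n' | awk -F: '{ print $1 }')

printf '%s\n' "$spaced" "$joined" "$first_field"
```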
  27. 27. Formatting with awk (continued) <ul><li>Can include regular expression to select which lines awk includes in output: </li></ul><ul><ul><li>ls -l | awk '/^l/ {print $3 $9 }' </li></ul></ul><ul><li>Use variable or comparison in awk command </li></ul><ul><ul><li>Put at beginning of command instead of pattern </li></ul></ul><ul><ul><li>ls -l | awk ' $2 > 3 {print $0 }' </li></ul></ul><ul><li>Using awk script file: </li></ul><ul><ul><li>awk -f awk_command_list text_file </li></ul></ul>
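The three forms above (regular expression pattern, comparison, and script file) can be sketched on a few made-up records. The file name `filter.awk` is hypothetical.

```shell
workdir=$(mktemp -d)
records='link 1 alice
file 4 bob
link 5 carol'

# A /regex/ pattern selects lines; here, lines that start with "link".
links=$(printf '%s\n' "$records" | awk '/^link/ { print $3 }')
# A comparison can take the place of the pattern; $0 is the whole line.
large=$(printf '%s\n' "$records" | awk '$2 > 3 { print $0 }')

# The same kind of program can live in a script file passed with -f.
printf '%s\n' '$2 > 3 { print $3 }' > "$workdir/filter.awk"
from_file=$(printf '%s\n' "$records" | awk -f "$workdir/filter.awk")

printf '%s\n' "$links" "$large" "$from_file"
rm -r "$workdir"
```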
  28. 28. More Advanced Text Editing <ul><li>vi editor provides advanced text editing features </li></ul>
  29. 29. File Operations in vi <ul><li>:w command </li></ul><ul><ul><li>Write file you are editing </li></ul></ul><ul><li>:r filename </li></ul><ul><ul><li>Insert another file into file you are editing </li></ul></ul><ul><li>:q command </li></ul><ul><ul><li>Exit from vi </li></ul></ul><ul><li>:wq </li></ul><ul><ul><li>Save and exit </li></ul></ul>
  30. 30. Screen Repositioning <ul><li>Line number and cursor position on line </li></ul><ul><ul><li>Shown at bottom right </li></ul></ul><ul><li>Use parentheses and curly braces </li></ul><ul><ul><li>Move forward or backward by one sentence or paragraph at a time </li></ul></ul><ul><li>Ctrl+f and Ctrl+b key combinations </li></ul><ul><ul><li>Move one screen forward and backward </li></ul></ul>
  31. 31. Screen Repositioning (continued) <ul><li>Shift+G </li></ul><ul><ul><li>Take you to any line in file </li></ul></ul><ul><ul><li>Enter line number first, then Shift+G </li></ul></ul><ul><li>Mark </li></ul><ul><ul><li>Like bookmark </li></ul></ul><ul><ul><li>m command followed by name (a-z) </li></ul></ul><ul><ul><ul><li>Place mark </li></ul></ul></ul><ul><ul><li>' command followed by mark name </li></ul></ul><ul><ul><ul><li>Return to mark </li></ul></ul></ul>
  32. 32. Screen Repositioning (continued) <ul><li>% </li></ul><ul><ul><li>Navigate between matching braces, parentheses, etc. in program source code </li></ul></ul><ul><li>Shift+J </li></ul><ul><ul><li>Joins two lines </li></ul></ul>
  33. 33. More Line-Editing Commands <ul><li>:h </li></ul><ul><ul><li>View vi help file </li></ul></ul><ul><li>Ctrl+] </li></ul><ul><ul><li>Navigate to hyperlinks in help files </li></ul></ul><ul><li>Ctrl+t </li></ul><ul><ul><li>Navigate back from links in help files </li></ul></ul>
  34. 34. More Line-Editing Commands (continued) <ul><li>Forward slash (/) </li></ul><ul><ul><li>Search forward from current cursor position </li></ul></ul><ul><ul><li>Can use regular expression as search pattern </li></ul></ul><ul><li>n key </li></ul><ul><ul><li>Move to next occurrence of search pattern </li></ul></ul><ul><li>? </li></ul><ul><ul><li>Search backwards </li></ul></ul><ul><li>N key </li></ul><ul><ul><li>Move to previous occurrence of pattern </li></ul></ul>
  35. 35. More Line-Editing Commands (continued) <ul><li>Search-and-replace operations </li></ul><ul><ul><li>Format </li></ul></ul><ul><ul><ul><li>:line-number-range s/search-pattern/replacement-text/flags </li></ul></ul></ul><ul><ul><li>Example </li></ul></ul><ul><ul><ul><li>:1,$ s/^configure/configure/ </li></ul></ul></ul>
  36. 36. More Line-Editing Commands (continued) <ul><li>Shelling out </li></ul><ul><ul><li>Execute another Linux command </li></ul></ul><ul><ul><li>As if you were at shell prompt </li></ul></ul><ul><ul><li>Type ! followed by command </li></ul></ul><ul><ul><li>Example: :!ls /etc/samba </li></ul></ul>
  37. 37. Setting vi Options <ul><li>:set all </li></ul><ul><ul><li>View all options currently set in vi </li></ul></ul><ul><ul><li>Press spacebar multiple times to see all screens of settings </li></ul></ul><ul><li>:set without the word all </li></ul><ul><ul><li>Displays all options that current user has set </li></ul></ul><ul><li>:set followed by option </li></ul><ul><ul><li>To set option </li></ul></ul>
  38. 38. Setting vi Options (continued)
  39. 39. Setting vi Options (continued) <ul><li>Can automate settings </li></ul><ul><ul><li>Define environment variable called EXINIT that contains set command </li></ul></ul><ul><ul><li>Executed each time vi started </li></ul></ul><ul><ul><ul><li>EXINIT='set nu nosmartindent' </li></ul></ul></ul><ul><ul><li>Place settings in file called .exrc </li></ul></ul><ul><ul><ul><li>Overrides information in EXINIT variable </li></ul></ul></ul>
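Setting EXINIT from the shell could look like the sketch below. Placing these lines in a shell startup file such as `~/.profile` is an assumption about the setup, not something the slide specifies; the option names are the ones from the slide.

```shell
# Assign the set command, then export it so that vi started from this
# shell (or any child process) can read it.
EXINIT='set nu nosmartindent'
export EXINIT

# Show the value that vi will see.
echo "$EXINIT"
```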
  40. 40. Summary <ul><li>Regular expressions used in many places to define patterns of information </li></ul><ul><li>grep command used to search for lines of text containing pattern defined using regular expression </li></ul><ul><li>sed and awk commands support complex scripting language that includes regular expressions </li></ul>
  41. 41. Summary (continued) <ul><li>vi </li></ul><ul><ul><li>Uses complex combinations of commands to reposition cursor within text </li></ul></ul><ul><ul><li>Supports search-and-replace operations </li></ul></ul><ul><ul><li>set command defines editor settings </li></ul></ul>