Grep Introduction
A brief introduction to the grep command line tool

Grep Introduction: Presentation Transcript

  • Colloquium - grep, v1.0. A. Magee, April 6, 2010
  • Outline
      1 Introduction
          What does grep offer?
          When should I use grep?
      2 Understanding Regular Expressions
          Class Basics
          Quantifiers & Grouping
          Online Tools
          Examples
      3 Using Regular Expressions With grep
  • Introduction: What does grep offer?
    grep searches text for lines that match a regular expression. Your first question should be "What is a regular expression?" A regular expression (RE) is a pattern that describes a set of strings. grep and REs allow us to find complex things in text. Complex is relative and can vary from a single character to an IP address.
      Single character complex: [ajk+0-]
      IP complex: ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
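The IP pattern above can be tried out in a few lines. This is a minimal sketch using Python's `re` module as a stand-in for grep, not anything from the original slides; the surrounding `\b` anchors, the helper name `find_ips`, and the sample text are assumptions added for illustration.

```python
import re

# One octet, 0-255, written exactly as on the slide.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Three "octet." groups followed by a final octet. The \b anchors are an
# addition: they stop a match from starting or ending inside a longer number.
IP_RE = re.compile(r"\b(" + OCTET + r"\.){3}" + OCTET + r"\b")


def find_ips(text):
    """Return every dotted-quad substring that the pattern accepts."""
    return [m.group(0) for m in IP_RE.finditer(text)]


sample = "ok 192.168.0.1, bad 999.1.1.1, ok 10.0.0.255"
print(find_ips(sample))
```

Without the two `\b` anchors the pattern would happily match `99.1.1.1` inside `999.1.1.1`, which is worth keeping in mind when the same RE is handed to grep.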
  • Introduction: When should I use grep?
    Always! Unless you find some better tool.
    P.S. - grep stands for g/re/p, an ed command that means global / regular expression / print.
  • Regular Expressions: Class Basics
    A character class is a symbol or collection of symbols that describes a group of characters.
      . (period): This matches any single character.
      [...]: This matches any one character in the set.
        [aeiou] matches one of the vowels.
        [a-z] matches one lowercase letter.
        [0-5] matches one numeral 0 through 5.
    You will not remember all of these until you use them often, but there are many special classes that can save you some typing.
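A quick way to see the period and the sets at work, sketched with Python's `re` module rather than grep itself; the sample string and variable names are made up for the example.

```python
import re

text = "grep v1.0"

# . matches any single character (by default, anything except a newline).
after_v = re.findall(r"v.", text)

# [aeiou] matches exactly one vowel.
vowels = re.findall(r"[aeiou]", text)

# [a-z] matches one lowercase letter, [0-5] one digit from 0 through 5.
letter_digit = re.findall(r"[a-z][0-5]", text)

print(after_v, vowels, letter_digit)
```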
  • Regular Expressions, Class Basics: Common Classes
      Special Class   Meaning                 Simple RE
      \d              Digit characters        [0-9]
      \D              Non-digit characters    [^0-9]
      \w              Word characters         [a-zA-Z_0-9]
      \W              Non-word characters     [^a-zA-Z_0-9]
      \s              Whitespace characters   [ \f\n\r\t]
      \S              Non-space characters    [^ \f\n\r\t]
      \b              Word boundary
    The word boundary class is very special as it is zero length and matches the transition between \w and \W and vice versa.
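The shorthand classes and the zero-length word boundary are easiest to trust once seen on real text. The following is a small sketch in Python's `re` module, which supports `\d`, `\w`, and `\b` directly; the sample strings and variable names are assumptions for illustration.

```python
import re

line = "error 404 at line 17 of file_a.c"

# \d+ collects runs of digit characters.
digits = re.findall(r"\d+", line)

# \w+ collects runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", line)

# \b consumes nothing: it only asserts a \w / \W transition, so "line"
# is found as a whole word but not inside "inline" or "line_no".
whole_word = re.findall(r"\bline\b", "line, inline, line_no")

print(digits, words, whole_word)
```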
  • Regular Expressions, Class Basics: More Common Classes
      Special Class   Meaning                         Simple RE
      [:alpha:]       All alphabetic characters       [a-zA-Z]
      [:alnum:]       All alphabetic and numeric      [a-zA-Z0-9]
      [:blank:]       Tab and space                   [ \t]
      [:cntrl:]       Control characters              [\x00-\x1F\x7F]
      [:digit:]       A numeric digit                 [0-9]
      [:graph:]       Any visible character           [\x21-\x7E]
      [:lower:]       Lowercase characters            [a-z]
      [:print:]       Printables (i.e. no controls)   [\x20-\x7E]
      [:punct:]       Punctuation & symbols           [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
      [:space:]       Space, tab, newline, etc        [ \t\r\n\v\f]
      [:upper:]       Uppercase characters            [A-Z]
      [:word:]        Word characters                 [a-zA-Z0-9_]
      [:xdigit:]      Hex digits                      [A-Fa-f0-9]
    Note that [:word:] is not one of the POSIX classes; it is a Perl-style extension, so support varies between tools.
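Not every engine understands the `[:name:]` classes. Python's `re` module, for one, does not, so a pattern written for grep has to be translated first. The helper below is a hypothetical sketch (the names `POSIX_CLASSES` and `expand_posix` are invented here) that swaps a handful of the classes from the table for their ASCII "Simple RE" equivalents.

```python
import re

# ASCII equivalents for some classes in the table, written as the *inside*
# of a set (no surrounding brackets). Only a subset is covered.
POSIX_CLASSES = {
    "alpha": "a-zA-Z",
    "alnum": "a-zA-Z0-9",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "word": "a-zA-Z0-9_",
    "xdigit": "A-Fa-f0-9",
}


def expand_posix(pattern):
    """Replace each [:name:] with its ASCII range; unknown names raise KeyError."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)


# [[:xdigit:]] becomes [A-Fa-f0-9] before the pattern is compiled.
hex_colour = re.compile(expand_posix(r"#[[:xdigit:]]{6}\b"))
print(hex_colour.findall("color: #1a2B3c; bad: #12345g"))
```

The translation also makes the NB from the email example concrete: `[:xdigit:]` only means something inside an enclosing set, which is why the outer brackets stay in place.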
  • Regular Expressions: Quantifiers & Grouping
    Quantifiers are how a RE counts things.
      ?       Exactly zero or one occurrence
      *       Zero or more occurrences
      +       One or more occurrences
      *?      Zero or more occurrences, non-greedy
      +?      One or more occurrences, non-greedy
      {x}     Exactly x occurrences
      {x,}    At least x occurrences
      {x,y}   At least x but no more than y occurrences
    Grouping is used to collect patterns together and to create back-references. A group is simply a set of parentheses ().
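The difference between greedy and non-greedy quantifiers, and what a back-reference buys you, is easier to see than to describe. A minimal sketch in Python's `re` module follows; the sample strings and variable names are assumptions, not part of the slides.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: + takes as much as it can, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)

# Non-greedy: +? stops at the first '>' that lets the match succeed.
lazy = re.findall(r"<.+?>", html)

numbers = "7 42 404 5000"
exact = re.findall(r"\b\d{3}\b", numbers)     # {x}: exactly three digits
ranged = re.findall(r"\b\d{2,3}\b", numbers)  # {x,y}: two or three digits

# The group (\w+) creates back-reference \1, which must repeat the same text.
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it is")

print(greedy, lazy, exact, ranged, doubled)
```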
  • Regular Expressions, Online Tools: Helpful Tools
    The best way to understand the rest of this presentation is to see what is being matched live. Here are some online tools that work for our needs.
      RegExr - www.gskinner.com/RegExr (beware Flash, but it works well)
      regexpal - regexpal.com (very simple)
      reanimator - osteele.com/tools/reanimator (beware Flash, recommend CS 4/570 first)
      rubular - rubular.com (nice on-page reference)
  • Regular Expressions, Examples: Your First RE
    Let's skip trivial REs and get on to something useful. These may be more complex than you're used to, but the quicker you are able to read long, complex REs the better. This is a nice, but not perfect, email address matcher.
      [[:alnum:]][[:word:]\.%+-]*@(?:[[:alnum:]-]+\.)+[[:alpha:]]{2,4}
    [[:alnum:]][[:word:]\.%+-]*   Match a word that doesn't start with [.%+-].
    @(?:[[:alnum:]-]+\.)+         Match the @ symbol and any number of subdomains followed by periods.
    [[:alpha:]]{2,4}              Match the top level domain of 2, 3 or 4 characters.
  • Regular Expressions, Examples: Your First RE - Part 2
    Let's examine the first part.
      [[:alnum:]][[:word:]\.%+-]*
    [[:alnum:]] - Must start with an alphanumeric character. NB: All [: ... :] classes must live in a set like [[: ... :]].
    [[:word:]\.%+-] - Other characters may be a 'word' character, a literal period, percent symbol, plus symbol or a dash. NB: Outside a set the period must be escaped because it has special meaning (any single character). Inside a set it is already literal, so the escape here is optional in Perl-compatible engines; in a POSIX set the backslash would itself be taken literally.
    * - Repeat the previous set zero or more times.
  • Regular Expressions, Examples: Your First RE - Part 3
    Now the second part: the subdomains, sub-subdomains, etc.
      @(?:[[:alnum:]-]+\.)+
    @ - Well, that literally matches the 'at' character.
    The opening parenthesis denotes the beginning of a group.
    The ?: is a confusing notation (Perl-compatible syntax) that suppresses the creation of a back-reference. It is here so you'll know of it, but it is rarely needed.
    Again we see a special class for alphanumerics, but we've also included a dash. The plus symbol tells us to look for one or more of these characters, followed by a literal period.
    And lastly we close the group, and the plus symbol now tells us to look for one or more of these groups.
  • Regular Expressions, Examples: Your First RE - Part 4
    Finally the third part, the top level domain.
      [[:alpha:]]{2,4}
    Well, now this part is easy. Just match 2, 3 or 4 alphabetical characters.
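Putting the three parts back together, here is a rough sketch of the email matcher in Python's `re` module. Python has no `[[:alnum:]]`-style classes, so each one is replaced by its ASCII range from the class tables; the names `EMAIL_RE`, `is_email`, and `find_emails` and the sample addresses are assumptions for illustration.

```python
import re

EMAIL_RE = re.compile(
    r"[a-zA-Z0-9]"            # [[:alnum:]]: first character
    r"[a-zA-Z0-9_.%+-]*"      # [[:word:]\.%+-]*: rest of the local part
    r"@(?:[a-zA-Z0-9-]+\.)+"  # @ and one or more "label." groups
    r"[a-zA-Z]{2,4}"          # [[:alpha:]]{2,4}: top level domain
)


def is_email(candidate):
    """True if the whole string matches the pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None


def find_emails(text):
    """Return every substring of text that the pattern matches."""
    return EMAIL_RE.findall(text)


text = "Write to a.magee@cs.example.edu or .bad@example.com, not user@localhost"
print(find_emails(text))
print(is_email("user@example.museum"))
```

Two of the "not perfect" parts show up right away: the unanchored search still finds `bad@example.com` inside `.bad@example.com`, and a top level domain longer than four letters is rejected.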
  • Regular Expressions, Examples: Your Second RE
    Now we'll look at a RE that can help us build a header file for a C program file, given that some neglectful programmer has failed to design his/her C program properly. This will be a quicker example.
      ^[\w\s]*\([\w\s\*&,]*\)\s*{
    ^[\w\s]*\(      At the beginning of a line, match some keywords and types and the function name, and then a literal opening parenthesis.
    [\w\s\*&,]*     Match some more words, keywords, variable modifiers and commas.
    \)\s*{          Finally match the closing parenthesis, some whitespace and the left curly brace, denoting the start of the function body.
  • Regular Expressions, Examples: Your Second RE - Fine Details
      ^[\w\s]*\([\w\s\*&,]*\)\s*{
    In general, grep and most other line-oriented RE tools will not match across multiple lines, even though the \s class matches the newline character. This is very bothersome but is easily overcome by using pcregrep, whose -M option enables multi-line matching. pcre is Perl Compatible Regular Expressions. This is all I will ever say about Perl.
    Notice that the literal * is escaped like so, \*, as are the literal parentheses, \( and \), due to their special RE meaning. Inside a set the escape on * is optional in Perl-compatible engines, but outside a set it is required. Escaping so many characters is very annoying, but unfortunately it is necessary.
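To show how the second RE could feed a header file, here is a minimal sketch in Python's `re` module, where `\s` does cross newlines and `re.MULTILINE` lets `^` match at every line start. The function name `prototypes`, the sample C source, and the explicit `\{` (escaped only to be unambiguous) are assumptions added for the example.

```python
import re

# The slide's pattern; re.MULTILINE makes ^ match at the start of each line.
FUNC_RE = re.compile(r"^[\w\s]*\([\w\s\*&,]*\)\s*\{", re.MULTILINE)


def prototypes(c_source):
    """Turn each matched function header into a one-line prototype."""
    protos = []
    for m in FUNC_RE.finditer(c_source):
        signature = m.group(0).rstrip("{")  # drop the opening brace
        # Collapse newlines and runs of spaces, then terminate the prototype.
        protos.append(" ".join(signature.split()) + ";")
    return protos


SAMPLE = """#include <stdio.h>

static int add(int a, int b)
{
    return a + b;
}

void greet(const char *name) {
    puts(name);
}
"""

for proto in prototypes(SAMPLE):
    print(proto)
```

The pattern is deliberately loose: a line such as `if (x) {` at the start of a line would match as well, so the output still wants a human look before it goes into a header.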
  • Appendix