# Grep Introduction

A brief introduction to the grep command-line tool.

### Grep Introduction

#### Colloquium - grep, v1.0

A. Magee, April 6, 2010

#### Outline

1. Introduction
   - What does grep offer?
   - When should I use grep?
2. Understanding Regular Expressions
   - Class Basics
   - Quantifiers & Grouping
   - Online Tools
   - Examples
3. Using Regular Expressions With grep

#### What does grep offer?

grep matches regular expressions, so your first question should be "What is a regular expression?" A regular expression (RE) is a pattern that describes a set of strings. Together, grep and REs let us find complex things in text. "Complex" is relative: it can range from a single character to an IP address.

- Single-character complex: `[ajk+0-]` matches any one of `a`, `j`, `k`, `+`, `0` or `-`.
- IP-address complex: `((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)` matches a dotted-quad IPv4 address whose four numbers each lie between 0 and 255.
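
Both patterns can be tried out with a short sketch. It uses Python's `re` module as a stand-in for grep, on the assumption that these patterns mean the same thing in Python as in `grep -E` (they use only bracket expressions, alternation, grouping and `{3}`). The names `SINGLE` and `IPV4` are illustrative.

```python
import re

# One character from the set a, j, k, '+', '0', '-'.
# Inside brackets '+' is literal, and a trailing '-' is literal too.
SINGLE = re.compile(r"[ajk+0-]")

# Three "octet." groups followed by a final octet; each octet is 0-255.
IPV4 = re.compile(
    r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)

print(SINGLE.findall("jump +0 x-y"))         # ['j', '+', '0', '-']
print(bool(IPV4.fullmatch("192.168.0.1")))  # True
print(bool(IPV4.fullmatch("256.1.1.1")))    # False: 256 is not a valid octet
# Without anchoring, a search still finds an "address" inside the text:
print(IPV4.search("256.1.1.1").group())      # 56.1.1.1
```

The last line is why, with grep, you would usually anchor such a pattern with `^...$` or use `-x` so that it must match the whole line.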

#### When should I use grep?

Always! Unless you find a better tool for the job.

P.S. The name grep comes from the ed command `g/re/p`: globally search for a regular expression and print the matching lines.

#### Class Basics

A character class is a symbol, or collection of symbols, that describes a group of characters.

- `.` (period) matches any single character.
- `[...]` matches any one character in the set:
  - `[aeiou]` matches one of the vowels.
  - `[a-z]` matches one lowercase letter.
  - `[0-5]` matches one digit from 0 through 5.

You will not remember all of these until you use them often, but there are many special classes that can save you some typing.
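
A quick way to see these classes in action is Python's `re.findall`, which returns every non-overlapping match. This sketch assumes Python treats `.` and `[...]` the way grep does, which holds for these simple cases; the sample strings are made up.

```python
import re

# '.' matches any single character, including a space.
print(re.findall(r"c.t", "cat cot c t cut"))   # ['cat', 'cot', 'c t', 'cut']
# ...but by default not a newline (grep works line by line, so it never sees one).
print(re.fullmatch(r"c.t", "c\nt"))            # None
# [aeiou] matches one vowel at a time.
print(re.findall(r"[aeiou]", "grep is neat"))  # ['e', 'i', 'e', 'a']
# [a-z]+ matches runs of lowercase letters.
print(re.findall(r"[a-z]+", "Grep 2 GO now"))  # ['rep', 'now']
# [0-5] matches one digit from 0 through 5, so the 6 is skipped.
print(re.findall(r"[0-5]", "2010-04-06"))      # ['2', '0', '1', '0', '0', '4', '0']
```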
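
#### Common Classes

| Special class | Meaning | Simple RE |
|---|---|---|
| `\d` | Digit characters | `[0-9]` |
| `\D` | Non-digit characters | `[^0-9]` |
| `\w` | Word characters | `[a-zA-Z_0-9]` |
| `\W` | Non-word characters | `[^a-zA-Z_0-9]` |
| `\s` | Whitespace characters | `[ \f\n\r\t]` |
| `\S` | Non-whitespace characters | `[^ \f\n\r\t]` |
| `\b` | Word boundary | (none: zero width) |

The word boundary class is very special: it has zero length and matches the position between a `\w` character and a `\W` character (or the start or end of the text), in either order.

These backslash classes come from Perl. GNU grep accepts `\w`, `\W`, `\s`, `\S` and `\b` as extensions in its basic and extended modes, but `\d` and `\D` need Perl-compatible mode (`grep -P`).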
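
Python's `re` module understands the same backslash classes, which makes it a handy sandbox for them; the sample strings are made up. Python's `\w` is Unicode-aware by default, so the sketch passes `re.ASCII` where it needs exactly the table's `[a-zA-Z_0-9]`.

```python
import re

print(re.findall(r"\d+", "April 6, 2010"))              # ['6', '2010']
# re.ASCII restricts \w to [a-zA-Z_0-9], as in the table.
print(re.findall(r"\w+", "grep-v1.0 rocks", re.ASCII))  # ['grep', 'v1', '0', 'rocks']
# \s+ collapses any run of spaces, tabs and newlines.
print(re.sub(r"\s+", " ", "a\tb\n c"))                  # a b c
# \b matches only where a word starts or ends, so "concat" and "cats" are skipped.
print(re.findall(r"\bcat\b", "cat concat cat's cats"))  # ['cat', 'cat']
# \b is zero-length: substituting it inserts text without consuming any.
print(re.sub(r"\b", "|", "hi there"))                   # |hi| |there|
```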

#### More Common Classes

| Special class | Meaning | Simple RE (ASCII) |
|---|---|---|
| `[:alpha:]` | All alphabetic characters | `[a-zA-Z]` |
| `[:alnum:]` | All alphabetic and numeric characters | `[a-zA-Z0-9]` |
| `[:blank:]` | Space and tab | `[ \t]` |
| `[:cntrl:]` | Control characters | `[\x00-\x1F\x7F]` |
| `[:digit:]` | A numeric digit | `[0-9]` |
| `[:graph:]` | Any visible character | `[\x21-\x7E]` |
| `[:lower:]` | Lowercase characters | `[a-z]` |
| `[:print:]` | Printable characters (visible characters and space, no controls) | `[\x20-\x7E]` |
| `[:punct:]` | Punctuation and symbols | any of `` !"#$%&'()*+,-./:;<=>?@[\]^_`{\|}~ `` |
| `[:space:]` | Space, tab, newline, etc. | `[ \t\r\n\v\f]` |
| `[:upper:]` | Uppercase characters | `[A-Z]` |
| `[:word:]` | Word characters | `[a-zA-Z0-9_]` |
| `[:xdigit:]` | Hexadecimal digits | `[A-Fa-f0-9]` |

`[:word:]` is a Perl/PCRE extension rather than a POSIX class, so with grep it needs `-P`.
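
Python's `re` module does not support `[:...:]` classes, but it can check the ASCII column of the table. This sketch, built around a hypothetical helper `members`, lists every ASCII character that a one-character pattern matches and compares the result with Python's `string` constants. It assumes the C/ASCII locale, where the table's ranges apply.

```python
import re
import string

ASCII = [chr(code) for code in range(128)]

def members(pattern):
    """Return the set of ASCII characters that a one-character pattern matches."""
    return {c for c in ASCII if re.fullmatch(pattern, c)}

# [:alnum:] and [:xdigit:]
assert members(r"[a-zA-Z0-9]") == set(string.ascii_letters + string.digits)
assert members(r"[A-Fa-f0-9]") == set(string.hexdigits)
# [:space:]: space, tab, CR, LF, vertical tab, form feed
assert members(r"[ \t\r\n\v\f]") == set(string.whitespace)
# [:graph:] is every letter, digit and punctuation mark; [:print:] adds the space
graph = members(r"[\x21-\x7E]")
assert graph == set(string.ascii_letters + string.digits + string.punctuation)
assert members(r"[\x20-\x7E]") == graph | {" "}
# [:punct:] as four ASCII ranges: ! to /, : to @, [ to `, { to ~
assert members(r"[!-/:-@\[-`{-~]") == set(string.punctuation)
# [:cntrl:] is exactly the non-printable ASCII characters
assert members(r"[\x00-\x1F\x7F]") == {c for c in ASCII if not c.isprintable()}
print("all table rows check out")
```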

#### Quantifiers & Grouping

Quantifiers are how an RE counts things. Each one applies to the item immediately before it.

| Quantifier | Meaning |
|---|---|
| `?` | Zero or one occurrence |
| `*` | Zero or more occurrences |
| `+` | One or more occurrences |
| `*?` | Zero or more occurrences, non-greedy |
| `+?` | One or more occurrences, non-greedy |
| `{x}` | Exactly x occurrences |
| `{x,}` | At least x occurrences |
| `{x,y}` | At least x but no more than y occurrences |

A greedy quantifier matches as much text as it can; a non-greedy one matches as little as it can. The non-greedy forms are Perl extensions, so with grep they need `-P`.

Grouping is used to collect patterns together and to create back-references. A group is simply a set of parentheses, `()`.
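
The quantifiers, greedy versus non-greedy matching, and a group reused through a back-reference can be compared side by side in this sketch. It uses Python's `re`, whose Perl-like syntax is closest to `grep -P`; the sample strings are illustrative.

```python
import re

# ? zero or one: the "u" is optional
print(bool(re.fullmatch(r"colou?r", "color")), bool(re.fullmatch(r"colou?r", "colour")))  # True True
# * zero or more vs + one or more
print(re.findall(r"lo*l", "ll lol"))            # ['ll', 'lol']
print(re.findall(r"lo+l", "ll lol"))            # ['lol']
# {x,y} between x and y digits, taking as many as possible
print(re.findall(r"\d{2,3}", "1 12 123 1234"))  # ['12', '123', '123']
# Greedy .+ grabs as much as it can; non-greedy .+? as little as it can.
html = "<b>bold</b>"
print(re.findall(r"<.+>", html))                # ['<b>bold</b>']
print(re.findall(r"<.+?>", html))               # ['<b>', '</b>']
# A group plus the back-reference \1 finds a doubled word.
print(re.search(r"\b(\w+) \1\b", "this is is a test").group(1))  # is
```

With plain grep, greediness only changes what `-o` prints, since a line either matches or it does not.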

#### Helpful Tools

The best way to understand the rest of this presentation is to see what is being matched live. Here are some online tools that work for our needs:

- RegExr (www.gskinner.com/RegExr): beware, it uses Flash, but it works well.
- regexpal (regexpal.com): very simple.
- reanimator (osteele.com/tools/reanimator): beware, it uses Flash; I recommend CS 4/570 first.
- rubular (rubular.com): has a nice on-page reference.

#### Your First RE

Let's skip trivial REs and get on to something useful. These may be more complex than you're used to, but the sooner you can read long, complex REs, the better.

This is a nice, but not perfect, email address matcher:

`[[:alnum:]][[:word:].%+-]*@(?:[[:alnum:]-]+\.)+[[:alpha:]]{2,4}`

- `[[:alnum:]][[:word:].%+-]*`: match a local part that starts with a letter or digit, followed by any mix of word characters, `.`, `%`, `+` and `-`.
- `@(?:[[:alnum:]-]+\.)+`: match the `@` symbol and one or more subdomains, each followed by a period.
- `[[:alpha:]]{2,4}`: match a top-level domain of 2, 3 or 4 letters.

Because it uses `[:word:]` and the non-capturing group `(?:...)`, this RE needs Perl-compatible syntax, for example `grep -P`.
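
Python's `re` has no `[:...:]` classes, so this sketch translates the email RE by hand, assuming `[[:alnum:]]` becomes `[A-Za-z0-9]`, `[[:alpha:]]` becomes `[A-Za-z]` and `[[:word:]]` becomes `\w` under `re.ASCII`. The name `EMAIL` and the sample text are made up. Because the group is non-capturing, `findall` returns whole matches rather than group contents.

```python
import re

# [[:alnum:]][[:word:].%+-]*  @  (?:[[:alnum:]-]+\.)+  [[:alpha:]]{2,4}
EMAIL = re.compile(r"[A-Za-z0-9][\w.%+-]*@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,4}", re.ASCII)

text = "Contact alice.smith+news@mail.example.com or bob@host.org."
print(EMAIL.findall(text))  # ['alice.smith+news@mail.example.com', 'bob@host.org']
```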

#### Your First RE - Part 2

Let's examine the first part, `[[:alnum:]][[:word:].%+-]*`.

- `[[:alnum:]]`: the address must start with an alphanumeric character. NB: every `[:...:]` class must live inside a bracket expression, as in `[[:...:]]`.
- `[[:word:].%+-]`: each remaining character may be a "word" character, a literal period, a percent sign, a plus sign or a dash. NB: inside a bracket expression the period is already literal, so it needs no escape; outside one it matches any character and must be written `\.`. The dash is literal because it comes last.
- `*`: repeat the previous set zero or more times.
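
The first part can be checked on its own. This sketch uses Python's `re` with the same hand translation of the POSIX classes (an assumption, not grep itself); `LOCAL` is an illustrative name. It also confirms the NB about the period: literal inside brackets, a wildcard outside them.

```python
import re

LOCAL = re.compile(r"[A-Za-z0-9][\w.%+-]*", re.ASCII)

print(bool(LOCAL.fullmatch("alice.smith+news")))  # True
print(bool(LOCAL.fullmatch("j_doe%2-x")))         # True
print(bool(LOCAL.fullmatch(".alice")))            # False: must start with a letter or digit
print(bool(LOCAL.fullmatch("_alice")))            # False: '_' is a word character, not alphanumeric

# A period inside brackets is literal...
print(bool(re.fullmatch(r"[.]", ".")), bool(re.fullmatch(r"[.]", "x")))  # True False
# ...while outside brackets it matches any character.
print(bool(re.fullmatch(r".", "x")))              # True
```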

#### Your First RE - Part 3

Now the second part, the subdomains, sub-subdomains, etc.: `@(?:[[:alnum:]-]+\.)+`.

- `@` literally matches the "at" character.
- The opening parenthesis begins a group. The `?:` right after it is a confusing notation that suppresses the creation of a back-reference, making the group non-capturing. It is here so you'll know of it, but it is rarely needed.
- `[[:alnum:]-]+` is again the special class for alphanumerics, this time with a dash added. The plus sign looks for one or more of these characters, and `\.` requires a literal period after them.
- Lastly, the group closes, and the final plus sign looks for one or more of these groups.
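
The effect of `?:` shows up in Python's `groups()`, which lists what each capturing group remembered. A sketch using the subdomain part alone, assuming `[[:alnum:]-]` translates to `[A-Za-z0-9-]`:

```python
import re

address = "bob@mail.example.com"
# Capturing group: remembers what it matched.
capturing = re.search(r"@([A-Za-z0-9-]+\.)+", address)
# Non-capturing group (?:...): same match, nothing remembered.
plain = re.search(r"@(?:[A-Za-z0-9-]+\.)+", address)

print(capturing.group(), capturing.groups())  # @mail.example. ('example.',)
print(plain.group(), plain.groups())          # @mail.example. ()
```

Note that a repeated capturing group only remembers its last repetition, here `example.`.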

#### Your First RE - Part 4

Finally, the third part, the top-level domain: `[[:alpha:]]{2,4}`. Well, this part is easy: just match 2, 3 or 4 alphabetic characters.
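
Part of why the matcher is "not perfect" is this `{2,4}` limit: longer top-level domains such as `.museum` do not fit. This sketch (Python `re`, with the same hand translation of the POSIX classes as an assumption) shows the limit and a less obvious consequence of leaving the pattern unanchored.

```python
import re

TLD = re.compile(r"[A-Za-z]{2,4}")
print([bool(TLD.fullmatch(t)) for t in ["uk", "com", "info", "museum"]])  # [True, True, True, False]

# Unanchored, the full email RE quietly truncates the longer domain instead of rejecting it.
EMAIL = re.compile(r"[A-Za-z0-9][\w.%+-]*@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,4}", re.ASCII)
print(EMAIL.findall("x@y.museum"))  # ['x@y.muse']
```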

#### Your Second RE

Now we'll look at an RE that can help us build a header file for a C program file, given that some neglectful programmer has failed to design his/her C program properly. This will be a quicker example.

`^[\w\s]*\([\w\s*&,]*\)\s*\{`

- `^[\w\s]*\(`: at the beginning of a line, match the keywords, types and function name, then a literal opening parenthesis.
- `[\w\s*&,]*`: match the parameter list: more words and keywords, the variable modifiers `*` and `&`, and commas.
- `\)\s*\{`: finally, match the closing parenthesis, any whitespace and the left curly brace that marks the start of the function body.
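
Here is a sketch of the header-building idea in Python, assuming Python's `re` reads this RE the same way PCRE does (it uses only `\w`, `\s`, brackets and escaped literals). The helper `prototypes` and the sample C source are hypothetical. The RE is run over the whole file rather than line by line, so `\s` can cross newlines; each match is then collapsed onto one line and ended with `;` to form a declaration.

```python
import re

# The RE from the slide, with a group around everything before the brace.
# re.MULTILINE makes ^ match at the start of every line, as it does in grep.
PROTOTYPE = re.compile(r"^([\w\s]*\([\w\s*&,]*\))\s*\{", re.MULTILINE)

def prototypes(c_source):
    """Return one declaration per function definition found in c_source."""
    return [" ".join(m.group(1).split()) + ";" for m in PROTOTYPE.finditer(c_source)]

source = """static int
add(int a, int b)
{
    return a + b;
}

void greet(const char *name) {
    puts(name);
}
"""

for declaration in prototypes(source):
    print(declaration)
# static int add(int a, int b);
# void greet(const char *name);
```

Like the original RE, this sketch would also pick up a control statement such as `while (running) {` that starts a line, so the output still needs a quick review before it goes into a header.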
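
#### Your Second RE - Fine Details

`^[\w\s]*\([\w\s*&,]*\)\s*\{`

grep reads its input one line at a time, so a match can never span lines, even though the `\s` class matches the newline character. This is very bothersome when a definition is split across several lines, but it is easily overcome by using pcregrep with its `-M` (multiline) option. PCRE stands for Perl Compatible Regular Expressions. This is all I will ever say about Perl.

Notice that the literal parentheses must be escaped, as `\(` and `\)`, because of their special RE meaning, and `\{` keeps the brace from being read as the start of a `{x,y}` quantifier. Inside a bracket expression such as `[\w\s*&,]`, on the other hand, `*` is already literal and needs no escape. Also, `\w` and `\s` inside brackets are Perl syntax (POSIX bracket expressions treat a backslash literally), so this RE needs `grep -P` or pcregrep. Escaping so many characters is annoying, but unfortunately it is necessary.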
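
The line-by-line limitation can be simulated in Python: applying the RE to each line separately, as grep does, misses a definition whose brace sits on its own line, while applying it to the whole text (roughly what `pcregrep -M` allows) finds it. The sample source and the name `FUNC` are made up for illustration.

```python
import re

FUNC = re.compile(r"^[\w\s]*\([\w\s*&,]*\)\s*\{", re.MULTILINE)

source = "int\nmain(void)\n{\n    return 0;\n}\n"

# grep-style: each line is matched on its own, so no match can span lines.
per_line = [line for line in source.splitlines() if FUNC.search(line)]
# Whole-text matching: \s is free to consume the newlines.
whole = FUNC.findall(source)

print(per_line)  # []
print(whole)     # ['int\nmain(void)\n{']
```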

#### Appendix