
Introduction to Regular Expressions

^Regular Expressions is one of those tools that every developer should have in their toolbox. You can do your job without regular expressions, but knowing when and how to use them will make you a much more efficient and marketable developer. You'll learn how regular expressions can be used for validating user input, parsing text, and refactoring code. We'll also cover various tools that can be used to help you write and share expressions.$

Published in: Technology, News & Politics

Introduction to Regular Expressions

  1. Introduction to Regular Expressions
     Matt Casto
     http://google.com/profiles/mattcasto
  2. Introduction to Regular Expressions
     Matt Casto
     Quick Solutions
     http://google.com/profiles/mattcasto
  3. “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.”
     - Jamie Zawinski, August 12, 1997
  5. What are Regular Expressions?
     ^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
     [\w-]+@([\w-]+\.)+[\w-]+
     ^.+@[^\.].*\.[a-z]{2,}$
     ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
  7. History
     Stephen Cole Kleene
     American mathematician credited with inventing regular expressions in the 1950s, using a mathematical notation called regular sets.
  8. History
     Ken Thompson
     American pioneer of computer science who, among many other things, used Kleene’s regular sets for searching in his QED and ed text editors.
  9. History
     grep
     Global Regular Expression Print
  10. History
      Henry Spencer
      Wrote the regex library that the Perl and Tcl languages used for regular expressions.
  11. Why Should You Care?
      Example: finding duplicate words in a file.
      Requirements:
        • Output lines that contain duplicate words
        • Find doubled words that span lines
        • Ignore capitalization differences
        • Ignore HTML tags
  15. Why Should You Care?
      Example: finding duplicate words in a file.
      Solution:
          $/ = ".\n";    # read the input in sentence-sized chunks
          while (<>) {
              # highlight doubled words (ignoring case and HTML tags), or skip the chunk
              next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
              s/^(?:[^\e]*\n)+//mg;    # drop lines that contain no highlight
              s/^/$ARGV: /mg;          # prefix each remaining line with the file name
              print;
          }
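The core of the Perl solution is its first substitution pattern. As a rough sketch only, the same pattern can be tried in Python's `re` module (the choice of Python, the `find_doubled_words` helper, and the sample text are illustrative assumptions, not part of the slides). It reports the doubled words instead of highlighting them.

```python
import re

# A word, then whitespace and/or HTML tags, then the same word again.
# re.IGNORECASE covers the "ignore capitalization" requirement, and \s
# matches newlines, so doubled words that span lines are found too.
DOUBLED = re.compile(r"\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)", re.IGNORECASE)


def find_doubled_words(text):
    """Return a (first, second) pair for every doubled word in text."""
    return [(m.group(1), m.group(3)) for m in DOUBLED.finditer(text)]


# Hypothetical sample input covering all three requirements.
sample = "This is is a test.\nThe <b>the</b> end of the\nthe line."
print(find_doubled_words(sample))
```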
  17. Literal Characters
      Any character except a small list of reserved characters.
      regex: is
      target: Jack is a boy
      Result: match in target string
  18. Literal Characters
      Literals will match characters in the middle of words.
      regex: a
      target: Jack is a boy
      Result: matches in target string
  19. Literal Characters
      Literals are case sensitive – capitalization matters!
      regex: j
      target: Jack is a boy
      Result: NOT a match
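The three literal-character slides can be checked with a short sketch. Python's `re` module is assumed here for illustration; the slides do not name an engine.

```python
import re

target = "Jack is a boy"

# A literal pattern matches itself wherever it occurs, even mid-word.
is_spans = [m.span() for m in re.finditer(r"is", target)]
a_spans = [m.span() for m in re.finditer(r"a", target)]

# Matching is case sensitive by default, so lowercase "j" misses "Jack".
j_match = re.search(r"j", target)

# An engine flag (re.IGNORECASE in Python) relaxes that when needed.
j_ignore_case = re.search(r"j", target, re.IGNORECASE)

print(is_spans, a_spans, j_match, j_ignore_case.group())
```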
  20. Special Characters
      [ \ ^ $ . | ? * + ( )
  21. Special Characters
      You can match special characters by escaping them with a backslash.
      regex: 1\+1=2
      target: I wrote 1+1=2 on the chalkboard.
  22. Special Characters
      Some characters, such as { and }, are reserved only in certain contexts.
      regex: if \(true\) {
      target: else if (true) { beep; }
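A small sketch of why escaping matters, again assuming Python's `re` module. `re.escape` is a standard-library helper that builds the escaped form of a literal string; other engines have their own equivalents.

```python
import re

target = "I wrote 1+1=2 on the chalkboard."

# Unescaped, "+" is a quantifier: this means "one or more 1s, then 1=2".
unescaped = re.search(r"1+1=2", target)

# Escaped, "\+" matches a literal plus sign.
escaped = re.search(r"1\+1=2", target)

# re.escape produces a pattern that matches the given text literally.
built = re.search(re.escape("1+1=2"), target)

# "{" is not followed by a repetition count here, so Python treats it
# as a literal; the parentheses still need escaping.
brace = re.search(r"if \(true\) {", "else if (true) { beep; }")

print(unescaped, escaped.group(), built.group(), brace.group())
```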
  23. Non-Printable Characters
      Some literal characters can be escaped to represent non-printable characters.
      \t – tab
      \r – carriage return
      \n – line feed
      \a – bell
      \e – escape
      \f – form feed
      \v – vertical tab
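A brief sketch of these escapes in use, assuming Python's `re` module. Support varies by engine: Python's `re` does not accept `\e`, so the escape character is written as `\x1b` below.

```python
import re

text = "name\tvalue\r\nnext line"

# Raw strings pass \t, \r and \n to the regex engine, which interprets them.
tab = re.search(r"\t", text)
crlf = re.search(r"\r\n", text)

# Assumption specific to Python: no \e escape, so use the hex code 0x1B.
esc = re.search(r"\x1b", "\x1b[7mhighlight\x1b[m")

print(tab.start(), crlf.span(), esc.start())
```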
  24. Period
      The period character matches any single character (in most engines, any character except a line break unless a single-line or "dot all" option is on).
      regex: a.boy
      target: Jack is a boy
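A quick sketch of the period, assuming Python's `re` module, where the newline behavior is controlled by the `re.DOTALL` flag.

```python
import re

# "." stands in for the space between "a" and "boy".
m = re.search(r"a.boy", "Jack is a boy")

# By default "." does not match a newline in Python...
no_newline = re.search(r"a.boy", "a\nboy")

# ...unless re.DOTALL is set.
with_dotall = re.search(r"a.boy", "a\nboy", re.DOTALL)

print(m.group(), no_newline, with_dotall is not None)
```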
  25. Character Classes
      Used to match only one of the characters inside square brackets.
      regex: [Gg]r[ae]y
      target: Grayson drives a grey sedan.
  26. Character Classes
      Hyphen is a reserved character inside a character class and indicates a range.
      regex: [0-9a-fA-F]
      target: The HTML code for White is #FFFFFF
  27. Character Classes
      Caret at the start of a character class negates the match.
      regex: q[^u]
      target: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
  28. Character Classes
      Normal special characters are literal inside character classes. Only ] \ ^ and - are reserved.
      regex: [+*]
      target: 6 * 7 and 18 + 24 both equal 42
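The character-class slides can be sketched as follows, assuming Python's `re` module. The hex example uses just the color code as its target so the result is easy to read.

```python
import re

# One character from each bracketed set.
gray = re.findall(r"[Gg]r[ae]y", "Grayson drives a grey sedan.")

# Ranges: any single hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "#FFFFFF")

# A negated class still has to consume a character, so the "q" that
# ends the sentence is not matched.
text = "Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq"
negated = re.findall(r"q[^u]", text)

# "+" and "*" are plain literals inside a class.
operators = re.findall(r"[+*]", "6 * 7 and 18 + 24 both equal 42")

print(gray, hex_digits, negated, operators)
```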
  29. Shorthand Character Classes
      \d – digit or [0-9]
      \w – word or [A-Za-z0-9_]
      \s – whitespace or [ \t\r\n] (space, tab, CR, LF)
      regex: [\s\d]
      target: 1 + 2 = 3
  30. Shorthand Character Classes
      \D – non-digit or [^\d]
      \W – non-word or [^\w]
      \S – non-whitespace or [^\s]
      regex: [\D]
      target: 1 + 2 = 3
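A short sketch of the shorthand classes, assuming Python's `re` module. One engine-specific caveat: on Python `str` patterns `\d` also matches non-ASCII decimal digits unless `re.ASCII` is set, so the `[0-9]` equivalence holds only in ASCII mode there.

```python
import re

target = "1 + 2 = 3"

# Whitespace or digit: everything except the two operators.
space_or_digit = re.findall(r"[\s\d]", target)

# Non-digit: the spaces and the operators.
non_digit = re.findall(r"\D", target)

# U+0663 is ARABIC-INDIC DIGIT THREE.
unicode_digit = re.fullmatch(r"\d", "\u0663")
ascii_only = re.fullmatch(r"\d", "\u0663", re.ASCII)

print(space_or_digit, non_digit, unicode_digit is not None, ascii_only)
```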
  31. Repetition
      The asterisk repeats the preceding token (here, a character class) 0 or more times.
      regex: <[A-Za-z][A-Za-z0-9]*>
      target: <HTML>Regex is <b>Awesome</b></HTML>
  32. Repetition
      The plus repeats the preceding token 1 or more times.
      regex: <[A-Za-z0-9]+>
      target: Watch out for invalid <HTML> tags like <1> and <>!
  33. Repetition
      The question mark repeats the preceding token 0 or 1 times, in effect making it optional.
      regex: </?[A-Za-z][A-Za-z0-9]*>
      target: <HTML>Regex is <b>Awesome</b></HTML>
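The three repetition operators, sketched with Python's `re` module (an assumed engine) against the slide targets.

```python
import re

html = "<HTML>Regex is <b>Awesome</b></HTML>"

# "*": a letter followed by zero or more letters or digits.
opening = re.findall(r"<[A-Za-z][A-Za-z0-9]*>", html)

# "+": at least one character, so "<>" is skipped but "<1>" slips through.
loose = re.findall(r"<[A-Za-z0-9]+>",
                   "Watch out for invalid <HTML> tags like <1> and <>!")

# "?": the optional slash lets closing tags match as well.
any_tag = re.findall(r"</?[A-Za-z][A-Za-z0-9]*>", html)

print(opening, loose, any_tag)
```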
  34. Anchors
      The caret anchor matches the position before the first character in a string.
      regex: ^vac
      target: vacation evacuation
  35. Anchors
      The dollar sign anchor matches the position after the last character in a string.
      regex: tion$
      target: vacation evacuation
  36. Anchors
      The caret and dollar sign anchors match the start and end of each line if the engine has multi-line mode turned on.
      regex: tion$
      target: vacation evacuation
              has ruined my evaluation
  37. Anchors
      The \A and \Z anchors are like ^ and $, but match only the start and end of the string, even in multi-line mode.
      regex: tion\Z
      target: vacation evacuation
              has ruined my evaluation
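A sketch of the anchor slides, assuming Python's `re` module, where multi-line mode is the `re.MULTILINE` flag.

```python
import re

one_line = "vacation evacuation"
two_lines = "vacation evacuation\nhas ruined my evaluation"

# "^" and "$" pin the match to the start and end of the string.
starts = [m.span() for m in re.finditer(r"^vac", one_line)]
ends = [m.span() for m in re.finditer(r"tion$", one_line)]

# Without multi-line mode only the very end counts.
default_mode = re.findall(r"tion$", two_lines)

# With re.MULTILINE, "$" also matches before each line break.
multi_line = re.findall(r"tion$", two_lines, re.MULTILINE)

# \Z ignores multi-line mode and matches only at the end of the string.
string_end = re.findall(r"tion\Z", two_lines, re.MULTILINE)

print(starts, ends, len(default_mode), len(multi_line), len(string_end))
```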
  38. Word Boundaries
      The \b anchor matches:
        • the position before the first character in a string (like ^), if that character is a word character
        • the position after the last character in a string (like $), if that character is a word character
        • between two characters where one is a word character and the other is not
      regex: \b4\b
      target: We’ve got 4 orders for 44 lbs of C4
  41. Word Boundaries
      The \B anchor is the negated word boundary: any position between two word characters or two non-word characters.
      regex: \Bat\B
      target: vacation evacuation at that
              time ate my evaluation
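Both word-boundary slides, sketched with Python's `re` module (an assumed engine; a plain ASCII apostrophe is used in the target).

```python
import re

target = "We've got 4 orders for 44 lbs of C4"

# Only the standalone "4" has a boundary on both sides; the digits in
# "44" and "C4" touch another word character.
whole_four = [m.span() for m in re.finditer(r"\b4\b", target)]

# \B requires the opposite: "at" must sit strictly inside a word.
inside = re.findall(r"\Bat\B",
                    "vacation evacuation at that\ntime ate my evaluation")

print(whole_four, len(inside))
```

The three `\Bat\B` matches come from "vacation", "evacuation" and "evaluation"; "at", "that" and "ate" each have a boundary on at least one side of "at".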
  42. Alternation
      The pipe symbol separates two or more alternatives, any one of which can match.
      regex: cat|dog
      target: A cat and dog are expected to follow
              the dogma that their presence with one
              another leads to catastrophe.
  43. Alternation
      Alternation has the lowest precedence, so each alternative includes everything on its side of the pipe.
      regex: \bcat|dog\b
      target: A cat and dog are expected to follow
              the dogma that their presence with one
              another leads to catastrophe.
  44. Alternation
      Use parentheses to group the alternatives when you want to limit the reach of alternation.
      regex: \b(cat|dog)\b
      target: A cat and dog are expected to follow
              the dogma that their presence with one
              another leads to catastrophe.
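The reach of alternation is easiest to see by running the three variants side by side. This sketch assumes Python's `re` module and uses a non-capturing group `(?:...)` in the last pattern only so that `findall` returns whole matches.

```python
import re

text = ("A cat and dog are expected to follow the dogma that their "
        "presence with one another leads to catastrophe.")

# Plain alternation also hits "dogma" and "catastrophe".
plain = re.findall(r"cat|dog", text)

# Each \b belongs to one alternative only: "\bcat" OR "dog\b".
split_anchors = re.findall(r"\bcat|dog\b", text)

# Grouping applies both boundaries to both alternatives.
grouped = re.findall(r"\b(?:cat|dog)\b", text)

print(plain, split_anchors, grouped)
```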
  45. Eagerness
      Eagerness causes the order of alternations to matter.
      regex: and|android
      target: A robot and an android fight. The ninja wins.
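A sketch of eagerness, assuming Python's `re` module (a backtracking engine that tries alternatives left to right). Reordering the alternatives or adding word boundaries are two common ways around the problem.

```python
import re

text = "A robot and an android fight. The ninja wins."

# The engine stops at the first alternative that works, so "android"
# is never tried once "and" has matched.
eager = re.findall(r"and|android", text)

# Putting the longer alternative first fixes it.
reordered = re.findall(r"android|and", text)

# So does forcing a whole-word match, which makes the engine backtrack.
bounded = re.findall(r"\b(?:and|android)\b", text)

print(eager, reordered, bounded)
```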
  46. Greediness
      Greediness means that a repetition operator will always try to match as much as possible.
      regex: an\S+
      target: A robot and an android fight. The ninja wins.
  47. Laziness
      Laziness, or reluctance, modifies a repetition operator so that it matches only as much as it needs to.
      regex: an\S+?
      target: A robot and an android fight. The ninja wins.
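Greedy and lazy repetition compared, assuming Python's `re` module. The HTML example at the end is an added illustration, not taken from the slides.

```python
import re

text = "A robot and an android fight. The ninja wins."

# Greedy: \S+ takes every non-space character it can.
greedy = re.findall(r"an\S+", text)

# Lazy: \S+? stops after the single character it is required to match.
lazy = re.findall(r"an\S+?", text)

# The same difference on markup (hypothetical sample).
greedy_tag = re.findall(r"<.+>", "<b>Awesome</b>")
lazy_tag = re.findall(r"<.+?>", "<b>Awesome</b>")

print(greedy, lazy, greedy_tag, lazy_tag)
```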
  48. Limiting Repetition
      You can limit repetition with curly braces.
      regex: \d{2,4}
      target: 1 111111111 11111
  49. Limiting Repetition
      The second number can be omitted to mean no upper limit.
      {0,} is the same as * and {1,} is the same as +.
      regex: \d{2,}
      target: 1 11111111111111
  50. Limiting Repetition
      A single number can be used to match an exact number of times.
      regex: \d{4}
      target: 1 11 111 1111 11111
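The curly-brace quantifiers, sketched with Python's `re` module (an assumed engine) on the slide targets. The final pattern adds word boundaries, which the slides do not use, to show how to require exactly four digits and no more.

```python
import re

# Between two and four digits, as many as possible each time.
two_to_four = re.findall(r"\d{2,4}", "1 111111111 11111")

# Two or more digits, no upper limit.
open_target = "1 11111111111111"
two_or_more = re.findall(r"\d{2,}", open_target)

# Exactly four digits; the first four of "11111" also qualify.
exactly_four = re.findall(r"\d{4}", "1 11 111 1111 11111")

# Word boundaries exclude the longer run.
only_four = re.findall(r"\b\d{4}\b", "1 11 111 1111 11111")

# {1,} behaves like +.
same_as_plus = re.findall(r"\d{1,}", open_target) == re.findall(r"\d+", open_target)

print(two_to_four, two_or_more, exactly_four, only_four, same_as_plus)
```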
  51. Back References
      Parentheses around part of a pattern group it and capture the text it matched; a back reference such as \1 then matches that same text again.
      regex: ([ai]).\1.\1
      target: The magician said abracadabra!
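A sketch of the back-reference slide, assuming Python's `re` module. The only place where the captured vowel recurs at every second position is inside "abracadabra".

```python
import re

# Group 1 captures "a" or "i"; each \1 must repeat exactly that character.
m = re.search(r"([ai]).\1.\1", "The magician said abracadabra!")

print(m.group(), m.group(1))   # acada a
```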
  52. Named Groups
      Named groups let you reference matched groups by their name rather than just index.
      regex: (?<vowel>[ai]).\k<vowel>.\1
      target: The magician said abracadabra!
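Named-group syntax differs between engines. The slide uses the `(?<name>...)` and `\k<name>` form; Python's `re` module, assumed here, spells the same thing `(?P<name>...)` and `(?P=name)`. A sketch of the slide's pattern translated to that dialect:

```python
import re

# Python dialect: (?P<vowel>...) defines the group, (?P=vowel) refers
# back to it by name, and \1 still refers to it by index.
pattern = re.compile(r"(?P<vowel>[ai]).(?P=vowel).\1")
m = pattern.search("The magician said abracadabra!")

print(m.group(), m.group("vowel"))   # acada a
```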
  53. Negative Lookahead
      A negative lookahead succeeds only when the given pattern is not there, without consuming any characters.
      regex: q(?!u)
      target: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
  54. Positive Lookahead
      A positive lookahead succeeds only when the given pattern is there, without including that group in the match.
      regex: q(?=u)
      target: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
  55. Positive & Negative Lookbehind
      Lookbehinds are just like lookaheads, but working backwards.
      regex: (?<=a)q
      target: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
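The lookaround slides, sketched with Python's `re` module (an assumed engine). The `starts` helper and the position variables are illustrative names. Note how `q(?!u)` finds the final "q" of the sentence, which the negated class `q[^u]` cannot, because a lookahead consumes nothing.

```python
import re

text = "Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq"


def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


q_quite = text.index("quite")      # the "q" in "quite"
q_iraqi = text.index("Iraqi") + 3  # the "q" in "Iraqi"
q_last = len(text) - 1             # the "q" that ends the sentence

not_before_u = starts(r"q(?!u)")   # negative lookahead
negated_class = starts(r"q[^u]")   # needs a character after the "q"
before_u = starts(r"q(?=u)")       # positive lookahead
after_a = starts(r"(?<=a)q")       # positive lookbehind
not_after_a = starts(r"(?<!a)q")   # negative lookbehind

# The lookahead's "u" is checked but not included in the match.
matched_text = re.search(r"q(?=u)", text).group()

print(not_before_u, negated_class, before_u, after_a, not_after_a, matched_text)
```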
  56. Resources
      Lots of web pages
      http://del.icio.us/mattcasto/regex
      “Mastering Regular Expressions” by Jeffrey Friedl
      http://oreilly.com/catalog/9780596528126/
