Regex Intro


Published in: Technology, Lifestyle

  1. 1. ^[Rr]egular [Ee]xpressions$ Introduction
  2. 2. Vocabulary <ul><li>Regular expression / Regex / Regexp </li></ul><ul><ul><li>Regex is pronounced Reg (as in register) Ex (as in FedEx) </li></ul></ul><ul><li>Matching </li></ul><ul><ul><li>Saying a regex “matches a string” means it matches somewhere in the string, not necessarily the whole string </li></ul></ul>
  3. 3. Regular Expressions <ul><li>Composed of two types of characters </li></ul><ul><ul><li>Metacharacters / Special characters </li></ul></ul><ul><ul><ul><li>* ? ^ $ . [ ] </li></ul></ul></ul><ul><ul><li>Literal characters </li></ul></ul><ul><ul><ul><li>a b c d </li></ul></ul></ul>
  4. 4. Egrep tool <ul><li>Allows you to use Regular Expressions to find words that match </li></ul><ul><li>Available for Macs, PCs and Linux </li></ul><ul><li>cat /usr/share/dict/words | egrep '…' </li></ul><ul><li>See if you don’t have it preinstalled </li></ul>
  5. 5. My first regex <ul><li>cat /usr/share/dict/words | egrep 'cat' </li></ul><ul><ul><li>Matches any word containing a ‘c’ followed by an ‘a’ followed by a ‘t’ </li></ul></ul><ul><ul><ul><li>bobcat </li></ul></ul></ul><ul><ul><ul><li>cat </li></ul></ul></ul><ul><ul><ul><li>catwalk </li></ul></ul></ul><ul><ul><ul><li>scatter </li></ul></ul></ul><ul><li>A simple regex that uses only literal characters </li></ul>
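The contents of /usr/share/dict/words vary from system to system, so the sketch below feeds a small hand-picked list (an assumption, not the real dictionary) through `grep -E`, which is the same tool as egrep. The variable names `words` and `result` are made up for illustration.

```shell
# Hypothetical sample list standing in for /usr/share/dict/words.
words='bobcat
cat
catwalk
scatter
cart
dog'

# Keep only the lines that contain the literal sequence c, a, t.
result=$(printf '%s\n' "$words" | grep -E 'cat')
printf '%s\n' "$result"
# prints bobcat, cat, catwalk, scatter (one per line)
```

Note that "cart" is not printed: the three letters must be adjacent.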
  6. 6. Metacharacters: ^ and $ <ul><li>^ matches the beginning of a line </li></ul><ul><li>$ matches the end of a line </li></ul><ul><ul><li>^cat (start of line followed by ‘c’ then ‘a’ then ‘t’) </li></ul></ul><ul><ul><ul><li>cat </li></ul></ul></ul><ul><ul><ul><li>catwalk </li></ul></ul></ul><ul><ul><li>cat$ (‘c’ followed by ‘a’ then ‘t’ followed by EOL) </li></ul></ul><ul><ul><ul><li>bobcat </li></ul></ul></ul><ul><ul><ul><li>cat </li></ul></ul></ul><ul><ul><li>^cat$ (start of line followed by ‘c’ then ‘a’ then ‘t’ then EOL) </li></ul></ul><ul><ul><ul><li>cat </li></ul></ul></ul>
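As a rough illustration of the three anchored patterns on this slide, the following runs each one over the same four words. The word list and variable names are assumptions chosen for the example.

```shell
# Hypothetical sample words (the matches from the previous slide).
words='bobcat
cat
catwalk
scatter'

starts=$(printf '%s\n' "$words" | grep -E '^cat')   # line begins with cat
ends=$(printf '%s\n' "$words" | grep -E 'cat$')     # line ends with cat
whole=$(printf '%s\n' "$words" | grep -E '^cat$')   # line is exactly cat

printf '%s\n' "$starts" '---' "$ends" '---' "$whole"
# starts: cat, catwalk   ends: bobcat, cat   whole: cat
```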
  7. 7. How to read regex <ul><li>Read each character one at a time </li></ul><ul><li>^bat </li></ul><ul><ul><li>Start of line followed by ‘b’ then ‘a’ then ‘t’ </li></ul></ul><ul><li>rat$ </li></ul><ul><ul><li>‘r’ then ‘a’ then ‘t’ followed by end of line </li></ul></ul><ul><li>^dog$ </li></ul><ul><ul><li>Start of line followed by ‘d’ then ‘o’ then ‘g’ then EOL </li></ul></ul>
  8. 8. More simple regexes <ul><li>^ </li></ul><ul><ul><li>Start of line </li></ul></ul><ul><li>^$ </li></ul><ul><ul><li>Start of line followed by end of line </li></ul></ul><ul><li>$ </li></ul><ul><ul><li>End of line </li></ul></ul><ul><li>^foot$ </li></ul><ul><ul><li>Start of line followed by ‘f’ then ‘o’ then ‘o’ then ‘t’ followed by EOL </li></ul></ul>
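One practical reading of these patterns: `^$` matches only empty lines, while `^` alone matches every line, because every line has a start. A small sketch using `grep -c` (which prints a count of matching lines) on made-up input:

```shell
# Four input lines: "first", an empty line, "second", an empty line.
blank=$(printf 'first\n\nsecond\n\n' | grep -c -E '^$')
every=$(printf 'first\n\nsecond\n\n' | grep -c -E '^')

printf 'blank lines: %s, all lines: %s\n' "$blank" "$every"
# prints: blank lines: 2, all lines: 4
```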
  9. 9. Character Classes [ ] <ul><li>Matches one of the characters in the [ ] </li></ul><ul><ul><li>[ae] </li></ul></ul><ul><ul><ul><li>Matches ‘a’ or ‘e’ </li></ul></ul></ul><ul><ul><li>[aeiouy] </li></ul></ul><ul><ul><ul><li>Matches any vowel </li></ul></ul></ul><ul><ul><li>^gr[ae]y$ </li></ul></ul><ul><ul><ul><li>Start of line followed by ‘g’ then ‘r’ then ‘a’ or ‘e’ then ‘y’ followed by end of line </li></ul></ul></ul><ul><ul><ul><li>grey or gray </li></ul></ul></ul>
  10. 10. Character Classes cont. <ul><li>[Ss] </li></ul><ul><ul><li>Matches upper or lower case ‘S’ </li></ul></ul><ul><li>[123456] </li></ul><ul><ul><li>Matches any of the digits listed </li></ul></ul><ul><li>[Hh][123456] </li></ul><ul><ul><li>Matches H1, h2, h3, H4, etc </li></ul></ul>
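The character-class examples from the last two slides can be tried on a short list. The words here are invented for the example, and `grep -E` stands in for egrep.

```shell
# Hypothetical sample input mixing spellings and heading-like tokens.
words='gray
grey
griy
H1
h2
H7
hx'

colors=$(printf '%s\n' "$words" | grep -E '^gr[ae]y$')        # a or e in the middle
headings=$(printf '%s\n' "$words" | grep -E '[Hh][123456]')   # H or h, then 1 to 6

printf '%s\n' "$colors" '---' "$headings"
# colors: gray, grey   headings: H1, h2
```

"griy", "H7" and "hx" are rejected because the character in that position is not listed inside the brackets.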
  11. 11. Special characters in [ ]’s <ul><li>- (dash) references a range </li></ul><ul><ul><li>[1-6] is the same as [123456] </li></ul></ul><ul><ul><li>[a-f] is the same as [abcdef] </li></ul></ul><ul><li>Ranges can be mixed with literals </li></ul><ul><ul><li>[0-9a-fA-F_!.?] </li></ul></ul><ul><ul><ul><li>Any digit, upper or lower case ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, underscore, exclamation, period or question mark </li></ul></ul></ul>
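A quick way to check what a mixed class accepts is to test single characters one per line. In this sketch the sample characters are assumptions, and `LC_ALL=C` is set because letter ranges such as `a-f` can behave differently under some locales.

```shell
# One candidate character per line (hypothetical test input).
chars='7
c
F
g
_
?
#'

# Anchors force the whole line to be exactly one character from the class.
result=$(printf '%s\n' "$chars" | LC_ALL=C grep -E '^[0-9a-fA-F_!.?]$')
printf '%s\n' "$result"
# prints 7, c, F, _, ? (g and # are not in the class)
```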
  12. 12. Negated character class [^ ] <ul><li>^ inside of [ ] means “not any of these” </li></ul><ul><ul><li>[^1-6] </li></ul></ul><ul><ul><ul><li>Any character other than 1, 2, 3, 4, 5, 6 </li></ul></ul></ul><ul><ul><li>[^a-fA-F] </li></ul></ul><ul><ul><ul><li>Any character other than A-F (upper or lower) </li></ul></ul></ul><ul><ul><li>The ^ must be the first character inside [ ] </li></ul></ul><ul><ul><ul><li>[^c] (Matches anything but ‘c’) </li></ul></ul></ul><ul><ul><ul><li>[c^] (Matches a ‘c’ or ‘^’) </li></ul></ul></ul>
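The position of `^` inside the brackets is easy to verify with three one-character lines. This is only a sketch; the variable names are made up.

```shell
# ^ first inside [ ]: any single character except c.
negated=$(printf '%s\n' 'c' '^' 'x' | grep -E '^[^c]$')

# ^ later inside [ ]: a literal caret, so the class is "c or ^".
literal=$(printf '%s\n' 'c' '^' 'x' | grep -E '^[c^]$')

printf '%s\n' "$negated" '---' "$literal"
# negated: ^, x   literal: c, ^
```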
  13. 13. Translating regex practice <ul><li>List of words that have ‘q’ followed by a character other than ‘u’ </li></ul><ul><ul><li>q[^u] </li></ul></ul><ul><li>List of words with ‘f’ followed by an ‘i’ or ‘o’ followed by ‘r’ then ‘e’ </li></ul><ul><ul><li>f[io]re </li></ul></ul><ul><li>Line starts with ‘Qu’ or ‘qu’ followed by an ‘e’ followed by any letter between ‘p’ and ‘t’ </li></ul><ul><ul><li>^[Qq]ue[p-t] </li></ul></ul>
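The three practice patterns can be checked against a handful of words. The list below is an assumption picked to show both hits and misses; `LC_ALL=C` keeps the `p-t` range to plain lowercase ASCII.

```shell
# Hypothetical sample words.
words='Iraq
Iraqi
quit
fire
forest
before
fare
Quest
queue
query'

q_not_u=$(printf '%s\n' "$words" | grep -E 'q[^u]')
fire_fore=$(printf '%s\n' "$words" | grep -E 'f[io]re')
que_range=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[Qq]ue[p-t]')

printf '%s\n' "$q_not_u" '---' "$fire_fore" '---' "$que_range"
# q[^u]: Iraqi   f[io]re: fire, forest, before   ^[Qq]ue[p-t]: Quest, query
```

"Iraq" does not match `q[^u]`: a negated class still has to match some character, and nothing follows the final ‘q’.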
  14. 14. Metacharacter: . (dot) <ul><li>Matches any single character </li></ul><ul><li>c.t </li></ul><ul><ul><li>‘c’ followed by any character followed by ‘t’ </li></ul></ul><ul><ul><ul><li>cat </li></ul></ul></ul><ul><ul><ul><li>cot </li></ul></ul></ul><ul><ul><ul><li>c8t </li></ul></ul></ul><ul><li>Period inside of [ ]’s matches a period </li></ul><ul><ul><li>[a.c] </li></ul></ul><ul><ul><ul><li>Matches ‘a’, ‘.’ or ‘c’ </li></ul></ul></ul>
  15. 15. Periods cont. <ul><li>03.19.76 </li></ul><ul><ul><li>Matches ‘03’ followed by any char then ‘19’ then any char then ‘76’ </li></ul></ul><ul><ul><ul><li>03-19-76 </li></ul></ul></ul><ul><ul><ul><li>03/19/76 </li></ul></ul></ul><ul><ul><ul><li>03.19.76 </li></ul></ul></ul><ul><ul><ul><li>03 19 76 </li></ul></ul></ul><ul><ul><ul><li>03319876 </li></ul></ul></ul>
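To see how permissive the unescaped dots are, the sketch below runs the date pattern over several strings. The sample strings are assumptions; the last one is included as a deliberate miss.

```shell
# Hypothetical sample strings, one per line.
dates='03-19-76
03/19/76
03.19.76
03 19 76
03319876
031976'

result=$(printf '%s\n' "$dates" | grep -E '03.19.76')
printf '%s\n' "$result"
# prints the first five lines; 031976 has no character between the pairs
```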
  16. 16. Alternatives: | (pipe) <ul><li>Pipes allow you to specify alternatives </li></ul><ul><li>grey|gray </li></ul><ul><ul><li>Matches on grey or gray </li></ul></ul><ul><li>Use parentheses to constrain alternatives </li></ul><ul><ul><li>gr(e|a)y </li></ul></ul><ul><li>Within [ ]’s, | is a normal character </li></ul><ul><ul><li>[a|b] </li></ul></ul><ul><ul><ul><li>Matches ‘a’ or ‘|’ or ‘b’ </li></ul></ul></ul>
  17. 17. Pipes (cont.) <ul><li>Use parentheses to constrain </li></ul><ul><ul><li>gre|ay </li></ul></ul><ul><ul><ul><li>matches ‘gre’ or ‘ay’ </li></ul></ul></ul><ul><ul><li>gr(e|a)y </li></ul></ul><ul><ul><ul><li>matches ‘gr’ followed by ‘e’ or ‘a’ then ‘y’ </li></ul></ul></ul>
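The difference between the unconstrained and the parenthesized alternation shows up clearly on a few words. The list is invented for the example.

```shell
# Hypothetical sample words.
words='grey
gray
green
okay
gruy'

loose=$(printf '%s\n' "$words" | grep -E 'gre|ay')      # "gre" anywhere OR "ay" anywhere
tight=$(printf '%s\n' "$words" | grep -E 'gr(e|a)y')    # gr, then e or a, then y

printf '%s\n' "$loose" '---' "$tight"
# loose: grey, gray, green, okay   tight: grey, gray
```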
  18. 18. Regex practice <ul><li>Match “First Street” or “1st street” </li></ul><ul><ul><li>(First|1st) [Ss]treet </li></ul></ul><ul><ul><li>(Fir|1)st [Ss]treet </li></ul></ul><ul><ul><ul><li>These are equivalent, which is better? </li></ul></ul></ul><ul><li>Match “toothbrush” or “hairbrush” </li></ul><ul><ul><li>(tooth|hair)brush </li></ul></ul>
  19. 19. ^ or $ and alternation <ul><li>Be careful when using ^ or $ with alternation </li></ul><ul><li>^From|Subject|Date: </li></ul><ul><ul><li>Start of line followed by From OR </li></ul></ul><ul><ul><li>Subject OR </li></ul></ul><ul><ul><li>Date: </li></ul></ul><ul><li>^(From|Subject|Date): </li></ul><ul><ul><li>Start of line followed by ‘From’ or ‘Subject’ or ‘Date’ followed by ‘:’ </li></ul></ul><ul><li>Safer to use ()’s to group your alternates </li></ul>
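A small experiment makes the pitfall concrete. The mail-like lines below are made up for illustration; only the two patterns come from the slide.

```shell
# Hypothetical sample lines resembling mail headers.
lines='From: alice
Subject: hi
Re: Subject matter
X-Date: 1
Sent From home'

# Without parentheses, ^ applies only to "From" and ":" only to "Date".
ungrouped=$(printf '%s\n' "$lines" | grep -E '^From|Subject|Date:')

# With parentheses, ^ and ":" apply to all three alternatives.
grouped=$(printf '%s\n' "$lines" | grep -E '^(From|Subject|Date):')

printf '%s\n' "$ungrouped" '---' "$grouped"
# ungrouped: the first four lines   grouped: "From: alice", "Subject: hi"
```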
  20. 20. Case insensitive match <ul><li>Matches are case sensitive by default </li></ul><ul><ul><li>[Ff]rom will match From but not FRom </li></ul></ul><ul><li>Use egrep’s -i option to do a case insensitive match </li></ul><ul><li>Most languages have a case insensitive match as well </li></ul>
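A minimal comparison of the character-class approach with the `-i` option, using four made-up capitalizations of the same word:

```shell
# Case-sensitive: only the first letter may vary.
strict=$(printf '%s\n' From FRom from FROM | grep -E '[Ff]rom')

# -i ignores case for the whole pattern.
relaxed=$(printf '%s\n' From FRom from FROM | grep -E -i 'from')

printf '%s\n' "$strict" '---' "$relaxed"
# strict: From, from   relaxed: From, FRom, from, FROM
```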
  21. 21. Quantifiers: ? <ul><li>? metacharacter means optional </li></ul><ul><ul><li>colou?r </li></ul></ul><ul><ul><ul><li>matches color or colour </li></ul></ul></ul><ul><ul><ul><li>‘c’ then ‘o’ then ‘l’ then ‘o’ then optionally ‘u’ then ‘r’ </li></ul></ul></ul><ul><li>Match July or Jul followed by fourth, 4th or 4 </li></ul><ul><ul><li>(July|Jul) (fourth|4th|4) </li></ul></ul><ul><ul><li>July? (fourth|4th|4) </li></ul></ul><ul><ul><li>July? (fourth|4(th)?) </li></ul></ul>
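The optional-item patterns can be tried as follows. The inputs are assumptions chosen so that each pattern has at least one miss.

```shell
# ? makes only the preceding item (the u) optional.
spelling=$(printf '%s\n' color colour colouur | grep -E 'colou?r')

# The y is optional, and so is the parenthesized "th".
dates=$(printf '%s\n' 'July 4' 'Jul 4th' 'July fourth' 'Jul 5' |
        grep -E 'July? (fourth|4(th)?)')

printf '%s\n' "$spelling" '---' "$dates"
# spelling: color, colour   dates: July 4, Jul 4th, July fourth
```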
  22. 22. Quantifiers: + and * <ul><li>+ (plus) </li></ul><ul><ul><li>One or more of the previous item </li></ul></ul><ul><li>* (star) </li></ul><ul><ul><li>Zero or more of the previous item </li></ul></ul><ul><li>b[0-9]*a </li></ul><ul><ul><li>ba </li></ul></ul><ul><ul><li>b9999a </li></ul></ul><ul><ul><li>b999999999999999a </li></ul></ul>
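  23. 23. Summary of Quantifiers <table><tr><th>Quantifier</th><th>Minimum required</th><th>Maximum to try</th><th>Meaning</th></tr><tr><td>?</td><td>none</td><td>1</td><td>zero or one occurrence</td></tr><tr><td>*</td><td>none</td><td>no limit</td><td>zero or more occurrences</td></tr><tr><td>+</td><td>1</td><td>no limit</td><td>one or more occurrences</td></tr></table>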
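The only difference between `*` and `+` is whether zero occurrences are accepted, which the following sketch shows on made-up strings:

```shell
# Hypothetical sample strings.
words='ba
b9999a
b1a
bxa'

star=$(printf '%s\n' "$words" | grep -E 'b[0-9]*a')   # zero or more digits
plus=$(printf '%s\n' "$words" | grep -E 'b[0-9]+a')   # at least one digit

printf '%s\n' "$star" '---' "$plus"
# star: ba, b9999a, b1a   plus: b9999a, b1a
```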
  24. 24. Escaping metacharacters <ul><li>Use \ (backslash) to escape metacharacters </li></ul><ul><ul><li>\. matches ‘.’ </li></ul></ul><ul><ul><li>. matches any character </li></ul></ul><ul><li>c.t matches cat </li></ul><ul><li>c\.t does not match cat </li></ul><ul><li>\(cat\) matches ‘(cat)’ not ‘cat’ </li></ul>
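Escaping with a backslash can be tried as follows. Single quotes keep the shell from touching the backslashes, so the pattern reaches `grep -E` unchanged; the sample words are assumptions.

```shell
# Hypothetical sample strings.
words='cat
c.t
cot
(cat)'

any=$(printf '%s\n' "$words" | grep -E 'c.t')          # dot matches any character
dot=$(printf '%s\n' "$words" | grep -E 'c\.t')         # escaped dot: a literal period
parens=$(printf '%s\n' "$words" | grep -E '\(cat\)')   # escaped parens: literal ( and )

printf '%s\n' "$any" '---' "$dot" '---' "$parens"
# any: all four lines   dot: c.t   parens: (cat)
```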
  25. 25. More practice <ul><li>Match chat, cat, chart </li></ul><ul><ul><li>ch?ar?t </li></ul></ul><ul><ul><li>c[h]?a[r]?t </li></ul></ul><ul><li>Start of line then M then one or more ‘a’ followed by ‘st’ and zero or more ‘b’ </li></ul><ul><ul><li>^M[a]+st[b]* </li></ul></ul><ul><li>Lines ending with one or more ‘c’ followed by a ‘t’ then zero or one ‘e’ </li></ul><ul><ul><li>[c]+t[e]?$ </li></ul></ul>
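Two of these exercises are worth running. The first shows that `ch?ar?t` also accepts "cart", which the exercise did not ask for. The second compares `?` with `*` after the ‘t’, since "zero or one" and "zero or more" differ only when there are two or more ‘e’s. All sample words are assumptions.

```shell
optional=$(printf '%s\n' chat cat chart cart coat | grep -E 'ch?ar?t')

# "zero or one e" is ?, while * would also allow two or more.
one_e=$(printf '%s\n' act acte actee | grep -E '[c]+t[e]?$')
many_e=$(printf '%s\n' act acte actee | grep -E '[c]+t[e]*$')

printf '%s\n' "$optional" '---' "$one_e" '---' "$many_e"
# optional: chat, cat, chart, cart   one_e: act, acte   many_e: act, acte, actee
```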
  26. 26. More practice <ul><li>^[Mm][^a-np-z]ney$ </li></ul><ul><ul><li>Start of line then ‘M’ or ‘m’ then any character not a-n and p-z then ‘ney’ followed by end of line </li></ul></ul><ul><ul><li>Money, money, m3ney </li></ul></ul><ul><li>^be.*(bob|ted)$ </li></ul><ul><ul><li>Start of line followed by ‘be’ followed by zero or more characters followed by ‘bob’ or ‘ted’ followed by end of line </li></ul></ul>
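Both patterns from this slide can be checked against short lists. The words are invented for the example, and `LC_ALL=C` keeps the `a-n` and `p-z` ranges to lowercase ASCII.

```shell
# The negated class excludes every lowercase letter except o.
money=$(printf '%s\n' Money money m3ney maney honey |
        LC_ALL=C grep -E '^[Mm][^a-np-z]ney$')

# .* lets anything (including nothing) sit between "be" and the ending.
be_end=$(printf '%s\n' belated bebob bobsled 'best bobsled' |
         grep -E '^be.*(bob|ted)$')

printf '%s\n' "$money" '---' "$be_end"
# money: Money, money, m3ney   be_end: belated, bebob
```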
  27. 27. More practice <ul><li>Match truck, firetruck but not dumptruck </li></ul><ul><ul><li>^(fire)?truck$ </li></ul></ul><ul><li>$0.99, $599.95, $1000.45, $5000 </li></ul><ul><ul><li>\$[0-9]+(\.[0-9][0-9])?$ </li></ul></ul><ul><li>404-555-1212, 404.555.1212, (404) 555-1212 </li></ul><ul><ul><li>^[()0-9]+.[0-9]+.[0-9]+$ </li></ul></ul>
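A sketch of these three exercises, assuming the dollar sign and the decimal point in the price pattern are meant to be literal and are therefore escaped with backslashes. The extra inputs ("$5.5", "5000", "4045551212", "no phone here") are assumptions added to show misses and one surprising hit.

```shell
trucks=$(printf '%s\n' truck firetruck dumptruck | grep -E '^(fire)?truck$')

# \$ is a literal dollar sign; the cents group is optional as a whole.
prices=$(printf '%s\n' '$0.99' '$599.95' '$1000.45' '$5000' '$5.5' '5000' |
         grep -E '\$[0-9]+(\.[0-9][0-9])?$')

# The unescaped dots accept any separator, and also a digit.
phones=$(printf '%s\n' '404-555-1212' '404.555.1212' '(404) 555-1212' \
                       '4045551212' 'no phone here' |
         grep -E '^[()0-9]+.[0-9]+.[0-9]+$')

printf '%s\n' "$trucks" '---' "$prices" '---' "$phones"
# trucks: truck, firetruck
# prices: $0.99, $599.95, $1000.45, $5000
# phones: the first four lines, including 4045551212
```

The phone pattern is intentionally loose: because `.` matches any character, an unseparated run of digits also passes.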