# Regular expressions 101

Regular expressions are undervalued, and most developers only know the basics. A thorough understanding of how regular expressions work is incredibly helpful when you need to parse structured data.

This presentation assumes you already know what regular expressions are and sums up, with examples, some fancy things you probably didn’t know were possible with regular expressions.

If you're interested in a more detailed write-up, I suggest you check out http://www.mullie.eu/regular-expressions-basics/ & http://www.mullie.eu/regular-expressions-advanced/

This presentation is based on the PHP implementation of PCRE, but nearly all programming languages support the same functionality, albeit sometimes with their own twists.

Published in: Education, Technology

### Regular expressions 101

1. REGEX 101: The Swiss Army knife of string manipulation
2. @matthiasmullie Regular expressions 101
3. INTRODUCTION: What are regular expressions?
4. "Regular expressions are special characters that match or capture portions of a field, as well as the rules that govern all characters." (Google)
5. "A regular expression provides a concise and flexible means for 'matching' strings of text, such as particular characters, words, or patterns of characters." (Wikipedia)
6. `/{\$([a-z0-9_]*)((\.[a-z0-9_]*)*)(->[a-z0-9_]*((\.[a-z0-9_]*)*))?((\|[a-z_][a-z0-9_]*(:.*?)*)*)}/i`
7. "Regular expressions find patterns in strings." (Me)
8. Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit... ‣ `/ipsum/` ‣ `/[a-z]/i` ‣ `/(est|qui)/` ‣ `/[^\w]/i`
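
To make the four patterns on slide 8 concrete, here is a minimal sketch. It uses Python's `re` module rather than PHP's `preg_*` functions (an assumption made for all examples added to this transcript), drops the `/` delimiters because Python does not use them, and assumes the last pattern was meant as `[^\w]`. The variable names are illustrative only.

```python
import re

text = ("Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, "
        "consectetur, adipisci velit...")

# /ipsum/ : a literal sequence of characters
literal = re.findall(r"ipsum", text)

# /[a-z]/i : every single letter, regardless of case
letters = re.findall(r"[a-z]", text, re.IGNORECASE)

# /(est|qui)/ : either one of the two alternatives
alternatives = re.findall(r"(est|qui)", text)

# /[^\w]/i : every character that is NOT a word character
non_word = re.findall(r"[^\w]", text, re.IGNORECASE)

print(literal)                 # ['ipsum']
print(alternatives)            # ['qui', 'est', 'qui', 'qui']
print(sorted(set(non_word)))   # [' ', ',', '.']
```

Note that `qui` is also found inside "quisquam" and "quia": a pattern matches anywhere in the string unless it is anchored.
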
9. BASICS: The syntax everyone should know already
10. /Delimiter/ ‣ Any `[^a-zA-Z0-9\s]` character (non-alphanumeric, non-whitespace; a backslash is not allowed either) ‣ Opening char == terminating char ‣ Except for the bracket pairs `[ ]`, `( )`, `{ }` and `< >`, where the closing bracket terminates the pattern
11. Use `/` (uniformity, you know)
12. Meta characters ‣ `\` ‣ `.` ‣ `( )` ‣ `[ ]` ‣ `^` ‣ `$` ‣ `*` ‣ `?` ‣ `+` ‣ `|` ‣ `{n}` ‣ `{n,m}`
13. Pattern modifiers `//x` ‣ `i` ‣ `A` ‣ `m` ‣ `D` ‣ `s` ‣ `U` ‣ `x` ‣ `J` ‣ `e` ‣ ...
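
A rough sketch of what three of these modifiers change. It is written with Python's `re` module (an assumption; the slides target PHP), where the `i`, `m` and `s` modifiers are passed as the flags `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`. The sample text is made up for the example.

```python
import re

text = "First line\nsecond LINE"

# i (re.IGNORECASE): letters match regardless of case
case_sensitive = re.findall(r"line", text)
case_insensitive = re.findall(r"line", text, re.IGNORECASE)

# m (re.MULTILINE): ^ and $ also match at the start and end of every line
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# s (re.DOTALL): the dot also matches a newline
dot_default = re.search(r"First.*LINE", text)
dot_all = re.search(r"First.*LINE", text, re.DOTALL)

print(case_sensitive, case_insensitive)            # ['line'] ['line', 'LINE']
print(first_words_default, first_words_multiline)  # ['First'] ['First', 'second']
print(dot_default is None, dot_all is not None)    # True True
```
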
14. Character classes `[ ]` ‣ Ranges: `[0-9]`, `[a-zA-Z]`, `[A-F0-9]` ‣ Inverse ranges: `[^0-9]`, `[^a-zA-Z]`, `[^A-F0-9]`
15. Character classes `[ ]`: no sequence of characters! ‣ `[lorem]` matches l, o, r, e or m ‣ It does not match the sequence "lorem"
16. Character classes `[ ]`, POSIX ‣ `[:alnum:]` ‣ `[:blank:]` ‣ `[:lower:]` ‣ ...
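
The point of slide 15 (a character class matches one character, never a sequence) can be checked with a small sketch. It assumes Python's `re` module in place of PHP; the sample strings are made up.

```python
import re

# [lorem] matches ONE character out of l, o, r, e, m at a time
single_chars = re.findall(r"[lorem]", "lorem")

# The class on its own can therefore never match the whole word
whole_word = re.fullmatch(r"[lorem]", "lorem")

# A range and an inverse range
hex_digits = re.findall(r"[A-F0-9]", "0xBEEF")
non_digits = re.findall(r"[^0-9]", "a1b2")

print(single_chars)   # ['l', 'o', 'r', 'e', 'm']
print(whole_word)     # None
print(hex_digits)     # ['0', 'B', 'E', 'E', 'F']
print(non_digits)     # ['a', 'b']
```
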
17. Greediness: greedy ‣ Subject: `<ul><li>list-item1</li><li>list-item2</li></ul>` ‣ Pattern: `/<li>.*<\/li>/` ‣ Matches: `<li>list-item1</li><li>list-item2</li>`
18. Greediness: lazy ‣ Subject: `<ul><li>list-item1</li><li>list-item2</li></ul>` ‣ Pattern: `/<li>.*?<\/li>/` or `/<li>.*<\/li>/U` ‣ Matches: `<li>list-item1</li>` and `<li>list-item2</li>`
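
The difference between the greedy and the lazy quantifier on slides 17 and 18 can be reproduced with a short sketch. It assumes Python's `re` module instead of PHP; Python has no delimiters (so the `/` needs no escaping) and no equivalent of the `U` modifier, so only the `.*?` form is shown.

```python
import re

html = "<ul><li>list-item1</li><li>list-item2</li></ul>"

# Greedy: .* grabs as much as possible, so one match spans both items
greedy = re.findall(r"<li>.*</li>", html)

# Lazy: .*? grabs as little as possible, so every item is its own match
lazy = re.findall(r"<li>.*?</li>", html)

print(greedy)  # ['<li>list-item1</li><li>list-item2</li>']
print(lazy)    # ['<li>list-item1</li>', '<li>list-item2</li>']
```
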
19. Subpatterns ‣ `/([a-z0-9]*)@([a-z0-9.]*\.[a-z0-9]{2,3})/i` ‣ The whole match is the email address; the first subpattern captures the user, the second the hostname ‣ Note: this regex only barely satisfies my needs for this particular example; do not use it to actually find occurrences of email addresses, as it does not fully satisfy RFC 5321 & RFC 5322
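
A minimal sketch of how the subpatterns of this expression are read back, assuming Python's `re` module instead of PHP. The address `user@example.com` is a made-up sample, and the slide's caveat still applies: this is not a real email validator.

```python
import re

email = re.compile(r"([a-z0-9]*)@([a-z0-9.]*\.[a-z0-9]{2,3})", re.IGNORECASE)

match = email.search("Contact: user@example.com")

whole = match.group(0)      # the complete match
user = match.group(1)       # first subpattern
hostname = match.group(2)   # second subpattern

print(whole)     # user@example.com
print(user)      # user
print(hostname)  # example.com
```
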
20. Questions?
21. ADVANCED: The juicy stuff you never knew about, until now
22. Back references ‣ Problem: `/href=['"](.*?)['"]/i` ‣ Matches: `href="xxx"`, `href="xxx'`, `href='xxx'` and `href='xxx"`
23. Back references ‣ Solution: `/href=(['"])(.*?)\1/i` ‣ `\1` references the first subpattern! ‣ Don’t forget to also string-escape in PHP: `preg_match('/href=([\'"])(.*?)\\1/i', ...);`
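
The problem and the solution on slides 22 and 23 can be compared side by side. This is a sketch assuming Python's `re` module instead of PHP; raw triple-quoted strings are used so that neither the quotes nor the back reference need extra string escaping. The names `loose`, `strict` and `samples` are illustrative.

```python
import re

# Problem: any quote may open and any quote may close
loose = re.compile(r"""href=['"](.*?)['"]""", re.IGNORECASE)

# Solution: \1 must repeat whatever the first subpattern captured
strict = re.compile(r"""href=(['"])(.*?)\1""", re.IGNORECASE)

samples = ['href="xxx"', "href='xxx'", "href=\"xxx'", "href='xxx\""]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(len(loose_hits))  # 4: mismatched quotes are accepted too
print(strict_hits)      # only the two samples with matching quotes
```
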
24. Named subpatterns ‣ Scenario: parsing a large CSV ‣ `1,a title,5.00,92,green` ‣ `2,another title,3.50,4,blue` ‣ `3,one more,33699.99,15,white` ‣ ...
25. Named subpatterns ‣ `/([0-9]+),(.*?),([0-9]+\.[0-9]{2}),([0-9]+),([a-z]+)/i` ‣ Result excerpt: `[1] => string(1) "1"`, `[2] => string(7) "a title"`, `[3] => string(4) "5.00"`, `[4] => string(2) "92"`, `[5] => string(5) "green"`
26. Named subpatterns ‣ `/(?P<id>[0-9]+),(?P<title>.*?),(?P<price>[0-9]+\.[0-9]{2}),(?P<stock>[0-9]+),(?P<color>[a-z]+)/i` ‣ Result excerpt: `["id"] => string(1) "1"`, `[1] => string(1) "1"`, `["title"] => string(7) "a title"`, `[2] => string(7) "a title"`, `["price"] => string(4) "5.00"`, `[3] => string(4) "5.00"`, `["stock"] => string(2) "92"`, `[4] => string(2) "92"`, `["color"] => string(5) "green"`, `[5] => string(5) "green"`
27. Named subpatterns ‣ `(?P<name>pattern)` ‣ `(?<name>pattern)` & `(?'name'pattern)` since PHP 5.2.2
28. Named subpatterns + back references ‣ `/href=(?P<quotes>['"])(?P<href>.*?)(?P=quotes)/i`
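
The CSV scenario and the named back reference can be sketched as follows, assuming Python's `re` module instead of PHP. Python's `re` only accepts the `(?P<name>...)` spelling, which happens to be the first form listed on slide 27. The variable names are illustrative.

```python
import re

row = re.compile(
    r"(?P<id>[0-9]+),(?P<title>.*?),(?P<price>[0-9]+\.[0-9]{2}),"
    r"(?P<stock>[0-9]+),(?P<color>[a-z]+)",
    re.IGNORECASE,
)

csv_text = """1,a title,5.00,92,green
2,another title,3.50,4,blue
3,one more,33699.99,15,white"""

# Every match exposes its subpatterns by name as well as by number
records = [m.groupdict() for m in row.finditer(csv_text)]
first = row.search(csv_text)

print(records[0])
print(first.group(1) == first.group("id"))  # True

# Named subpattern combined with a named back reference
link = re.compile(r"""href=(?P<quotes>['"])(?P<href>.*?)(?P=quotes)""", re.IGNORECASE)
href = link.search("<a href='page.html'>").group("href")
print(href)  # page.html
```

Reading `records[2]["price"]` is a lot easier to follow than remembering that the price was the third pair of parentheses.
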
29. Lookahead/-behind assertions: “Take a peek, don’t eat it”
30. Lookahead/-behind assertions ‣ Scenario: find all occurrences of “here” ‣ “Where can I find here, not there?”
31. Lookahead/-behind assertions ‣ Deduction: find every “here” that is not preceded or followed by an alphabetic character ‣ Solution: `/(?<![a-z])here(?![a-z])/i`
32. Lookahead/-behind assertions ‣ Positive lookahead: `(?=expression)` ‣ Negative lookahead: `(?!expression)` ‣ Positive lookbehind: `(?<=expression)` ‣ Negative lookbehind: `(?<!expression)`
33. Lookahead/-behind assertions ‣ “lookbehind assertion is not fixed length...” ‣ In PHP, a lookbehind must have a fixed length, so it cannot contain variable repetition such as `*`, while a lookahead can ‣ Valid: `(?=.*)`, `(?=abc)`, `(?<=abc)` ‣ Invalid: `(?<=.*)`
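
The “here” scenario from slides 30 and 31 can be verified with a small sketch. It assumes Python's `re` module instead of PHP; Python's `re` enforces a similar fixed-width rule for lookbehind, so the last part also shows the rejected pattern from slide 33. Variable names are illustrative.

```python
import re

sentence = "Where can I find here, not there?"

# Without assertions, "here" is also found inside "Where" and "there"
naive = [m.start() for m in re.finditer(r"here", sentence, re.IGNORECASE)]

# The assertions peek at the neighbouring characters without consuming them
exact = [
    m.start()
    for m in re.finditer(r"(?<![a-z])here(?![a-z])", sentence, re.IGNORECASE)
]

print(naive)  # [1, 17, 28]
print(exact)  # [17]

# A lookbehind with variable repetition is rejected at compile time
try:
    re.compile(r"(?<=.*)here")
    lookbehind_accepted = True
except re.error:
    lookbehind_accepted = False

print(lookbehind_accepted)  # False
```
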
34. Conditional subpatterns: if-then(-else) in regular expressions. YES RLY!
35. Conditional subpatterns ‣ Scenario: match all (x|ht)ml tags ‣ Caution! Both forms occur: `<element></element>` and `<element />`
36. Conditional subpatterns ‣ Solution: `/<(?P<tag>[a-z]+).*?(?P<self>\/)?>(?(self)|.*?<\/(?P=tag)>)/i` ‣ Named patterns: `tag` and `self` ‣ `(?(self)|...)` is the if-then-else: if self-closing, then do nothing; else, find the matching end tag
37. Conditional subpatterns ‣ With subpattern (named or by id): `(?(pattern)then)`, `(?(pattern)then|else)` ‣ With lookahead/-behind: `(?(?=assertion)then)`, `(?(?=assertion)then|else)`
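
A sketch of the conditional from slide 36, assuming Python's `re` module instead of PHP. Python supports the subpattern form `(?(name)then|else)` but, as far as I know, not the assertion form `(?(?=assertion)then|else)`, so only the former is shown. The sample markup is made up.

```python
import re

tag = re.compile(
    r"<(?P<tag>[a-z]+).*?(?P<self>/)?>"   # opening tag, optionally self-closing
    r"(?(self)|.*?</(?P=tag)>)",          # if self: nothing, else: up to the end tag
    re.IGNORECASE,
)

html = "<p>text</p><br /><em>more</em>"

matches = [m.group(0) for m in tag.finditer(html)]
self_closing = [m.group("tag") for m in tag.finditer(html) if m.group("self")]

print(matches)       # ['<p>text</p>', '<br />', '<em>more</em>']
print(self_closing)  # ['br']
```
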
38. Comments

        /
        # match currency symbols for USD, EUR, GBP & YEN
        [\$€£¥]
        # must be followed by a number to indicate a price
        (?=[0-9])
        # pattern modifiers:
        # u for UTF-8 interpretation (currency symbols),
        # x to ignore whitespace (for comments)
        /ux

39. Comments ‣ `#` Perl-style ‣ `/x` modifier ‣ Ignores unescaped whitespace
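
The commented pattern from slide 38 translates almost directly. This sketch assumes Python's `re` module instead of PHP, where the `x` modifier is the `re.VERBOSE` flag and string patterns are Unicode by default, so no counterpart of the `u` modifier is needed. The sample text is made up.

```python
import re

price_symbol = re.compile(
    r"""
    # match currency symbols for USD, EUR, GBP & YEN
    [$€£¥]
    # must be followed by a number to indicate a price
    (?=[0-9])
    """,
    re.VERBOSE,  # ignore unescaped whitespace and allow # comments
)

symbols = price_symbol.findall("€5 or $ 7 or £12")

print(symbols)  # ['€', '£']
```

The `$` in the sample is skipped because it is followed by a space rather than a digit.
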
41. Questions?
42. Resources ‣ www.mullie.eu/regular-expressions-basics/ ‣ www.mullie.eu/regular-expressions-advanced/ ‣ mullie.eu