Regular expressions 101

Regular expressions are undervalued, and most developers only know the basics. A thorough understanding of how regular expressions work is incredibly helpful when you need to parse structured data.

This presentation will assume you already know what regular expressions are, but will sum up (with an example) some fancy things you probably didn’t know were possible with regular expressions.

If you're interested in a more detailed write-up, I suggest you check out http://www.mullie.eu/regular-expressions-basics/ & http://www.mullie.eu/regular-expressions-advanced/

This presentation is based on the PHP implementation of PCRE, but nearly all programming languages support the same functionality, albeit sometimes with their own twists.

Usage Rights

CC Attribution License

Regular expressions 101 Presentation Transcript

  • 1. REGEX 101 The Swiss Army knife of string manipulation
  • 2. @matthiasmullie Regular expressions 101
  • 3. INTRODUCTION: What are regular expressions?
  • 4. "Regular expressions are special characters that match or capture portions of a field, as well as the rules that govern all characters." (Google)
  • 5. "A regular expression provides a concise and flexible means for "matching" strings of text, such as particular characters, words, or patterns of characters." (Wikipedia)
  • 6. /\{\$([a-z0-9_]*)((\.[a-z0-9_]*)*)(->[a-z0-9_]*((\.[a-z0-9_]*)*))?((\|[a-z_][a-z0-9_]*(:.*?)*)*)\}/i
  • 7. "Regular expressions find patterns in strings." (Me)
  • 8. Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...
    ‣ /ipsum/
    ‣ /[a-z]/i
    ‣ /(est|qui)/
    ‣ /[^\w]/i
  • 9. BASICS: The syntax everyone should know already
  • 10. /Delimiter/
    ‣ Any [^a-zA-Z0-9\\\s] character (anything that is not alphanumeric, a backslash, or whitespace)
    ‣ Opening char == terminating char
    ‣ Except for the bracket pairs [ ], ( ), { } and < >, which close with their counterpart
  • 11. Use / (uniformity, you know)
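
The delimiter rules above can be tried out with a short sketch. The path string below is a made-up example value; the three patterns are meant to be equivalent and only differ in which delimiter they use.

```php
<?php
// Hypothetical input, chosen because it contains the usual "/" delimiter.
$path = '/usr/local/bin';

// With "/" as delimiter, every literal slash in the pattern must be escaped.
$withSlash = preg_match('/^\/usr\/local/', $path);

// Any other non-alphanumeric, non-backslash, non-whitespace character works too.
$withHash = preg_match('#^/usr/local#', $path);

// Bracket-style delimiters open and close with different characters.
$withBraces = preg_match('{^/usr/local}', $path);

var_dump($withSlash, $withHash, $withBraces);
```

Sticking to `/` keeps patterns uniform, as slide 11 suggests; a different delimiter mostly pays off when the pattern itself is full of slashes.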
  • 12. Meta characters
    ‣ \
    ‣ ^
    ‣ $
    ‣ .
    ‣ [ ]
    ‣ |
    ‣ ( )
    ‣ ?
    ‣ *
    ‣ +
    ‣ {n}
    ‣ {n,m}
  • 13. Pattern modifiers //x
    ‣ i
    ‣ m
    ‣ s
    ‣ x
    ‣ e (deprecated since PHP 5.5, removed in PHP 7.0)
    ‣ A
    ‣ D
    ‣ U
    ‣ J
    ‣ ...
  • 14. Character classes [ ]
    Ranges:
    ‣ [0-9]
    ‣ [a-zA-Z]
    ‣ [A-F0-9]
    Inverse ranges:
    ‣ [^0-9]
    ‣ [^a-zA-Z]
    ‣ [^A-F0-9]
  • 15. Character classes [ ]: no sequence of characters!
    ‣ [lorem] matches l, o, r, e or m, not the sequence "lorem"
  • 16. Character classes [ ]: POSIX
    ‣ [:alnum:]
    ‣ [:blank:]
    ‣ [:lower:]
    ‣ ...
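
A minimal sketch of the three character class slides. The input strings are invented for illustration. Note that in PCRE a POSIX class such as `[:alnum:]` is written inside a bracket expression, hence the doubled brackets.

```php
<?php
// [lorem] is a set of single characters, so "lorem" yields five separate matches.
preg_match_all('/[lorem]/', 'lorem', $chars);

// An inverse range: strip everything that is not a digit (hypothetical phone number).
$digits = preg_replace('/[^0-9]/', '', '+32 (0)9 123 45 67');

// A POSIX class lives inside a bracket expression: [[:alnum:]].
preg_match_all('/[[:alnum:]]+/', 'foo-bar_42', $words);

print_r($chars[0]);
echo $digits, "\n";
print_r($words[0]);
```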
  • 17. Greediness: greedy
    Input: <ul><li>list-item1</li><li>list-item2</li></ul>
    Pattern: /<li>.*<\/li>/
    ‣ <li>list-item1</li><li>list-item2</li>
  • 18. Greediness: lazy
    Input: <ul><li>list-item1</li><li>list-item2</li></ul>
    Pattern: /<li>.*?<\/li>/ or /<li>.*<\/li>/U
    ‣ <li>list-item1</li>
    ‣ <li>list-item2</li>
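
The difference between the two greediness slides can be checked with a small sketch that runs the greedy, lazy and U-modifier variants against the same markup from the slides. The variable names are illustrative only.

```php
<?php
// Sample markup taken from the greediness slides.
$html = '<ul><li>list-item1</li><li>list-item2</li></ul>';

// Greedy: .* grabs as much as possible, so both items end up in one match.
preg_match_all('/<li>.*<\/li>/', $html, $greedy);

// Lazy: .*? stops at the first closing tag, giving one match per item.
preg_match_all('/<li>.*?<\/li>/', $html, $lazy);

// The U modifier inverts greediness, so a plain .* behaves lazily here.
preg_match_all('/<li>.*<\/li>/U', $html, $ungreedy);

print_r($greedy[0]);
print_r($lazy[0]);
print_r($ungreedy[0]);
```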
  • 19. Subpatterns
    /([a-z0-9]*)@([a-z0-9.]*\.[a-z0-9]{2,3})/i
    ‣ Whole match: email
    ‣ First subpattern: user
    ‣ Second subpattern: hostname
    Note: this regex only barely satisfies my needs for this particular example; do not use it to actually find email addresses, as it does not fully satisfy RFC 5321 and RFC 5322.
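
A short sketch of how the subpatterns from slide 19 end up in the match array. The address is a made-up example, and the slide's caveat still applies: this pattern is far too loose for real email validation.

```php
<?php
// Pattern from slide 19: group 1 is the user, group 2 the hostname.
$pattern = '/([a-z0-9]*)@([a-z0-9.]*\.[a-z0-9]{2,3})/i';

// Hypothetical input, for illustration only.
preg_match($pattern, 'Contact: someone@example.com', $m);

// $m[0] holds the whole match, $m[1] and $m[2] the two subpatterns.
printf("email: %s, user: %s, hostname: %s\n", $m[0], $m[1], $m[2]);
```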
  • 20. Questions?
  • 21. ADVANCED: The juicy stuff you never knew about, until now
  • 22. Back references
    Problem: /href=['"](.*?)['"]/i
    Matches:
    ‣ href="xxx"
    ‣ href="xxx'
    ‣ href='xxx'
    ‣ href='xxx"
  • 23. Back references
    Solution: /href=(['"])(.*?)\1/i
    \1 references the first subpattern!
    Don't forget to also string-escape in PHP: preg_match('/href=([\'"])(.*?)\\1/i', ...);
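
The problem and solution from the two back reference slides can be compared side by side. This is a sketch using the four attribute variants listed on slide 22; the array and variable names are illustrative.

```php
<?php
// The four variants from slide 22: only the first two have matching quotes.
$samples = ['href="xxx"', "href='xxx'", 'href="xxx\'', 'href=\'xxx"'];

// Loose pattern: accepts any opening and any closing quote.
$loose = '/href=[\'"](.*?)[\'"]/i';

// Strict pattern: \1 must repeat whatever quote the first subpattern captured.
// Inside a single-quoted PHP string the backslash is doubled.
$strict = '/href=([\'"])(.*?)\\1/i';

$looseHits = [];
$strictHits = [];
foreach ($samples as $sample) {
    $looseHits[] = preg_match($loose, $sample);
    $strictHits[] = preg_match($strict, $sample);
}

print_r($looseHits);
print_r($strictHits);
```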
  • 24. Named subpatterns
    Scenario: parsing large CSV
    1,a title,5.00,92,green
    2,another title,3.50,4,blue
    3,one more,33699.99,15,white
    ...
  • 25. Named subpatterns
    /([0-9]+),(.*?),([0-9]+\.[0-9]{2}),([0-9]+),([a-z]+)/i
    Result excerpt:
    [1] => string(1) "1"
    [2] => string(7) "a title"
    [3] => string(4) "5.00"
    [4] => string(2) "92"
    [5] => string(5) "green"
  • 26. Named subpatterns
    /(?P<id>[0-9]+),(?P<title>.*?),(?P<price>[0-9]+\.[0-9]{2}),(?P<stock>[0-9]+),(?P<color>[a-z]+)/i
    Result excerpt:
    ["id"] => string(1) "1"
    [1] => string(1) "1"
    ["title"] => string(7) "a title"
    [2] => string(7) "a title"
    ["price"] => string(4) "5.00"
    [3] => string(4) "5.00"
    ["stock"] => string(2) "92"
    [4] => string(2) "92"
    ["color"] => string(5) "green"
    [5] => string(5) "green"
  • 27. Named subpatterns
    ‣ (?P<name>pattern)
    ‣ (?<name>pattern) & (?'name'pattern) since PHP 5.2.2
  • 28. Named subpatterns + back references
    /href=(?P<quotes>['"])(?P<href>.*?)(?P=quotes)/i
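
A sketch of the CSV scenario with named subpatterns, using the three sample rows from slide 24. `PREG_SET_ORDER` is one way to get one array per row; the link string in the second part is a hypothetical input.

```php
<?php
// The three sample rows from slide 24.
$csv = "1,a title,5.00,92,green\n2,another title,3.50,4,blue\n3,one more,33699.99,15,white";

$pattern = '/(?P<id>[0-9]+),(?P<title>.*?),(?P<price>[0-9]+\.[0-9]{2}),'
         . '(?P<stock>[0-9]+),(?P<color>[a-z]+)/i';

// PREG_SET_ORDER groups the results per match, i.e. per CSV row.
preg_match_all($pattern, $csv, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // Each value is available by name and by numeric index.
    printf("#%s %s: %s in stock at %s\n", $row['id'], $row['title'], $row['stock'], $row['price']);
}

// Named subpattern plus named back reference, as on slide 28.
$linkPattern = '/href=(?P<quotes>[\'"])(?P<href>.*?)(?P=quotes)/i';
preg_match($linkPattern, '<a href="/about">About</a>', $link);
echo $link['href'], "\n";
```

Names make the code that consumes the matches easier to read, and they survive reordering of the subpatterns, which numeric indexes do not.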
  • 29. Lookahead/-behind assertions: "Take a peek, don't eat it"
  • 30. Lookahead/-behind assertions
    Scenario: find all occurrences of "here" in "Where can I find here, not there?"
  • 31. Lookahead/-behind assertions
    Deduction: find every "here" that is neither preceded nor followed by an alphabetic character.
    Solution: /(?<![a-z])here(?![a-z])/i
  • 32. Lookahead/-behind assertions
    ‣ Positive lookahead: (?=expression)
    ‣ Negative lookahead: (?!expression)
    ‣ Positive lookbehind: (?<=expression)
    ‣ Negative lookbehind: (?<!expression)
  • 33. Lookahead/-behind assertions
    "lookbehind assertion is not fixed length..."
    In PHP, a lookbehind cannot contain unbounded repetition, while a lookahead can.
    ‣ (?=.*) works
    ‣ (?<=.*) fails
    ‣ (?=abc) works
    ‣ (?<=abc) works
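
The "here" scenario can be sketched as follows, comparing a naive pattern with the assertion-based solution from slide 31. `PREG_OFFSET_CAPTURE` is used only to show where each match starts.

```php
<?php
// Sentence from slide 30.
$text = 'Where can I find here, not there?';

// Naive pattern: also hits the "here" inside "Where" and "there".
preg_match_all('/here/i', $text, $naive, PREG_OFFSET_CAPTURE);

// Lookbehind and lookahead check the neighbours without consuming them.
preg_match_all('/(?<![a-z])here(?![a-z])/i', $text, $exact, PREG_OFFSET_CAPTURE);

echo count($naive[0]), " naive matches\n";
print_r($exact[0]);
```

Because assertions do not consume characters, the match itself stays exactly "here"; the surrounding space and comma are not part of it.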
  • 34. Conditional subpatterns: if-then(-else) in regular expressions. YES RLY!
  • 35. Conditional subpatterns
    Scenario: match all (x|ht)ml tags
    Caution!
    ‣ <element></element>
    ‣ <element />
  • 36. Conditional subpatterns
    Solution: /<(?P<tag>[a-z]+).*?(?P<self>\/)?>(?(self)|.*?<\/(?P=tag)>)/i
    ‣ Named patterns: tag and self
    ‣ if: (?(self)
    ‣ then: if self-closing, do nothing
    ‣ else: find the matching end tag
  • 37. Conditional subpatterns
    With subpattern (named or by id):
    ‣ (?(pattern)then)
    ‣ (?(pattern)then|else)
    With lookahead/-behind:
    ‣ (?(?=assertion)then)
    ‣ (?(?=assertion)then|else)
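
A sketch of the conditional pattern from slide 36 applied to a small, made-up fragment with both regular and self-closing tags. It is only meant to show the if-then-else mechanics; nested tags of the same name, for instance, are not handled.

```php
<?php
// Pattern from slide 36: "self" is only set when the tag ends in "/>".
$pattern = '/<(?P<tag>[a-z]+).*?(?P<self>\/)?>(?(self)|.*?<\/(?P=tag)>)/i';

// Hypothetical markup, for illustration only.
$html = '<p>text</p> <br /> <em>more</em>';

preg_match_all($pattern, $html, $m);

// If "self" matched, the then-branch is empty and the match ends at "/>".
// Otherwise the else-branch continues up to the matching closing tag.
print_r($m[0]);
print_r($m['tag']);
```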
  • 38. Comments
    /
      # match currency symbols for USD, EUR, GBP & YEN
      [$€£¥]
      # must be followed by a number to indicate a price
      (?=[0-9])
      # pattern modifiers:
      # u for UTF-8 interpretation (currency symbols),
      # x to ignore whitespace (for comments)
    /ux
  • 39. Comments
    ‣ # Perl-style
    ‣ /x modifier
    ‣ Ignores unescaped whitespace
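
The commented pattern from slide 38 can be used as-is in a multi-line PHP string. This sketch assumes the source file is saved as UTF-8; the price string is invented for illustration.

```php
<?php
// With the x modifier, unescaped whitespace and "# ..." comments are ignored.
// With the u modifier, pattern and subject are treated as UTF-8.
$pattern = '/
    # match currency symbols for USD, EUR, GBP & YEN
    [$€£¥]
    # must be followed by a number to indicate a price
    (?=[0-9])
/ux';

// Hypothetical input: the pound sign is not followed by a digit.
preg_match_all($pattern, 'Prices: $5, €12 and £x', $m);

print_r($m[0]);
```

One thing to watch for: the delimiter character must not appear unescaped inside a comment, since it would end the pattern early.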
  • 41. Questions?
  • 42. Resources
    ‣ www.mullie.eu/regular-expressions-basics/
    ‣ www.mullie.eu/regular-expressions-advanced/