Advanced Regular Expressions Redux

Brief RE refresher with some more advanced topics - non-greedy quantifiers, character properties, nested group ordering, recursive expressions

Published in: Technology

  • escaping???
  • examples!
    possessive (?+, *+, ++)
  • unicode compat table!
  • notice the space at the end, capital reverses
  • how about /g??

  • Transcript

    • 1. Regular Expressions Redux
    • 2. Scope
      • medium to advanced
      • 30 minutes
      • performance / backtracking irrelevant
      • no compatibility charts (yet)
    • 3. TOC
      • basic matching, quantifiers
      • character classes, types, properties, anchors
      • groups, options, replace string
      • look-ahead/behind
      • subexpressions
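    • 7. RE overview
      • Perl: match "foo" with /foo/ (on $_); replace with "bar" via s/foo/bar/ (on $_)
      • JavaScript: match "foo" with /foo/; replace with "bar" via "foolish".replace(/foo/, "bar")
      • Vi: match "foo" with /foo/; replace with "bar" via :s/foo/bar/
      • TextMate: match "foo" with ⌘-F, Find: foo; replace with "bar" via ⌘-F, Find: foo, Replace: bar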
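The JavaScript row of the overview can be tried directly. This is a minimal sketch; the second sample string ("foo foo") is an assumption added to show that, without the g flag, only the first occurrence is replaced.

```javascript
// Match "foo" and replace it with "bar" in JavaScript.
const input = "foolish";

const found = /foo/.test(input);                 // does the pattern occur at all?
const replaced = input.replace(/foo/, "bar");    // replace the first occurrence
const once = "foo foo".replace(/foo/, "bar");    // no g flag: first occurrence only
const every = "foo foo".replace(/foo/g, "bar");  // g flag: every occurrence

console.log(found, replaced, once, every);
```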
    • 14. Quantifiers
      • classic greedy: ?, *, +
      • specific: {1,5}, {,5} (the omitted lower bound is not accepted by every flavor)
      • ? == {0,1}
      • * == {0,}
      • + == {1,}
      • non-greedy: ??, *?, +?, {5,7}?
    • 18. Example
      • Text: This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text.
      • /reveal(.*)plain/
      • /reveal(.*?)plain/
      • /t.{2,3}t/
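To see what the three patterns on this slide actually capture, here is a small sketch in JavaScript, run against the slide's sample text. The variable names are illustrative only.

```javascript
const text = "This reveals that plain text is in fact the technical user's way " +
  "to regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Greedy: (.*) runs as far as it can, so the match ends at the LAST "plain".
const greedy = text.match(/reveal(.*)plain/)[1];

// Non-greedy: (.*?) stops as early as it can, at the FIRST "plain".
const lazy = text.match(/reveal(.*?)plain/)[1];

// Bounded: a "t", then two or three characters of any kind, then another "t".
const bounded = text.match(/t.{2,3}t/)[0];

console.log(JSON.stringify(greedy));
console.log(JSON.stringify(lazy));
console.log(JSON.stringify(bounded));
```

The greedy capture spans almost the whole text, while the non-greedy capture is only the few characters between "reveal" and the first "plain".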
    • 26. Character Classes / Properties
      • [0-9a-z] (classes)
      • \+420[0-9]{9} = simplified Czech phone number
      • don't: [A-z0-]
      • [a-z&&[^j-n]] == [a-io-z]
      • \p{Upper} (properties)
        • works great on Unicode text (Latin, Katakana)
      • [:alnum:], [:^space:] (POSIX bracket)
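A few of these points can be checked in JavaScript, with some flavor differences that are assumptions of this sketch rather than part of the slides: the classic JavaScript class syntax has no && intersection, so the equivalent [a-io-z] is written out, and Unicode properties need the u flag and are spelled \p{Lu} and \p{Script=Katakana} instead of \p{Upper}. The sample strings are made up for illustration.

```javascript
// Simplified Czech phone number; the leading + is escaped because + is a quantifier.
const phone = /^\+420[0-9]{9}$/;
const phoneOk = phone.test("+420123456789");

// Why [A-z] is a "don't": the range also covers [ \ ] ^ _ ` between Z and a.
const underscoreMatches = /[A-z]/.test("_");

// Equivalent of [a-z&&[^j-n]] written without class intersection.
const notJtoN = /^[a-io-z]+$/;

// Unicode properties (u flag required in JavaScript).
const uppercase = "Praha, \u010Cesko".match(/\p{Lu}/gu); // uppercase letters
const katakana = /\p{Script=Katakana}/u.test("\u30AB");  // KATAKANA LETTER KA

console.log(phoneOk, underscoreMatches, notJtoN.test("hop"), uppercase, katakana);
```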
    • 32. Character Types
      • . == anything (apart from newline)
      • \s == space == [\t\n\v\f\r ]
        • more in Unicode
      • \w == word char == approx. [0-9a-zA-Z_]
        • is complicated in Unicode
      • \d == digit == [0-9]
      • \h == hexadecimal digit == [0-9a-fA-F] (Oniguruma/TextMate; in Perl, \h means horizontal whitespace)
      • \S \W \D == [^\s] [^\w] [^\d]
    • 34. Example
      • Text: This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text.
      • /\b[\w&&[^aA]]+\b/
      • /\W{2,}\w+\b/
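The two patterns on this slide can be approximated in JavaScript. One assumption to note: the classic JavaScript class syntax has no && intersection, so [\w&&[^aA]] is rewritten here as [^\WaA] (anything that is not a non-word character and not a or A), which selects the same characters.

```javascript
const text = "This reveals that plain text is in fact the technical user's way " +
  "to regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Whole words that contain no "a" or "A".
const wordsWithoutA = text.match(/\b[^\WaA]+\b/g);

// Two or more non-word characters followed by a word.
const afterGap = text.match(/\W{2,}\w+\b/)[0];

console.log(wordsWithoutA);
console.log(JSON.stringify(afterGap));
```

In the sample text the only run of two non-word characters is the full stop plus space after "bytes", so the second pattern picks up the start of the second sentence.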
    • 39. Anchors
      • ^ - beginning (line, string)
      • $ - end (line, string)
      • \b - word boundary ~ \w\W (almost)
      • \b.{5}\b != \W\w{5}\W
      • zero width!
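The "zero width" point is easiest to see by comparing \b with patterns that consume the neighbouring characters. A minimal sketch in JavaScript follows; the sample strings are made up, and \w{5} is used on both sides (instead of the slide's .{5}) so that only the anchors differ.

```javascript
const padded = " hello world ";

// \b consumes nothing, so both five-letter words are found.
const withBoundary = padded.match(/\b\w{5}\b/g);

// \W consumes the space after "hello", leaving no \W in front of "world".
const withConsumers = padded.match(/\W\w{5}\W/g);

// At the very start or end of a string there is no character for \W to match.
const atEdges = "hello".match(/\W\w{5}\W/);

console.log(withBoundary, withConsumers, atEdges);
```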
    • 43. Options
      • /foo/imsx
      • i - case insensitive
      • m - multiline (^ and $ match at the start and end of each line, not only of the string/file)
      • s - single line (. matches newlines)
      • x - extended!
      • g - global
      • can be written inline
        • (?imsx-imsx)
        • (?imsx-imsx:...)
      • example:
          (?x-i)   #this is cool
          ( foo    #my important value
          |        #don't forget the alternative
          bar
          )        # result equals to (foo|bar)
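The i, m, s and g options can be observed in JavaScript as follows. This sketch leaves out the x option and the inline (?imsx-imsx) forms, on the assumption that the target engine does not provide them; the sample strings are illustrative.

```javascript
const lines = "one\ntwo";

// i: case insensitive
const ignoreCase = /FOO/i.test("foo");

// m: ^ and $ also match at line breaks inside the string
const withoutM = /^two$/.test(lines);
const withM = /^two$/m.test(lines);

// s: the dot also matches a newline
const dotDefault = /one.two/.test(lines);
const dotAll = /one.two/s.test(lines);

// g: find every match instead of stopping at the first
const globalCount = "foo foo foo".match(/foo/g).length;

console.log(ignoreCase, withoutM, withM, dotDefault, dotAll, globalCount);
```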
    • 48. Groups/Replacing
      • (...) - matched group
      • $1 - $9
        • alternatively \1 - \9 (not recommended)
      • nested groups ordered by left bracket
      • (?:...) - non-captured group
        • useful for (?:foo)+ or (?:foo|bar)
    • 51. Example
      • "foobarman".replace(/(?:f)((o)+)(bar)|(baz|man)/g, '$1, $2, $3, $4, $5')
      • first match: foobar
        • 1 -- oo
        • 2 -- o
        • 3 -- bar
        • 4 -- (empty)
      • second match: man
        • 1 -- (empty)
        • 2 -- (empty)
        • 3 -- (empty)
        • 4 -- man
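The group numbering on this slide can be inspected with matchAll, which exposes each capture per match. This is a sketch only; the bracketed replacement string is an assumption chosen to make empty groups visible, not the one from the slide.

```javascript
const re = /(?:f)((o)+)(bar)|(baz|man)/g;

// One entry per match; index 0 is the whole match, 1..4 are the capture groups.
const matches = [..."foobarman".matchAll(re)];
for (const m of matches) {
  console.log(m[0], m.slice(1));
}

// Groups that did not take part in a match are replaced by an empty string.
const replaced = "foobarman".replace(re, "[$1|$2|$3|$4]");
console.log(replaced);
```

Note that (?:f) takes no number, the outer ((o)+) is group 1 because its left bracket comes first, and the repeated inner (o) keeps only its last iteration.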
    • 53. Look-ahead/behind
      • defines custom zero-width anchors
      • ahead: positive (?=...), negative (?!...)
      • behind: positive (?<=...), negative (?<!...)
    • 54. Example
      • zdenek@gooddata.com matched with /.*?@gooddata/
      • zdenek@gooddata.com matched with /.*?(?=@gooddata)/
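The difference between the two patterns on this slide is what ends up inside the match. A short JavaScript sketch follows; the two look-behind lines are extra illustrations, not taken from the slides, and assume an engine with look-behind support.

```javascript
const email = "zdenek@gooddata.com";

// Without look-ahead, "@gooddata" is part of the match.
const consumed = email.match(/.*?@gooddata/)[0];

// With look-ahead, "@gooddata" must follow but is not consumed.
const lookedAhead = email.match(/.*?(?=@gooddata)/)[0];

// Positive look-behind: word characters that come right after "@".
const afterAt = email.match(/(?<=@)\w+/)[0];

// Negative look-behind: "gooddata" NOT preceded by "@" does not occur here.
const bareDomain = /(?<!@)gooddata/.test(email);

console.log(consumed, lookedAhead, afterAt, bareDomain);
```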
    • 55. Recursive RE
      • very important!
      • quote & bracket matching
      • technically not part of regular grammar
      • two styles
        • \g<name> or \g<n> - TextMate
        • (?R) - Perl
    • 57. Example
        (?x:
          \(                 # match the initial opening parenthesis
                             # Now make a named group 'balanced' which
                             # matches a balanced substring.
          (?<balanced>
            [^()]            # A balanced substring is either something
                             # that is not a parenthesis:
            |                # …or a parenthesised string:
            \(               # A parenthesised string begins with an opening parenthesis
            \g<balanced>*    # …followed by a sequence of balanced substrings
            \)               # …and ends with a closing parenthesis
          )*                 # Look for a sequence of balanced substrings
          \)                 # Finally, the outer closing parenthesis
        )
      • or: \(([^()]|(?R))*\)
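JavaScript regular expressions offer neither \g<name> nor (?R), so the recursive pattern above cannot be run there as written. As a stand-in, the sketch below finds the same kind of balanced substring with a plain depth counter instead of a recursive expression. The function name firstBalanced is hypothetical.

```javascript
// Hypothetical helper: returns the leftmost balanced "( ... )" substring, or null.
// This uses a depth counter, not a recursive regular expression.
function firstBalanced(text) {
  // Try each opening parenthesis in turn, as a regex engine would retry positions.
  for (let start = text.indexOf("("); start !== -1; start = text.indexOf("(", start + 1)) {
    let depth = 0;
    for (let i = start; i < text.length; i++) {
      if (text[i] === "(") depth++;
      else if (text[i] === ")") depth--;
      // Depth back at zero: the parenthesis opened at `start` has just been closed.
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // no opening parenthesis is ever balanced
}

console.log(firstBalanced("f(a(b)c)d)"));
console.log(firstBalanced("((a)"));
console.log(firstBalanced("no parentheses"));
```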
