Advanced Regular Expressions Redux

A brief regular-expression refresher plus some more advanced topics: non-greedy quantifiers, character properties, nested group ordering and recursive expressions.


Published in: Technology

  • escaping???
  • examples!
    possessive (?+, *+, ++)








  • unicode compat table!
  • notice the space at the end, capital reverses












  • how about /g??




















  • Transcript

    • 1. Regular Expressions Redux
    • 2. Scope
        • medium to advanced
        • 30 minutes
        • performance / backtracking out of scope
        • no compatibility charts (yet)
    • 3. TOC
        • basic matching, quantifiers
        • character classes, types, properties, anchors
        • groups, options, replace string
        • look-ahead/behind
        • subexpressions
    • 5. RE overview: match "foo", replace with "bar"
        • Perl: /foo/ (on $_), s/foo/bar/ (on $_)
        • JavaScript: /foo/, "foolish".replace(/foo/, "bar")
        • Vi: /foo/, :s/foo/bar/
        • TextMate: ⌘-F, Find: foo; ⌘-F, Find: foo, Replace: bar
    • 14. Quantifiers
        • classic greedy: ?, *, +
        • specific: {1,5}, {,5} (not every engine accepts {,5}; {0,5} is the portable spelling)
        • ? == {0,1}
        • * == {0,}
        • + == {1,}
        • non-greedy: ??, *?, +?, {5,7}?
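    • 15. Example
        • text: This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text.
        • /reveal(.*)plain/ (greedy: the group runs to the last "plain")
        • /reveal(.*?)plain/ (non-greedy: the group stops at the first "plain")
        • /t.{2,3}t/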
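    • 26. Character Classes / Properties
        • [0-9a-z] (classes)
        • \+420[0-9]{9} = simplified Czech phone number
        • don't: [A-z0-] (A-z also spans the characters [ \ ] ^ _ ` between Z and a)
        • [a-z&&[^j-n]] == [a-io-z] (class intersection, e.g. Java, Oniguruma)
        • \p{Upper} (properties)
        • works great on Unicode text (Latin, Katakana)
        • [:alnum:], [:^space:] (POSIX bracket, used inside a class: [[:alnum:]])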
    • 32. Character Types
        • . == anything (apart from newline)
        • \s == whitespace == [\t\n\v\f\r ]
            • matches more characters in Unicode
        • \w == word char == roughly [0-9a-zA-Z_]
            • is complicated in Unicode
        • \d == digit == [0-9]
        • \h == hexadecimal digit == [0-9a-fA-F] (Oniguruma/TextMate; in Perl and Java \h means horizontal whitespace)
        • \S \W \D == [^\s] [^\w] [^\d]
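    • 33. Example
        • text: This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text.
        • /\b[\w&&[^aA]]+\b/ (whole words containing no "a" or "A")
        • /\W{2,}\w+\b/ (a word preceded by at least two non-word characters)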
    • 39. Anchors
        • ^ - beginning (line, string)
        • $ - end (line, string)
        • \b - word boundary ~ \w\W (almost)
            • \b.{5}\b != \W\w{5}\W
        • zero width!
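    • 43. Options
        • /foo/imsx
        • i - case insensitive
        • m - multiline (^ and $ also match at the start and end of each line, not only at the start and end of the string)
        • s - single line (. matches newlines)
        • x - extended! (whitespace and # comments inside the pattern are ignored)
        • g - global (find all matches, not just the first)
        • can be written inline
            • (?imsx-imsx)
            • (?imsx-imsx:...)
        • example:
              (?x-i)    # this is cool
              (
                foo     # my important value
                |       # don't forget the alternative
                bar
              )         # result equals (foo|bar)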
    • 48. Groups/Replacing
        • (...) - capturing group
        • $1 - $9
            • alternatively \1 - \9 (not recommended)
        • nested groups are numbered by the position of their opening (left) parenthesis
        • (?:...) - non-capturing group
            • useful for (?:foo)+ or (?:foo|bar)
    • 51. Example
        • "foobarman".replace(/(?:f)((o)+)(bar)|(baz|man)/g, '$1, $2, $3, $4, $5')
        • match "foobar": 1 -- oo, 2 -- o, 3 -- bar, 4 -- (empty)
        • match "man": 1 -- (empty), 2 -- (empty), 3 -- (empty), 4 -- man
    • 53. Look-ahead/behind
        • defines custom zero-width anchors
        • positive look-ahead (?=...), negative look-ahead (?!...)
        • positive look-behind (?<=...), negative look-behind (?<!...)
    • 54. Example
        • zdenek@gooddata.com with /.*?@gooddata/ matches "zdenek@gooddata"
        • zdenek@gooddata.com with /.*?(?=@gooddata)/ matches "zdenek" (the look-ahead consumes nothing)
    • 55. Recursive RE
        • very important!
        • quote & bracket matching
        • technically beyond regular grammars (balanced brackets are not a regular language)
        • two styles
            • \g<name> or \g<n> - TextMate (Oniguruma)
            • (?R) - Perl
    • 57. Example
          (?x:
            \(                 # match the initial opening parenthesis
            # Now make a named group 'balanced' which
            # matches a balanced substring.
            (?<balanced>
                [^()]          # A balanced substring is either something
                               # that is not a parenthesis:
              |                # …or a parenthesised string:
                \(             # A parenthesised string begins with an opening parenthesis
                \g<balanced>*  # …followed by a sequence of balanced substrings
                \)             # …and ends with a closing parenthesis
            )*                 # Look for a sequence of balanced substrings
            \)                 # Finally, the outer closing parenthesis
          )
        • or: \(([^()]|(?R))*\)
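The greedy versus non-greedy patterns from the Quantifiers slides and their "Example" slide can be tried in JavaScript, one of the engines on the RE overview slide. This is a minimal sketch assuming Node.js; the variable names are illustrative only. It shows that `(.*)` backtracks to the last "plain" while `(.*?)` stops at the first one.

```javascript
// Sample sentence from the "Example" slide, split only to keep the lines short.
const text =
  "This reveals that plain text is in fact the technical user's way to " +
  "regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Greedy (.*): takes as much as possible, then backtracks to the LAST "plain".
const greedy = text.match(/reveal(.*)plain/)[1];

// Non-greedy (.*?): takes as little as possible, so it stops at the FIRST "plain".
const lazy = text.match(/reveal(.*?)plain/)[1];

// Bounded {2,3}: a "t", two or three arbitrary characters, then another "t".
// {2,3} is itself greedy, so three characters are tried before two.
// With the g flag, match() returns every non-overlapping match, left to right.
const bounded = text.match(/t.{2,3}t/g);

console.log(JSON.stringify(lazy)); // "s that "
console.log(JSON.stringify(greedy));
console.log(bounded.slice(0, 3)); // [ 'that', 'text', 'the t' ]
```

Switching to `.*?` is usually the right call when the closing delimiter ("plain" here) can occur more than once in the input.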
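The Character Classes / Properties slide can be sketched in JavaScript too, with two assumptions worth stating: classic JavaScript character classes (without the newer `v` flag) have no `&&`, so this sketch swaps in a negative look-ahead to get the same set as `[a-z&&[^j-n]]`; and JavaScript needs the `u` flag for `\p{...}`, where this sketch uses the Unicode property name `Uppercase` rather than the slide's `\p{Upper}`. All variable names are illustrative.

```javascript
// Simplified Czech phone number: "+" must be escaped, then exactly nine digits.
const czechPhone = /^\+420[0-9]{9}$/;

// The "don't" from the slide: A-z also covers [ \ ] ^ _ ` (ASCII 91-96).
const sloppyLetter = /^[A-z]$/;

// [a-z&&[^j-n]] emulated with a negative look-ahead, because classic
// JavaScript character classes (without the v flag) have no && intersection.
const aToZWithoutJtoN = /^(?![j-n])[a-z]$/;

// Unicode properties require the u flag. "Uppercase" is the Unicode
// property name; the slide writes it as \p{Upper}.
const upperWord = /^\p{Uppercase}+$/u;
const katakanaWord = /^\p{Script=Katakana}+$/u;

const czechCaps = "\u017DLU\u0164OU\u010CK\u00DD"; // Czech capitals, written as escapes
const katakana = "\u30AB\u30BF\u30AB\u30CA";       // the word "katakana" in Katakana

console.log(czechPhone.test("+420123456789")); // true
console.log(czechPhone.test("420123456789"));  // false: no leading "+"
console.log(sloppyLetter.test("_"));           // true: probably not intended
console.log(aToZWithoutJtoN.test("k"));        // false
console.log(aToZWithoutJtoN.test("o"));        // true
console.log(upperWord.test(czechCaps));        // true
console.log(upperWord.test(katakana));         // false: Katakana letters have no case
console.log(katakanaWord.test(katakana));      // true
```

The Katakana lines illustrate why "works on Unicode text" depends on the property: case-based properties say nothing about uncased scripts, while a script property does.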
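The Character Types and Anchors slides stress that `\b` is zero-width while `\W` consumes a character. A minimal JavaScript sketch of that difference, plus the "words without a/A" pattern from the second Example slide rewritten without `&&` (again using a negative look-ahead, an assumption of this sketch rather than the slide's syntax). The input strings are made up for illustration.

```javascript
const line = "one three seven";

// \b is zero-width: the match contains only the five characters between boundaries.
const viaBoundary = line.match(/\b.{5}\b/)[0];   // "three"
// \W consumes the neighbouring characters, so the spaces end up in the match.
const viaNonWord = line.match(/\W\w{5}\W/)[0];   // " three "

// \b[\w&&[^aA]]+\b rewritten for JavaScript: one or more word characters
// that are not "a"/"A", between word boundaries, i.e. whole words without an "a".
const noA = /\b(?:(?![aA])\w)+\b/g;
const wordsWithoutA = "Alpha beta sky gamma epsilon".match(noA); // [ 'sky', 'epsilon' ]

// \s versus \S: split on whitespace runs, or collect the non-whitespace runs.
const parts = "a\tb\n c".split(/\s+/);   // [ 'a', 'b', 'c' ]
const tokens = "a\tb\n c".match(/\S+/g); // [ 'a', 'b', 'c' ]

console.log(JSON.stringify(viaBoundary), JSON.stringify(viaNonWord));
console.log(wordsWithoutA, parts, tokens);
```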
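For the Options slide, JavaScript offers `i`, `m`, `s` and `g` as flags but has no `x` (extended) flag. The sketch below checks the `m` and `s` behaviour described on the slide and then emulates `x` with a hypothetical `extended()` helper, which is not a standard API: it naively drops `# ...` comments and all whitespace, so it only suits simple patterns like the slide's `(foo|bar)` example.

```javascript
const twoLines = "first line\nsecond line";

// Without m, ^ matches only at the start of the string.
const starts = twoLines.match(/^\w+/g);    // [ 'first' ]
// With m, ^ also matches right after each newline.
const startsM = twoLines.match(/^\w+/gm);  // [ 'first', 'second' ]

// Without s, "." does not match "\n"; with s ("dotAll") it does.
const noDotAll = /first.*second/.test(twoLines);  // false
const dotAll = /first.*second/s.test(twoLines);   // true

// i ignores case; g returns every match instead of only the first.
const caseless = twoLines.match(/LINE/gi);       // [ 'line', 'line' ]

// Hypothetical helper: naive emulation of /x. Removes "# comment" up to the
// end of each line, then all whitespace. It does NOT handle escaped "\#" or
// "\ ", nor "#" or spaces inside [...] classes.
function extended(source, flags = "") {
  const stripped = source
    .split("\n")
    .map((ln) => ln.replace(/#.*$/, ""))
    .join("")
    .replace(/\s+/g, "");
  return new RegExp(stripped, flags);
}

const fooOrBar = extended(`
  (
    foo   # my important value
    |     # don't forget the alternative
    bar
  )       # equivalent to (foo|bar)
`);

console.log(starts, startsM, noDotAll, dotAll, caseless);
console.log(fooOrBar.source);        // (foo|bar)
console.log(fooOrBar.test("rebar")); // true
```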
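The Groups/Replacing slide and its "foobarman" example can be verified in JavaScript. This sketch uses `matchAll` to list each match's groups, making the left-parenthesis numbering visible, then runs a replacement restricted to `$1`-`$4`, the groups the regex actually defines. Variable names are illustrative.

```javascript
const re = /(?:f)((o)+)(bar)|(baz|man)/g;

// Groups are numbered by their opening parenthesis, left to right:
// 1 = ((o)+), 2 = (o) nested inside it, 3 = (bar), 4 = (baz|man).
// (?:f) is non-capturing and gets no number.
const rows = [];
for (const m of "foobarman".matchAll(re)) {
  // A group that did not take part in this match is undefined.
  // A repeated group such as (o)+ keeps only its last iteration.
  rows.push([m[0], m[1], m[2], m[3], m[4]]);
}
console.log(rows);
// first row:  foobar, oo, o, bar, undefined
// second row: man, undefined, undefined, undefined, man

// In a replacement string, a group that did not participate becomes "".
const replaced = "foobarman".replace(re, "$1, $2, $3, $4");
console.log(replaced); // oo, o, bar, , , , man
```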
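All four look-around forms from the Look-ahead/behind slide are available in current JavaScript engines. This sketch, assuming Node.js 10 or later for look-behind support, reuses the e-mail address from the slide's example and adds two made-up strings for the negative forms.

```javascript
const email = "zdenek@gooddata.com";

// Without look-ahead, "@gooddata" is consumed and ends up in the match.
const consumed = email.match(/.*?@gooddata/)[0];      // "zdenek@gooddata"
// Positive look-ahead (?=...) only checks, so the match stops before "@".
const userOnly = email.match(/.*?(?=@gooddata)/)[0];  // "zdenek"
// Positive look-behind (?<=...): word characters right after "@".
const domain = email.match(/(?<=@)\w+/)[0];           // "gooddata"

// Negative look-ahead (?!...): "good" not followed by "data".
const notGooddata = "gooddata goodness".match(/good(?!data)\w*/)[0]; // "goodness"
// Negative look-behind (?<!...): "data" not preceded by "good".
const metaDataAt = "gooddata metadata".search(/(?<!good)data/);     // 13

console.log(consumed, userOnly, domain, notGooddata, metaDataAt);
```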
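JavaScript's RegExp supports neither `(?R)` nor `\g<name>`, so the recursive examples cannot run there directly. As a stand-in, this sketch swaps in a hand-written recursive function that follows the same grammar as `\(([^()]|(?R))*\)`: an opening parenthesis, any mix of non-parenthesis characters and nested balanced groups, then a closing parenthesis. `parseBalanced` and `findBalanced` are hypothetical names, not library functions.

```javascript
// Mirrors \(([^()]|(?R))*\) by hand. Returns the index just past the
// balanced group that starts at `pos`, or -1 if there is none.
function parseBalanced(text, pos) {
  if (text[pos] !== "(") return -1;   // \(
  pos++;
  while (pos < text.length) {         // ( ... )*
    const ch = text[pos];
    if (ch === ")") return pos + 1;   // \)  closes this level
    if (ch === "(") {                 // (?R): recurse into a nested group
      const end = parseBalanced(text, pos);
      if (end === -1) return -1;
      pos = end;
    } else {
      pos++;                          // [^()]
    }
  }
  return -1;                          // input ended first: unbalanced
}

// Like an unanchored regex search: try each "(" from the left and
// return the first balanced substring found, or null.
function findBalanced(text) {
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "(") {
      const end = parseBalanced(text, i);
      if (end !== -1) return text.slice(i, end);
    }
  }
  return null;
}

console.log(findBalanced("f(a, (b + c) * d) + e")); // (a, (b + c) * d)
console.log(findBalanced("((x)"));                  // (x)
console.log(findBalanced("no parens"));             // null
```

Because `[^()]` and a nested group can never start with the same character, the loop never needs to backtrack, so this deterministic version finds the same first match the recursive pattern would.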
