Regex Basics

Regex Basics - Presentation Transcript

  1. Regular Expression Basics
    • PHPNW 2008
    • Ciarán Walsh
  2. What are regular expressions?
    • Regular expressions allow matching and manipulation of textual data.
    • Abbreviated as regex or regexp, or alternatively just “patterns”.
  3. Regular Expression Basics: Literals
    • bus    Matches a ‘b’, followed by a ‘u’, followed by an ‘s’
  4. Regular Expression Basics: Anchors
    • ^    Matches at the beginning of a line
    • $    Matches at the end of a line
  5. Regular Expression Basics: Character Classes
    • [abc]    Matches one of ‘a’, ‘b’ or ‘c’
    • [a-c]    Same as above (character range)
    • [^abc]    Matches one character that is not listed
    • .    Matches any single character
  6. Regular Expression Basics: Alternation
    • a|b    Matches one of ‘a’ or ‘b’
    • dog|cat    Matches one of “dog” or “cat”
  7. Regular Expression Basics: Quantifiers (repetition)
    • {x,y}    Matches a minimum of x and a maximum of y occurrences; either can be omitted
    • *    Matches zero or more occurrences (any amount). Same as {0,}
    • +    Matches one or more occurrences. Same as {1,}
    • ?    Matches zero or one occurrence. Same as {0,1}
  8. Regular Expression Basics: Grouping
    • (…)    Groups the contents of the parentheses. Affects alternation and quantifiers. Allows parts of the match to be captured
    • for|backward    “for” or “backward”
    • (for|back)ward    “forward” or “backward”
  9. Regular Expression Basics: Delimiters
    • /pattern/modifiers
    • /i    Makes match case-insensitive
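The building blocks from the slides above can be tried out one at a time with preg_match. This is only a sketch: the helper name `matches` and the sample strings are invented for illustration and are not part of the talk.

```php
<?php
// Hypothetical helper: true when $pattern matches somewhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$checks = [
    'literal'     => matches('/bus/', 'omnibus'),         // "bus" occurs inside the text
    'anchors'     => matches('/^bus$/', 'omnibus'),       // the whole string is not just "bus"
    'class'       => matches('/b[aeiou]s/', 'bus'),       // one vowel between "b" and "s"
    'negation'    => matches('/b[^aeiou]s/', 'bus'),      // "u" is excluded by [^...]
    'alternation' => matches('/dog|cat/', 'my cat'),      // either word will do
    'quantifier'  => matches('/^a{2,3}$/', 'aaaa'),       // four a's exceed the maximum of 3
    'grouping'    => matches('/^(for|back)ward$/', 'forward'),
    'insensitive' => matches('/bus/i', 'BUS'),            // /i ignores case
];

var_dump($checks);
```

The 'anchors', 'negation' and 'quantifier' entries come out false; the rest are true.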
  10. Performing a Match
    • Returns number of matches (0 or 1)
    • $matches will contain captured groups
    • preg_match('/Te(.)f?/i', 'text', $matches);
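As a rough illustration of the two points on this slide, the call can be run as-is and its return value and `$matches` inspected. The second call, with the subject 'foo', is an added example and not from the slides.

```php
<?php
// The pattern from the slide: "Te", any one character (captured), optional "f".
$count = preg_match('/Te(.)f?/i', 'text', $matches);

var_dump($count);   // int(1): preg_match stops after the first match
print_r($matches);  // [0] is the whole match "tex", [1] is the captured "x"

// When nothing matches, the return value is 0 and $matches is left empty.
$none = preg_match('/Te(.)f?/i', 'foo', $noMatches);
var_dump($none);    // int(0)
```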
  11. Performing a Replacement
    • Returns string after replacement
    • Can use backreferences (\1 to \9) in the replacement
    • preg_replace('/some(text)/', '\1', $text)
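A small sketch of a replacement with backreferences, assuming the replacement string on the slide was '\1' (the backslash appears to have been lost in this transcript). The sample text and the word-swapping call are made up for illustration.

```php
<?php
$text = 'here is sometext to replace';

// \1 refers to the first captured group, so "sometext" is replaced by "text".
$result = preg_replace('/some(text)/', '\1', $text);
echo $result, "\n"; // here is text to replace

// Two groups, referenced in reverse order; $1 is an equivalent spelling of \1.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', 'hello world');
echo $swapped, "\n"; // world hello
```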
    • [Slide shows the very long regular expression for validating RFC 822 email addresses, filling the whole screen]
    Don’t Use Regular Expressions! Don’t Abuse Regular Expressions!
    “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” (Jamie Zawinski)
  12. Testing for a Substring
    • if (preg_match('/foo/', $var))
    • if (strpos($var, 'foo') !== false)
    • if (preg_match('/foo/i', $var))
    • if (stripos($var, 'foo') !== false)
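The `!== false` in the string-function versions matters because strpos returns the offset of the match, and that offset can be 0. A minimal sketch, with sample values chosen for illustration:

```php
<?php
$var = 'foobar';

// "foo" is found at offset 0, so a loose truthiness check reports "not found".
$loose  = (bool) strpos($var, 'foo');      // false, because 0 casts to false
$strict = strpos($var, 'foo') !== false;   // true

// stripos is the case-insensitive counterpart, like adding the /i modifier.
$insensitive = stripos('FOOBAR', 'foo') !== false;    // true
$regex       = preg_match('/foo/i', 'FOOBAR') === 1;  // true

var_dump($loose, $strict, $insensitive, $regex);
```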
  13. Validating an Integer: Regular Expression
    • Intention is not immediately obvious
    • Not efficient
    • if (preg_match('/^\d+$/', $value)) {
    •     // $value consists only of digits (a non-negative integer)
    • }
  14. Validating an Integer: ctype (Character Type)
    • Native C library (fast)
    • Makes the intention obvious
    • if (ctype_digit($value)) {
    •     // $value consists only of digits (a non-negative integer)
    • }
  15. Validating an Integer: Casting
    • Intention is fairly clear
    • Casting is safe practice
    • Any invalid values will result in zero
    • $casted_value = intval($value);
    • if ($casted_value > 0) {
    •     // $casted_value is a positive (non-zero) integer
    • }
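The three approaches do not accept exactly the same inputs. The sketch below runs all of them over a few sample strings; the helper name `check_integer` and the inputs are made up for illustration, and string inputs are assumed throughout.

```php
<?php
// Hypothetical helper: runs the three checks from the slides on one input.
function check_integer(string $value): array
{
    return [
        'regex' => preg_match('/^\d+$/', $value) === 1,
        'ctype' => ctype_digit($value),
        'cast'  => intval($value) > 0,
    ];
}

foreach (['42', '0', '-5', '12abc', "12\n", ''] as $input) {
    $r = check_integer($input);
    printf(
        "%-8s regex=%-5s ctype=%-5s cast=%s\n",
        json_encode($input),
        var_export($r['regex'], true),
        var_export($r['ctype'], true),
        var_export($r['cast'], true)
    );
}
```

'0' passes the first two checks but not the cast, since the cast version tests for a value greater than zero. '12abc' passes only the cast, because intval reads the leading digits. "12\n" passes the regular expression because `$` also matches just before a trailing newline.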
  16. HTML Parsing
  17. Using Regular Expressions
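  18. Using Regular Expressions
    • Postcodes: /[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}/
    • IP Addresses: @^(\d{1,2})/(\d{1,2})/(\d{4})$@ (labelled “IP Addresses” in the transcript, although this pattern has the shape of a slash-separated date such as 22/11/2008)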
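A sketch of how the two patterns on this slide might be used, assuming the `d` in the second pattern was originally `\d`. The sample inputs are made up, and reading the second pattern as day/month/year is an assumption; the slide does not say which order was intended.

```php
<?php
$postcode = '/[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}/';

var_dump(preg_match($postcode, 'M1 1AA'));   // int(1)
var_dump(preg_match($postcode, 'm1 1aa'));   // int(0): no /i modifier, so lower case fails

// Second pattern from the slide, with an alternative delimiter because it contains "/".
$date = '@^(\d{1,2})/(\d{1,2})/(\d{4})$@';

preg_match($date, '22/11/2008', $parts);
print_r($parts); // [0] => 22/11/2008, [1] => 22, [2] => 11, [3] => 2008

// The pattern only checks the shape, so an impossible date still matches.
$shapeOnly = preg_match($date, '99/99/2008', $bad) === 1;

// Assumption: groups are day, month, year. checkdate() takes month, day, year.
$isRealDate = checkdate((int) $bad[2], (int) $bad[1], (int) $bad[3]);
var_dump($shapeOnly, $isRealDate); // bool(true), bool(false)
```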
  19. Constructing Patterns
    • Writing patterns is a balance between matching what you do want and not matching what you don’t want.
  20. You don’t need to use /…/ to denote a pattern!
    • preg_match('/<b><s>.+<\/s>.+<\/b>/', $html)
    • preg_match('@<b><s>.+</s>.+</b>@', $html)
  21. Greediness
    • $html = <<<HTML
    • <span>some text</span><span>some more text!</span>
    • HTML;
    • preg_match("@<span>(.+)</span>@", $html, $matches);
    • echo $matches[0]; // greedy .+ runs to the last </span>, so both spans are printed
    • preg_match("@<span>(.+?)</span>@", $html, $matches);
    • echo $matches[0]; // lazy .+? stops at the first </span>: <span>some text</span>
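The difference is easiest to see by looking at the captured group rather than the whole match. A minimal sketch using the same HTML as the slide, written as a plain string instead of a heredoc; the preg_match_all call at the end is an addition, not from the talk.

```php
<?php
$html = '<span>some text</span><span>some more text!</span>';

// Greedy: .+ takes as much as possible, then backs off to the LAST </span>.
preg_match('@<span>(.+)</span>@', $html, $greedy);

// Lazy: .+? takes as little as possible, stopping at the FIRST </span>.
preg_match('@<span>(.+?)</span>@', $html, $lazy);

echo $greedy[1], "\n"; // some text</span><span>some more text!
echo $lazy[1], "\n";   // some text

// With the lazy form, preg_match_all picks up each span separately.
preg_match_all('@<span>(.+?)</span>@', $html, $all);
print_r($all[1]);      // "some text" and "some more text!"
```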
  22. You can make your pattern readable!
    • preg_match('`^(\w+)://(?:(.+?):(.+?)@)?(.+?)\.(\w+)$`', $s, $matches)
    • preg_match('`
    •     ^
    •     (\w+)://     # Protocol
    •     (?:
    •         (.+?)    # Username
    •         :        # :
    •         (.+?)    # Password
    •         @        # @
    •     )?           # Username/password are optional
    •     (.+?)        # Hostname
    •     \.(\w+)      # Top-level domain
    •     $
    • `x', $s, $matches);
  23. Extracting Captures
    • preg_match('`^ (?P<protocol>\w+):// (?: (?P<user>.+?) : (?P<pass>.+?) @ )? (?P<host>.+?) \.(?P<tld>\w+) $`x', $s, $matches);
    • Array(
    •     [0] => http://foo:bar@baz.example.com
    •     [protocol] => http
    •     [1] => http
    •     [user] => foo
    •     [2] => foo
    •     [pass] => bar
    •     [3] => bar
    •     [host] => baz.example
    •     [4] => baz.example
    •     [tld] => com
    •     [5] => com
    • )
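Putting the last two slides together, a commented pattern with named groups might look like the sketch below. It assumes the backslashes missing from this transcript were `\w` and `\.`, and takes the input string from the output shown on the slide.

```php
<?php
// The x modifier ignores whitespace in the pattern and treats "#" to end of
// line as a comment, so the pattern can be laid out and annotated.
$pattern = '`^
    (?P<protocol>\w+)://   # Protocol
    (?:
        (?P<user>.+?)      # Username
        :
        (?P<pass>.+?)      # Password
        @
    )?                     # Username/password are optional
    (?P<host>.+?)          # Hostname
    \.(?P<tld>\w+)         # Top-level domain
$`x';

$s = 'http://foo:bar@baz.example.com';

if (preg_match($pattern, $s, $matches)) {
    // Named groups are available by name as well as by number.
    echo $matches['protocol'], "\n"; // http
    echo $matches['user'], "\n";     // foo
    echo $matches['host'], "\n";     // baz.example
    echo $matches['tld'], "\n";      // com
}
```

Reading `$matches['host']` instead of `$matches[4]` keeps the calling code valid if a group is later added to or removed from the pattern.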
  24. Variable Data
    • $value = preg_quote($value, '!');
    • if (preg_match("!>$value</(?:div|span)>!", $text))
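To see why the variable needs escaping, consider a value that contains regex metacharacters. This is a sketch with made-up sample data; only the pattern and the preg_quote call come from the slide.

```php
<?php
$text  = '<div>Price: $5.00 (approx.)</div>';
$value = 'Price: $5.00 (approx.)';

// Unescaped, "$" acts as an end-of-subject anchor and "(...)" as a group,
// so the pattern no longer describes the literal text and fails to match.
$raw = preg_match("!>{$value}</(?:div|span)>!", $text);

// preg_quote escapes the metacharacters; the second argument also escapes
// the delimiter in use ("!" here).
$quoted = preg_quote($value, '!');
$safe   = preg_match("!>{$quoted}</(?:div|span)>!", $text);

var_dump($raw, $safe); // int(0), int(1)
```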
  25. Performing Logic on Replacements
    • preg_replace('/\w+/e', 'strtoupper("\0")', 'foo bar baz') (the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0)
    • function upper_case_match($matches) {
    •     return strtoupper($matches[0]);
    • }
    • preg_replace_callback('/\w+/', 'upper_case_match', 'foo bar baz')
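The callback can also be written inline as an anonymous function, which avoids defining a named function for one-off use. A minimal sketch; the second call, which doubles numbers, is a made-up example and not from the talk.

```php
<?php
// Same replacement as on the slide, with the callback as a closure.
$upper = preg_replace_callback(
    '/\w+/',
    function (array $matches): string {
        return strtoupper($matches[0]); // $matches[0] is the whole match
    },
    'foo bar baz'
);
echo $upper, "\n"; // FOO BAR BAZ

// The callback can run arbitrary logic, e.g. arithmetic on each number found.
$doubled = preg_replace_callback(
    '/\d+/',
    function (array $matches): string {
        return (string) ((int) $matches[0] * 2);
    },
    'a 2 b 10'
);
echo $doubled, "\n"; // a 4 b 20
```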
  26. Testing Tools
    • RegexBuddy
    • Reggy
    • http://rubular.com
  27. Any Questions?

Jeremy Coates, 9 months ago

Ciarán Walsh's PHPNW08 slides:

In the right han…

CC Attribution-NonCommercial-NoDerivs License
