
Regex Basics


Ciarán Walsh's PHPNW08 slides:

In the right hands regular expressions can be a powerful tool, but it’s also far too easy for them to be used badly, or in the wrong situations.

This talk will kick off with a look at alternatives to regular expressions for when the power of pattern matching is not required, and will go over some cases where a better tool is available.
Then there will be a brief refresher on pattern syntax and some general tips and tricks to help when constructing regular expressions. After that we go on to look at some situations where pattern matching is a good fit, how to solve some common problems, and some common pitfalls when writing patterns.

Published in: Technology
  • Regex Basics

    1. Regular Expression Basics
       • PHPNW 2008
       • Ciarán Walsh
    2. What are regular expressions?
       • Regular expressions allow matching and manipulation of textual data.
       • Abbreviated as regex or regexp, or alternatively just “patterns”.
    3. Regular Expression Basics: Literals
       bus    Matches a ‘b’, followed by a ‘u’, followed by an ‘s’
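    4. Regular Expression Basics: Anchors
       ^    Matches at the beginning of a line (of the whole subject unless the m modifier is set)
       $    Matches at the end of a line (of the whole subject unless the m modifier is set)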
    5. Regular Expression Basics: Character Classes
       [abc]     Matches one of ‘a’, ‘b’ or ‘c’
       [a-c]     Same as above (character range)
       [^abc]    Matches one character that is not listed
       .         Matches any single character
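
The anchors and character classes from the slides above can be tried out with a handful of preg_match() calls. This is a minimal sketch: the sample subjects and the $checks and $results names are made up for illustration and do not come from the slides.

```php
<?php
// pattern => sample subject (all subjects are illustrative assumptions)
$checks = [
    '/^bus/'     => 'bus stop',  // ^ : "bus" must start the subject
    '/bus$/'     => 'bus stop',  // $ : "bus" must end the subject
    '/b[abc]s/'  => 'bas',       // [abc] : one of a, b or c
    '/b[a-c]s/'  => 'bus',       // [a-c] : same class, so "u" is rejected
    '/b[^abc]s/' => 'bus',       // [^abc] : any one character except a, b, c
    '/b.s/'      => 'b-s',       // . : any single character
];

$results = [];
foreach ($checks as $pattern => $subject) {
    // preg_match() returns 1 on a match and 0 otherwise
    $results[$pattern] = preg_match($pattern, $subject);
}

print_r($results);
```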
    6. Regular Expression Basics: Alternation
       a|b        Matches one of ‘a’ or ‘b’
       dog|cat    Matches one of “dog” or “cat”
    7. Regular Expression Basics: Quantifiers (repetition)
       {x,y}    Matches a minimum of x and a maximum of y occurrences; the maximum can be omitted ({x,}), while an omitted minimum ({,y}) is only recognised by recent PCRE2 versions
       *        Matches zero or more occurrences (any amount). Same as {0,}
       +        Matches one or more occurrences. Same as {1,}
       ?        Matches zero or one occurrences. Same as {0,1}
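
The quantifiers listed above could be exercised roughly as follows. The matches_whole() helper is hypothetical (it is not part of PHP or of the slides); it simply anchors a pattern at both ends so that the repetition counts are easy to see.

```php
<?php
// Hypothetical helper (not from the slides): true when $pattern,
// anchored at both ends, matches the whole of $subject.
function matches_whole(string $pattern, string $subject): bool
{
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

var_dump(matches_whole('a{2,3}', 'aa'));       // {2,3}: two or three a's
var_dump(matches_whole('a{2,3}', 'aaaa'));     // four is one too many
var_dump(matches_whole('ab*', 'a'));           // * allows zero b's
var_dump(matches_whole('ab+', 'a'));           // + needs at least one b
var_dump(matches_whole('colou?r', 'color'));   // ? makes the u optional
var_dump(matches_whole('colou?r', 'colouur')); // but only one u is allowed
```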
    8. Regular Expression Basics: Grouping
       (…)    Groups the contents of the parentheses. Affects alternation and quantifiers. Allows parts of the match to be captured
       for|backward      “for” or “backward”
       (for|back)ward    “forward” or “backward”
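
A short sketch of how grouping changes alternation, quantifiers and captures. The subjects and variable names are illustrative assumptions; the (?:…) non-capturing form is the one that appears later in the slides.

```php
<?php
// Without parentheses the | splits the whole pattern: "for" OR "backward".
preg_match('/for|backward/', 'forward', $ungrouped);

// With parentheses the alternation is limited to the group,
// and the group's text is captured as element 1.
preg_match('/(for|back)ward/', 'forward', $grouped);

// (?:...) groups without capturing.
preg_match('/(?:for|back)ward/', 'backward', $non_capturing);

// A quantifier after a group repeats the whole group;
// the capture holds the last repetition.
preg_match('/(ab)+/', 'ababab', $repeated);

print_r([$ungrouped, $grouped, $non_capturing, $repeated]);
```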
    9. Regular Expression Basics: Delimiters
       /pattern/modifiers
       /i    Makes match case-insensitive
    10. Performing a Match
        • Returns number of matches (0 or 1)
        • $matches will contain captured groups

        preg_match(
            '/Te(.)f?/i',
            'text',
            $matches
        );
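
To make the return value and the contents of $matches concrete, here is the slide's call run against its own subject, plus a non-matching subject ('toast', chosen here purely for illustration).

```php
<?php
// The slide's example: case-insensitive "Te", one captured character,
// then an optional "f".
$count = preg_match('/Te(.)f?/i', 'text', $matches);

// A subject that does not match (illustrative, not from the slides).
$miss = preg_match('/Te(.)f?/i', 'toast', $no_matches);

var_dump($count);      // number of matches
print_r($matches);     // element 0: whole match, element 1: first group
var_dump($miss);
print_r($no_matches);  // empty when nothing matched
```

preg_match() stops at the first match, which is why the count is never more than 1; preg_match_all() is the function to reach for when every match is needed.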
    11. Performing a Replacement
        • Returns string after replacement
        • Can use backreferences with \0-\9

        preg_replace(
            '/some(text)/',
            '\1',
            $text
        )
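
A small sketch of backreferences in the replacement string, using the slide's pattern. The value of $text is an assumption made up for the example.

```php
<?php
$text = 'here is sometext to change';  // sample input, not from the slides

// \1 inserts the first captured group: "sometext" becomes "text".
$kept_group = preg_replace('/some(text)/', '\1', $text);

// $1 is an equivalent way to write the same reference.
$dollar_form = preg_replace('/some(text)/', '[$1]', $text);

// \0 refers to the whole match.
$whole_match = preg_replace('/some(text)/', '<\0>', $text);

echo $kept_group, "\n", $dollar_form, "\n", $whole_match, "\n";
```

The replacement strings are single-quoted so that PHP passes the backslash through to preg_replace() untouched.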
    12. (The slide is filled edge to edge with a single, very long regular expression that begins
        (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+ … and carries on in the same vein.)
        Don’t Use Regular Expressions!
        Don’t Abuse Regular Expressions!
        “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” - Jamie Zawinski
    13. Testing for a Substring
        if (preg_match('/foo/', $var))
        if (strpos($var, 'foo') !== false)

        if (preg_match('/foo/i', $var))
        if (stripos($var, 'foo') !== false)
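
The strict !== false comparison in the strpos() versions matters, because a match at the very start of the string is reported as position 0. A minimal sketch, with sample strings chosen for illustration:

```php
<?php
$var = 'foobar';  // sample input: "foo" sits at position 0

// Loose truthiness check: position 0 is falsy, so this wrongly says "not found".
$loose = (bool) strpos($var, 'foo');

// Strict comparison: only a real "not found" (false) is rejected.
$strict = strpos($var, 'foo') !== false;

// Case-insensitive equivalent of /foo/i.
$case_insensitive = stripos('FOObar', 'foo') !== false;

// Same answers as the regular expression versions.
$regex = preg_match('/foo/', $var) === 1;

var_dump($loose, $strict, $case_insensitive, $regex);
```

On PHP 8.0 and later, str_contains($var, 'foo') expresses the case-sensitive test directly.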
    14. Validating an Integer: Regular Expression
        • Intention is not immediately obvious
        • Not efficient

        if (preg_match('/^\d+$/', $value)) {
            // $value consists only of digits (a non-negative integer)
        }
    15. Validating an Integer: ctype (Character Type)
        • Native C library (fast)
        • Makes the intention obvious

        if (ctype_digit($value)) {
            // $value is a string of digits (a non-negative integer)
        }
    16. Validating an Integer: Casting
        • Intention is fairly clear
        • Casting is safe practice
        • Non-numeric values will result in zero (a string with leading digits, such as '42abc', casts to 42)

        $casted_value = intval($value);
        if ($casted_value > 0) {
            // $casted_value is a positive (non-zero) integer
        }
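
The three approaches do not accept exactly the same inputs, which the following sketch tries to make visible. check_integer_approaches() is a hypothetical helper written for this comparison, and the sample inputs are assumptions.

```php
<?php
// Hypothetical helper (not from the slides): runs one string through
// the three checks shown on the slides and reports each verdict.
function check_integer_approaches(string $value): array
{
    return [
        'regex' => preg_match('/^\d+$/', $value) === 1,
        'ctype' => ctype_digit($value),     // expects a string argument
        'cast'  => intval($value) > 0,      // the slide's "greater than zero" test
    ];
}

foreach (['42', '0', '-5', '42abc', ''] as $sample) {
    echo var_export($sample, true), ' => ',
         json_encode(check_integer_approaches($sample)), "\n";
}
```

Two differences stand out: '0' passes the digit checks but not the cast check, and '42abc' fails the digit checks but passes the cast check because intval() keeps the leading digits.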
    17. HTML Parsing
    18. Using Regular Expressions
    19. Using Regular Expressions
        Postcodes
            /[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}/
        IP Addresses
            @^(\d{1,2})/(\d{1,2})/(\d{4})$@
            (the pattern shown matches a slash-separated date such as 25/12/2008)
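
A hedged sketch of both patterns in use. The ^ and $ anchors on the postcode pattern are an addition for this example (the slide's pattern is unanchored), the sample inputs are made up, and reading the three digit groups as day/month/year is an assumption.

```php
<?php
// Anchored variant of the slide's postcode pattern (anchors added here).
$postcode = '/^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}$/';

$postcode_results = [
    'M1 1AA'   => preg_match($postcode, 'M1 1AA'),
    'SW1A 1AA' => preg_match($postcode, 'SW1A 1AA'),
    'm1 1aa'   => preg_match($postcode, 'm1 1aa'),  // lower case, no /i modifier
    'M11AA'    => preg_match($postcode, 'M11AA'),   // the space is required
];

// The digit-group pattern accepts anything of the right shape,
// including the 31st of February.
$shaped = preg_match('@^(\d{1,2})/(\d{1,2})/(\d{4})$@', '31/02/2008', $parts);

// Assumption: groups are day/month/year. checkdate() takes month, day, year.
$real = checkdate((int) $parts[2], (int) $parts[1], (int) $parts[3]);

print_r($postcode_results);
var_dump($shaped, $real);
```

The pattern checks the shape and a dedicated function checks the meaning, which keeps the pattern short.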
    20. Constructing Patterns
        • Writing patterns is a balance between matching what you do want, against not matching what you don’t want.
    21. You don’t need to use /…/ to denote a pattern!
        preg_match('/<b><s>.+<\/s>.+<\/b>/', $html)
        preg_match('@<b><s>.+</s>.+</b>@', $html)
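
The two calls on the slide are equivalent; only the delimiter and the escaping it forces differ. A quick sketch, with a made-up $html value and a third variant using brace delimiters:

```php
<?php
$html = '<b><s>old</s> new</b>';  // sample input, not from the slides

// "/" as delimiter: every "/" inside the pattern has to be escaped.
$with_slashes = preg_match('/<b><s>.+<\/s>.+<\/b>/', $html);

// "@" as delimiter: the closing tags can be written as they are.
$with_at = preg_match('@<b><s>.+</s>.+</b>@', $html);

// Bracket-style delimiters such as {} are accepted as well.
$with_braces = preg_match('{<b><s>.+</s>.+</b>}', $html);

var_dump($with_slashes, $with_at, $with_braces);
```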
    22. Greediness
        $html = <<<HTML
        <span>some text</span><span>some more text!</span>
        HTML;

        preg_match("@<span>(.+)</span>@", $html, $matches);
        echo $matches[0];

        preg_match("@<span>(.+?)</span>@", $html, $matches);
        echo $matches[0];
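
What the greedy and lazy versions actually capture can be seen in the sketch below. The third pattern, which uses a negated character class instead of a lazy quantifier, is an alternative added here and is not on the slide.

```php
<?php
$html = '<span>some text</span><span>some more text!</span>';

// Greedy: .+ runs to the end and backtracks to the LAST </span>.
preg_match('@<span>(.+)</span>@', $html, $greedy);

// Lazy: .+? stops at the FIRST </span>.
preg_match('@<span>(.+?)</span>@', $html, $lazy);

// Alternative (not on the slide): forbid "<" inside the captured text.
preg_match('@<span>([^<]+)</span>@', $html, $negated);

echo $greedy[0], "\n";   // both spans
echo $lazy[0], "\n";     // first span only
echo $negated[0], "\n";  // first span only
```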
    23. You can make your pattern readable!
        preg_match('`^(\w+)://(?:(.+?):(.+?)@)?(.+?)\.(\w+)$`', $s, $matches)

        preg_match('`
            ^
            (\w+)://      # Protocol
            (?:
                (.+?)     # Username
                :         # :
                (.+?)     # Password
                @         # @
            )?            # Username/password are optional
            (.+?)         # Hostname
            \.(\w+)       # Top-level domain
            $
        `x', $s, $matches);
    24. Extracting Captures
        preg_match('`^
            (?P<protocol>\w+)://
            (?:
                (?P<user>.+?)
                :
                (?P<pass>.+?)
                @
            )?
            (?P<host>.+?)
            \.(?P<tld>\w+)
        $`x', $s, $matches);

        Array
        (
            [0] => http://foo:bar@baz.example.com
            [protocol] => http
            [1] => http
            [user] => foo
            [2] => foo
            [pass] => bar
            [3] => bar
            [host] => baz.example
            [4] => baz.example
            [tld] => com
            [5] => com
        )
    25. Variable Data
        if (preg_match("!>$value</(?:div|span)>!", $text))

        $value = preg_quote($value, '!');
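
Why the preg_quote() call is needed becomes clearer with a value that contains pattern syntax. In this sketch the contents of $text and $value are made up for illustration.

```php
<?php
$text  = '<div>1+1=2</div>';
$value = '1+1=2';  // sample input; "+" is a quantifier inside a pattern

// Interpolated as-is, "1+" means "one or more 1s", so the literal text is missed.
$raw_result = preg_match("!>$value</(?:div|span)>!", $text);

// preg_quote() escapes regex metacharacters; the second argument
// additionally escapes the delimiter in use (here "!").
$quoted = preg_quote($value, '!');
$quoted_result = preg_match("!>$quoted</(?:div|span)>!", $text);

var_dump($raw_result, $quoted, $quoted_result);
```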
    26. Performing Logic on Replacements
        preg_replace('/\w+/e', 'strtoupper("\0")', 'foo bar baz')

        function upper_case_match($matches) {
            return strtoupper($matches[0]);
        }
        preg_replace_callback(
            '/\w+/',
            'upper_case_match',
            'foo bar baz'
        )
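
The /e modifier shown first was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP only the callback form works. A minimal sketch using closures instead of a named function; the second example and its input string are illustrative additions.

```php
<?php
// Closure form of the slide's callback.
$upper = preg_replace_callback(
    '/\w+/',
    function (array $matches): string {
        return strtoupper($matches[0]);  // $matches[0] is the whole match
    },
    'foo bar baz'
);

// Logic a plain replacement string cannot express:
// double every number found in the text.
$doubled = preg_replace_callback(
    '/\d+/',
    function (array $matches): string {
        return (string) ((int) $matches[0] * 2);
    },
    '3 apples and 12 pears'
);

echo $upper, "\n", $doubled, "\n";
```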
    27. Testing Tools
        • RegexBuddy
        • Reggy
        • http://rubular.com
    28. Any Questions?