Regular expressions and php

Learn beginner to intermediate level regular expressions with some examples in PHP


  1. Regular Expressions in PHP
     /(?:dave@davidstockton.com)/
     Front Range PHP User Group
     David Stockton
  2. What is a regular expression?
     A pattern used to describe a part of some text
     “Regular” has some implications for how it can be built, but that’s not really part of this presentation
     Extremely powerful and useful
     (And often abused)
  3. Regex Joke
     A programmer says, “I have a problem that I can solve with regular expressions.”
     Now, he has two problems…
  4. How to use regex in PHP
     The preg_* functions
     Perl-compatible regular expressions.
     Probably the most common regex syntax
     The ereg_* functions
     POSIX-style regular expressions
     I am not covering these functions.
     Don’t use the ereg ones. They are deprecated as of PHP 5.3.
  5. How can we use regex in PHP?
     preg_match( ) – Searches a subject for a match
     preg_match_all( ) – Searches a subject for all matches
     preg_replace( ) – Searches a subject for a pattern and replaces it with something else
     preg_split( ) – Splits a string into an array based on a regex delimiter
     preg_filter( ) – Identical to preg_replace except it returns only the subjects where there was a match
     preg_replace_callback( ) – Like preg_replace, but the replacement is defined in a callback
     preg_grep( ) – Returns the elements of an array that match a pattern
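A minimal sketch of several of these functions side by side. The sample text and the simple `[0-9]{4}` pattern are assumptions chosen for illustration; they are not from the slides.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$subject = 'PHP 5.3 arrived in 2009; PHP 7.0 arrived in 2015.';

// preg_match() stops at the first match and returns 1 (match) or 0 (no match).
$found = preg_match('/[0-9]{4}/', $subject, $first);

// preg_match_all() collects every match and returns how many it found.
$count = preg_match_all('/[0-9]{4}/', $subject, $all);

// preg_replace() substitutes every match with the replacement text.
$masked = preg_replace('/[0-9]{4}/', 'YYYY', $subject);

// preg_split() cuts the string wherever the pattern matches.
$parts = preg_split('/; /', $subject);

// preg_grep() keeps the array elements that match (original keys are preserved).
$withDigits = preg_grep('/[0-9]/', ['abc', 'a1c', '42']);
```

Here `$first[0]` holds the first four-digit run and `$all[0]` holds every one of them, which is the practical difference between preg_match and preg_match_all.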
  6. How can we use regex in PHP?
     preg_quote( ) – Quotes regular expression characters
     preg_last_error( ) – Returns the error code of the last PCRE (Perl Compatible Regular Expression) function execution
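A short sketch of how these two helpers might be used together. The search text is a made-up example of input that happens to contain regex metacharacters.

```php
<?php
// Hypothetical user-supplied text containing regex metacharacters.
$needle = 'price (USD) $5.00?';

// preg_quote() backslash-escapes the metacharacters; passing the delimiter
// as the second argument makes sure it is escaped as well.
$pattern = '/' . preg_quote($needle, '/') . '/';

$matched = preg_match($pattern, 'Sale price (USD) $5.00? Yes.');

// preg_match() returns false on failure; preg_last_error() reports the reason.
if ($matched === false) {
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
}
```

Without preg_quote, the parentheses, dollar sign, dot and question mark in the search text would be interpreted as regex syntax instead of literal characters.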
  7. How can we use regex in PHP?
     Those are the function calls, and we’ll play with them later.
     First, we need to learn how to create regex patterns since we need those for any function call.
  8. Starting Pattern
     /[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i
     This matches a series of letters, numbers, plus, dash, dots, underscores and equals, followed by an “AT” (@) sign, followed by a series of letters, numbers, dots and dashes, followed by a dot, followed by 2 to 4 letters.
     In other words… It matches an email address… Or rather some email addresses.
  9. Matching Email Addresses
     What about james@smithsonian.museum?
     What about freddie@wherecanI.travel?
     Both of those are valid email addresses, but they fail (when the pattern has to cover the whole address) because our pattern only allows 2-4 character TLD parts for the email address.
     How can we match all valid email addresses and only valid email addresses?
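A sketch of the failure described on this slide. It assumes the starting pattern is anchored with ^ and $ (covered later in the talk) so that it must match the entire address; unanchored, the pattern would still find a partial match such as "james@smithsonian.muse".

```php
<?php
// The starting pattern, anchored (an assumption of this sketch) so that it
// has to cover the whole string rather than just part of it.
$pattern = '/^[A-Z0-9._+=-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i';

$addresses = [
    'dave@davidstockton.com',
    'james@smithsonian.museum',
    'freddie@wherecanI.travel',
];

// Record whether each address passes the pattern.
$results = [];
foreach ($addresses as $address) {
    $results[$address] = preg_match($pattern, $address) === 1;
}
```

Only the .com address passes; the six-letter TLDs are rejected by `{2,4}`.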
  10. The “real” email address regex
      (The slide fills the screen with the first part of a very long RFC 822-style email address pattern. Its backslash escapes and control characters were lost in extraction, so the pattern is not reproduced here.)
  11. The “real” email address regex cont.
      (The slide shows the remainder of the same pattern, equally damaged in extraction.)
  12. So… How do we write this?
      Don’t. Much simpler patterns have been written and will match 99.9% of valid email addresses.
      Use something like Zend_Validate_EmailAddress
  13. So now the real learnin’…
      Letters and numbers match… letters and numbers
      /a/ - Matches a string that contains an “a”
      /7/ - Matches a string that contains a 7.
  14. More learnin’
      Match a word
      /regex/ - Matches a string with the word “regex” in it
      You can use a pipe character to give a choice
      /pizza|steak|cheeseburger/ - Matches a string with any of these foods
  15. Delimiters
      The examples so far have started with / and ended with /.
      These are delimiters and let the regex engine know where the pattern starts and ends.
      You can choose another delimiter if you’d like or if it’s more convenient
      Match namespace:
      #/My/PHP/Namespace#
      If I used “/” in that example, I’d need to escape each of the forward slashes to differentiate them from the delimiter
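A small sketch of the same pattern written with both delimiters. The subject string is a hypothetical example.

```php
<?php
$path = '/My/PHP/Namespace/Thing';   // hypothetical subject string

// With "/" as the delimiter, every literal slash has to be escaped.
$withSlashes = preg_match('/\/My\/PHP\/Namespace/', $path);

// With "#" as the delimiter, the slashes can stay as they are.
$withHashes = preg_match('#/My/PHP/Namespace#', $path);
```

Both calls match the same text; the second is simply easier to read.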
  16. Character Matching Continued
      You can match a selection of characters
      /[Pp][Hh][Pp]/ - Matches PHP in any mixture of upper and lowercase
      Ranges can be defined
      /[abcdefghijklmnopqrstuvwxyz]/ - Matches any lowercase alpha character
      /[a-z]/ - Matches any lowercase alpha character
  17. Character Selection Ranges
      Ranges can be combined
      /[A-Za-z0-9]/ - Matches an alphanumeric character
      /[A-Fa-f0-9]/ - Matches any hex character
      Character selection can be inverted
      /[^0-9]/ - Matches any non-digit character
      /[^ ]/ - Matches any non-space character
      /[.!@#$%^*]/ - Matches some punctuation
  18. Special Characters
      Dot (.) matches any character
      /./
      /../ - Matches any two characters
      To match an actual dot character, you must escape it
      /\./ - Matches a single dot character
      Unless it’s in a character selection
      /[.]/ - Matches a single dot character
  19. Character classes
      \d means [0-9]
      \D means non-digits - [^0-9]
      \w means word characters - [A-Za-z0-9_]
      \W means non-word characters – [^A-Za-z0-9_]
      \s means a whitespace character, e.g. [ \t\r\n]
      \S means non-whitespace characters
  20. Repeating Character Classes
      Match two digits in a row
      /\d\d/
      /[0-9][0-9]/
      /\d{2}/
      /[0-9]{2}/
      Match at least one digit (but as many as it can)
      /\d+/
      Match zero or more digits
      /\d*/
  21. Repeating Character Classes cont.
      * means match 0 or more
      + means match 1 or more
      {x} where x is a number means match exactly x of the preceding selection
      {x,} means match at least x
      {x,y} means match between x and y
      {0,y} means match up to y (write the 0 explicitly; older PCRE versions treat {,y} as literal text)
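The quantifiers can be checked with a quick sketch. The ^ and $ anchors (covered a few slides later) are an addition here so that each count applies to the whole string.

```php
<?php
// Each entry is the return value of preg_match(): 1 for a match, 0 for none.
$checks = [
    'exactly 2'    => preg_match('/^\d{2}$/',   '42'),
    'too many'     => preg_match('/^\d{2}$/',   '421'),
    'at least 2'   => preg_match('/^\d{2,}$/',  '12345'),
    '2 to 4'       => preg_match('/^\d{2,4}$/', '12345'),
    'up to 3'      => preg_match('/^\d{0,3}$/', ''),
    'one or more'  => preg_match('/^\d+$/',     ''),
    'zero or more' => preg_match('/^\d*$/',     ''),
];
```

The empty string is the interesting case: it satisfies `*` and `{0,3}` but not `+`.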
  22. More special characters
      ? means the preceding selection is optional
      Putting it together
      Telephone Number
      /\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
      Matches 720-675-7471 or (720)675-7471 or (720) 675-7471 or 7206757471 or 720 675 7471
      Find a misspelled word (and get great deals on eBay)
      /la[bp]top computer[s]?/
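A sketch that runs the telephone pattern, with its backslashes restored, against each of the formats listed on the slide.

```php
<?php
// Optional "(", three digits, optional ")", optional space or dash, and so on.
$pattern = '/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/';

$samples = [
    '720-675-7471',
    '(720)675-7471',
    '(720) 675-7471',
    '7206757471',
    '720 675 7471',
];

$allMatch = true;
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m) !== 1) {
        $allMatch = false;
    }
}
// After the loop, $m holds the captured groups from the last sample.
```

The three parenthesized groups capture the area code, prefix and line number separately, which becomes useful for replacement later on.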
  23. Regex Anchors
      Anchors allow you to specify a position, like before, after or in between characters
      /^ab/ matches abcdefg but not cab
      Notice that it’s the caret character… It means start of the string in this context, but it negates a character class when it is the first character inside the square brackets
      /ab$/ matches cab but not abcdefg
      /^[a-z]+$/ will match a string that consists only of lowercase letters
  24. Word Boundaries
      \b means a word boundary
      Before first character if first character is a word character
      After last character if last character is a word character
      Between two characters if one is a word character and the other is not
      /\bfish\b/ matches fish, but not fisherman or catfish.
      /fish\b/ matches fish and catfish
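A quick sketch of the two boundary patterns, using preg_grep to filter the slide's three example words.

```php
<?php
$words = ['fish', 'fisherman', 'catfish'];

// \b on both sides: "fish" must stand alone as a word.
$whole = array_values(preg_grep('/\bfish\b/', $words));

// \b only at the end: the word just has to end in "fish".
$ending = array_values(preg_grep('/fish\b/', $words));
```

array_values() is used only to renumber the results, since preg_grep keeps the original array keys.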
  25. Alternation
      /cow|boy/ - Matches cow, or boy or cowboy or coward, etc
      /\b(cow|boy)\b/ - Matches cow or boy but not cowboy or coward
      The above example also captures the matching word due to the parens. More on this later.
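A sketch comparing the two alternation patterns. It assumes the second pattern carries \b word boundaries around the group, which is what makes it reject cowboy and coward.

```php
<?php
$words = ['cow', 'boy', 'cowboy', 'coward'];

// Plain alternation matches anywhere inside a word.
$loose = array_values(preg_grep('/cow|boy/', $words));

// With word boundaries, only the standalone words match.
$strict = array_values(preg_grep('/\b(cow|boy)\b/', $words));

// The parentheses also capture whichever alternative matched.
preg_match('/\b(cow|boy)\b/', 'the boy waved', $m);
```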
  26. Greedy vs Lazy
      By default, regular expressions are greedy…
      That is, they will match as much as they can
      Grab a starting HTML tag:
      /<.+>/
      Matches the whole string: <h1>Welcome to FRPUG</h1>
      Not what we want
      Make it lazy: /<.+?>/
      Now it matches only the opening tag: <h1>
  27. Another tag matching solution
      /<[^>]+>/
      Literally match a less-than character followed by one or more non-greater-than characters followed by a greater-than character
      This approach eliminates the need for the engine to backtrack, so it is almost certainly faster than the lazy version.
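The three tag-matching variants can be compared directly in a short sketch using the slide's own sample string.

```php
<?php
$html = '<h1>Welcome to FRPUG</h1>';

// Greedy: .+ runs to the end and backs up only as far as the last ">".
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the first ">" it can.
preg_match('/<.+?>/', $html, $lazy);

// Negated class: [^>]+ can never run past a ">" in the first place.
preg_match('/<[^>]+>/', $html, $negated);
```

The lazy and negated-class versions find the same text here; they differ in how the engine gets there.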
  28. Capturing part of regex (backreference)
      /__(construct|destruct)/
      Backreference will contain either construct or destruct so you can use it later
      /([a-z]+)\1/
      Matches a run of letters that is immediately repeated (for a single repeated letter, an even number of times).
      Matches aa but not a. Matches aaaaa (the first four characters)
      /([a-z]{3})\1/
      Matches words like booboo or bambam
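A sketch of the backreference patterns from this slide. The subject strings are illustrative.

```php
<?php
// The captured alternative is available as $m[1] after the match.
preg_match('/__(construct|destruct)/', 'public function __construct()', $m);

// \1 inside the pattern means "the same text group 1 just matched".
$doubled = preg_match('/([a-z]+)\1/', 'aa');
$single  = preg_match('/([a-z]+)\1/', 'a');

// Three letters followed by the same three letters again.
preg_match('/([a-z]{3})\1/', 'bambam', $rep);
```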
  29. Backreference Continued…
      Very useful when performing regex search and replace
      preg_replace('/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/', '(\1) \2-\3', $phone)
      The above example will take any phone number from the previous example and return it formatted in (xxx) xxx-xxxx format
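A runnable sketch of that replacement applied to a few of the earlier phone formats.

```php
<?php
$pattern = '/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/';
$inputs  = ['720-675-7471', '7206757471', '720 675 7471'];

// \1, \2 and \3 in the replacement refer to the three captured groups.
$formatted = [];
foreach ($inputs as $phone) {
    $formatted[] = preg_replace($pattern, '(\1) \2-\3', $phone);
}
```

PHP also accepts the `$1`, `$2`, `$3` form in the replacement string, which some find easier to read.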
  30. More backreferences…
      Replace duplicated words that that have been inadvertently left in
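The slide's own code did not survive in this transcript. One possible way to do it, offered as an assumption and not as the author's original pattern, is to capture a word and require the same word again after some whitespace:

```php
<?php
$text = 'Replace duplicated words that that have been inadvertently left in';

// \b(\w+) captures a whole word, \s+ is the gap, and \1\b is the same word
// again. The replacement keeps a single copy.
$fixed = preg_replace('/\b(\w+)\s+\1\b/', '\1', $text);
```

The word boundaries keep the pattern from firing on partial overlaps such as "is island".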
  31. Non-capturing groups
      Match an IPv4 address
      /((?:\d{1,3}\.){3}\d{1,3})/
      We’re matching 1 to 3 digits followed by a dot 3 times, then 1 to 3 more digits. We don’t care (right now) about the octets, we just want to repeat the match, so ?: says to not capture the group.
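A sketch showing what ends up in the match array when the inner group is non-capturing. The log line is a hypothetical example.

```php
<?php
$log = 'Connection from 192.168.1.20 accepted';   // hypothetical input

preg_match('/((?:\d{1,3}\.){3}\d{1,3})/', $log, $m);

// $m has only two entries: the whole match and the outer capturing group.
// The (?:...) group repeats three times but contributes nothing to $m.
$groupCount = count($m);
```

Note that the pattern checks only the shape of the address; it would also accept 999.999.999.999.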
  32. Pattern Modifiers
      Modifiers go after the last delimiter (now you know why there are delimiters) and affect how the regex engine works
      i – case insensitive matching (matches are case-sensitive by default)
      m – multiline matching
      s - dot matches all characters, including \n
      x – ignore all whitespace characters except if escaped or in a character class
  33. Pattern Modifiers Continued…
      D – Anchor for the end of the string only, otherwise $ also matches just before a trailing \n character
      Allow username to be alphabetic only
      /^[A-Za-z]+$/ - This will match "dave\n" (a trailing newline slips through)
      However, /^[A-Za-z]+$/D will not match
      U – Invert the meaning of the greediness. With this on, matches are lazy by default and ? makes them greedy.
      There are lots of other modifiers and you can see them at http://us2.php.net/manual/en/reference.pcre.pattern.modifiers.php
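A sketch exercising several of the modifiers. The input strings are illustrative, and double quotes are used where a real newline character is needed.

```php
<?php
// i: case-insensitive matching.
$caseless = preg_match('/php/i', 'PHP');

// s: without it the dot will not match a newline.
$dotNoS   = preg_match('/a.b/',  "a\nb");
$dotWithS = preg_match('/a.b/s', "a\nb");

// D: without it, $ also matches just before a trailing newline.
$input    = "dave\n";
$withoutD = preg_match('/^[A-Za-z]+$/',  $input);
$withD    = preg_match('/^[A-Za-z]+$/D', $input);

// U: quantifiers become lazy by default.
preg_match('/<.+>/U', '<h1>Welcome</h1>', $lazyByDefault);
```

The D case matters most for validation: without it, a username with a trailing newline passes a check that looks airtight.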
  34. Named Capture Groups
      Rather than get back a numbered array of matches, get back an associative array.
      If you add a new capture group, you don’t have to renumber where you use the capture group
  35. Named Capture Groups cont…
      Use (?P<named_group>pattern)
  36. Named Capture Groups cont…
      Combined numbered and associative array
      Capture group 0 is the whole pattern that is matched.
      If our string to match against was abcde720-675 7471foobar, $matches[0] will contain 720-675 7471
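A sketch of named groups applied to the phone example. The group names area, prefix and line are hypothetical; the slide's actual pattern is not present in this transcript.

```php
<?php
// Group names here are illustrative choices, not taken from the slides.
$pattern = '/(?P<area>\d{3})[\s-]?(?P<prefix>\d{3})[\s-]?(?P<line>\d{4})/';

preg_match($pattern, 'abcde720-675 7471foobar', $matches);

// $matches contains both views of each group: by name and by number.
$area   = $matches['area'];
$byNum  = $matches[1];
$whole  = $matches[0];
```

Each named group appears twice in `$matches`, once under its name and once under its numeric index, which is the "combined" array the slide mentions.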
  37. Positive Look Ahead Matches
      Look for a pattern followed by another pattern
      /p(?=h)/ - Match a “p” followed by an “h” but don’t include the “h”
  38. Negative Look Ahead
      Look for a pattern which is not followed by some other pattern
      /p(?!h)/ - p not followed by h.
  39. Look Aheads
      Positive and negative look aheads do not capture anything.
      They just determine if the pattern match is possible
      They are zero-width
      /p[^h]/ is not the same as /p(?!h)/
      /ph/ is not the same as /p(?=h)/
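The two "is not the same as" claims can be seen in a short sketch. The subject strings are illustrative.

```php
<?php
// Positive lookahead: the "h" must be there but is not part of the match.
preg_match('/p(?=h)/', 'php', $ahead);
preg_match('/ph/',     'php', $plain);

// Negative lookahead succeeds at the end of the string, because nothing
// follows the "p" at all.
$lookahead = preg_match('/p(?!h)/', 'p');

// [^h] has to consume an actual character, so it fails on the same input.
$negClass  = preg_match('/p[^h]/', 'p');
```

Being zero-width means the lookahead only checks what comes next and leaves it unconsumed for the rest of the pattern.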
  40. Look behinds
      Positive look behind
      /(?<=oo)d/ - d which is preceded by oo
      Matches “food”, “mood”, match only contains the “d”
      Negative look behind
      /(?<!oo)d/ - d which is not preceded by oo
      Matches “dude”, “crude”, and “d”
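A brief sketch of both lookbehind forms on the slide's example words.

```php
<?php
// Positive lookbehind: "oo" must precede the d, but only "d" is matched.
preg_match('/(?<=oo)d/', 'food', $m);

// Negative lookbehind: the only d in "food" is preceded by "oo", so no match.
$food = preg_match('/(?<!oo)d/', 'food');

// In "dude" the first d has nothing before it, so the pattern matches.
$dude = preg_match('/(?<!oo)d/', 'dude');
```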
  41. With great power…
      Test your regular expressions before they go to production
      It’s much easier to get them wrong than to get them right if you don’t test
  42. When to not use regex
      Whenever they aren’t needed.
      If you can use strstr or strpos or str_replace to do the job, do that. They are much faster, much simpler and easier to do correctly.
      However, if you cannot use those functions, regex may be your best bet.
      Don’t use regex when you really need a parser
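For fixed text, the plain string functions are enough, as this small sketch suggests. The sample string is illustrative.

```php
<?php
$haystack = 'Front Range PHP User Group';

// strpos() returns the offset or false; compare strictly, because a match
// at offset 0 would otherwise look like "not found".
$hasPhp = strpos($haystack, 'PHP') !== false;

// str_replace() swaps literal text with no pattern syntax to escape.
$renamed = str_replace('PHP', 'Regex', $haystack);
```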
  43. Resources
      http://regular-expressions.info
      http://us2.php.net/manual/en/ref.pcre.php
      Spider Man from http://www.onlineseats.com/
  44. Questions?
      dave@frontrangephp.org