Regular Expressions 2007


Beginner's guide to using regular expressions in PHP

Published in: Technology

    1. Regular Expressions for PHP: Adding magic to your programming. Geoffrey Dunn (
    2. What are Regular Expressions
       • Regular expressions are a syntax for matching text.
       • They date back to mathematical notation developed in the 1950s.
       • They became embedded in Unix systems through tools like ed and grep.
    3. What are RE
       • Perl in particular promoted the use of very complex regular expressions.
       • They are now available in nearly all popular programming languages.
       • They allow much more complex matching than strpos().
    4. Why use RE
       • You can use RE to enforce rules on formats like phone numbers, email addresses or URLs.
       • You can use them to find key data within logs, configuration files or webpages.
    5. Why use RE
       • They can quickly make complex replacements, such as finding every email address in a page and rewriting it as address [AT] site [dot] com.
       • You can also make your code really hard to understand.
    6. Syntax basics
       • In PHP's preg functions the whole regular expression is written between two delimiters, conventionally forward slashes (/).
       • abc - most characters match themselves. This looks for the exact character sequence a, b and then c.
       • . - a period matches any single character (except a newline, unless the s modifier is set).
       • [abc] - square brackets match any one of the characters inside. Here: a, b or c.
    7. Syntax basics
       • ? - marks the previous item as optional, so a? means there might be an a.
       • (abc)* - parentheses group patterns and the asterisk marks zero or more of the previous item, here the whole group. So this would match an empty string or abcabcabcabc.
       • \.+ - the backslash is an all-purpose escape character, so \. is a literal period. The + marks one or more of the previous item. So this would match ......
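The basic syntax from the two slides above can be tried out with preg_match(). This is a minimal sketch: the patterns and sample strings are made up for illustration, and the ^ and $ anchors are added here only so that each pattern has to cover the whole sample string.

```php
<?php
// Hypothetical [pattern, subject] pairs, one per syntax element.
$checks = [
    ['/abc/',      'xxabcxx'], // literal sequence a, b, c
    ['/a.c/',      'a-c'],     // . matches any single character
    ['/[abc]/',    'zzbzz'],   // any one of a, b or c
    ['/^xa?y$/',   'xy'],      // the a is optional
    ['/^(abc)*$/', 'abcabc'],  // zero or more repeats of the group
    ['/^(abc)*$/', ''],        // ...including none at all
    ['/^\.+$/',    '......'],  // escaped period, one or more times
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[] = preg_match($pattern, $subject);
}

print_r($results);
```

Every entry in $results should be 1. Single-quoted PHP strings are used so that the backslash reaches the regex engine unchanged.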
    8. More syntax tricks
       • [0-4] - match any digit from 0 to 4.
       • [^0-4] - match any character that is not a digit from 0 to 4.
       • \sword\s - match word where there is white space before and after.
       • \bword\b - \b marks a word boundary. This could be next to white space, a new line, punctuation, or the start or end of the string.
    9. More syntax tricks
       • \d{3,12} - \d matches any digit ([0-9]) while the braces mark the minimum and maximum count of the previous item. In this case 3 to 12 digits.
       • [a-z]{8,} - must be at least 8 lowercase letters.
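A short sketch of the character classes, boundaries and counts described above. The sample strings are invented for illustration, and the ^ and $ anchors are an addition so that the counts apply to the whole string.

```php
<?php
$in_range  = preg_match('/^[0-4]$/', '3');             // 3 is within 0-4
$not_range = preg_match('/^[^0-4]$/', '3');            // negated class rejects it
$spaced    = preg_match('/\sword\s/', 'a word here');  // white space on both sides
$unspaced  = preg_match('/\sword\s/', 'word');         // no white space to match
$bounded   = preg_match('/\bword\b/', 'word');         // string edges are boundaries
$embedded  = preg_match('/\bword\b/', 'swords');       // letters on both sides, no boundary
$too_short = preg_match('/^\d{3,12}$/', '12');         // only 2 digits
$long_ok   = preg_match('/^\d{3,12}$/', '12345');      // 5 digits, within 3 to 12
$eight     = preg_match('/^[a-z]{8,}$/', 'password');  // exactly 8 lowercase letters

var_dump($in_range, $not_range, $spaced, $unspaced, $bounded, $embedded, $too_short, $long_ok, $eight);
```

Note the difference between \s and \b: \s consumes an actual white space character, while \b only asserts a position, so it also succeeds at the very start or end of the string.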
    10. Matching Text
       • Simple check: preg_match('/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i', $email_address) > 0
       • Finding: preg_match('/colou?r:\s+([a-zA-Z]+)/', $text, $matches); echo $matches[1];
       • Find all: preg_match_all('/<([^>]+)>/', $html, $tags); echo $tags[1][1]; ($tags[0] holds the full matches and $tags[1] the first group, so this prints the contents of the second tag.)
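The three calls on this slide can be run end to end as follows. This is a sketch under assumptions: the email address, $text and $html values are hypothetical sample data, not taken from the talk.

```php
<?php
// Simple check: does the whole string look like an email address?
$email_address = 'geoff@example.com'; // hypothetical address
$email_ok = preg_match('/^[a-z0-9]+@([a-z0-9]+\.)*[a-z0-9]+$/i', $email_address) > 0;

// Finding: pull one captured value out of a larger text.
$text = 'size: large, colour: red';
$colour = null;
if (preg_match('/colou?r:\s+([a-zA-Z]+)/', $text, $matches)) {
    $colour = $matches[1]; // $matches[0] is the whole match, [1] the first group
}

// Find all: collect every tag in a fragment of HTML.
$html = '<p>Hello <b>world</b></p>';
preg_match_all('/<([^>]+)>/', $html, $tags);
// With the default PREG_PATTERN_ORDER, $tags[0] lists the full matches
// and $tags[1] lists what the first group captured in each match.

var_dump($email_ok, $colour);
print_r($tags[1]);
```

Checking the return value of preg_match() before reading $matches avoids an undefined index when nothing matched.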
    11. Matching Lines
       • This is more for looking through files but works on any array of text.
       • $new_lines = preg_grep('/Jan[a-z]*[\s\/-](20)?07/', $old_lines);
       • To get the lines that do not match, add a third parameter of PREG_GREP_INVERT rather than complicating your regular expression into something like /^[^\/]|(\/[^p])|(\/p[^r]) etc...
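A minimal sketch of preg_grep() with and without PREG_GREP_INVERT. The log lines in $old_lines are hypothetical; in practice they might come from file().

```php
<?php
// Hypothetical log lines for illustration.
$old_lines = [
    'Jan 07 server restarted',
    'Feb 2007 backup complete',
    'January-2007 disk replaced',
    'Jan/07 cache cleared',
];

$pattern = '/Jan[a-z]*[\s\/-](20)?07/';

// Lines that match the pattern. Original array keys are preserved.
$new_lines = preg_grep($pattern, $old_lines);

// Lines that do NOT match, using the same pattern.
$other_lines = preg_grep($pattern, $old_lines, PREG_GREP_INVERT);

print_r($new_lines);
print_r($other_lines);
```

Because the keys are preserved, wrap the result in array_values() if a re-indexed list is needed.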
    12. Replacing text
       • preg_replace('/([^@\s]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)/', '$1 [AT] $2 [dot] $3', $post);
       • With a single pattern string the replacement must also be a string; $1, $2 and $3 refer back to the parenthesised groups.
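The email obfuscation idea can be sketched in two ways. The first is a plain preg_replace() with group references; the second uses preg_replace_callback(), a swapped-in technique not shown on the slide, so that every dot in a longer domain is rewritten. The $post text and addresses are hypothetical.

```php
<?php
// Variant 1: group references. Only the first dot of the domain is rewritten.
$simple = preg_replace(
    '/([^@\s]+)@([a-zA-Z\d_-]+)\.([a-zA-Z\d_.-]+)/',
    '$1 [AT] $2 [dot] $3',
    'geoff@example.com' // hypothetical address
);

// Variant 2: a callback, so every dot in the domain can be rewritten.
$post = 'Write to geoff@example.com or admin@mail.example.org today.';
$obfuscated = preg_replace_callback(
    '/([^@\s]+)@([a-zA-Z\d_-]+(?:\.[a-zA-Z\d_-]+)+)/',
    function (array $m): string {
        // $m[1] is the local part, $m[2] the full domain.
        return $m[1] . ' [AT] ' . str_replace('.', ' [dot] ', $m[2]);
    },
    $post
);

echo $simple, "\n", $obfuscated, "\n";
```

The callback version costs a few more lines but keeps the regular expression itself simple, which fits the advice about readable patterns.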
    13. Splitting text
       • $date_parts = preg_split('/[-.,\/]+/', $date_string);
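A short sketch of the split above applied to a few date formats. The date strings are hypothetical samples.

```php
<?php
$pattern = '/[-.,\/]+/'; // one or more of: hyphen, period, comma, slash

$iso     = preg_split($pattern, '2007-01-15');
$slashed = preg_split($pattern, '15/01/2007');
$dotted  = preg_split($pattern, '15.01.2007');
// The + lets a run of separators count as a single split point.
$doubled = preg_split($pattern, '15--01--2007');

print_r([$iso, $slashed, $dotted, $doubled]);
```

The pattern only splits the string; it does not say which part is the day, month or year, so the caller still has to know the order used by the input.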
    14. Tips
       • Comment what your regular expression is doing.
       • Test your regular expression for speed. Some can cause a noticeable slowdown.
       • There are plenty of simple uses like /Width: (\d+)/
       • Watch out for greedy expressions. E.g. /(<(.+)>)/ will not pull out "b" and "/b" from "<b>test</b>" but instead will pull "b>test</b". An easy way to change this behaviour is to make the quantifier lazy by adding a ?, like this: /(<(.+?)>)/
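The greedy versus lazy difference from the last tip can be checked directly. This sketch uses the same "<b>test</b>" input as the slide; the third pattern, a negated character class, is an additional alternative not mentioned in the talk.

```php
<?php
$html = '<b>test</b>';

// Greedy: .+ grabs as much as possible, so the match runs to the last >.
preg_match_all('/<(.+)>/', $html, $greedy);

// Lazy: .+? grabs as little as possible, so each tag is matched separately.
preg_match_all('/<(.+?)>/', $html, $lazy);

// Alternative: forbid > inside the tag, which avoids the problem entirely.
preg_match_all('/<([^>]+)>/', $html, $negated);

print_r([$greedy[1], $lazy[1], $negated[1]]);
```

The negated class is often the more predictable choice, since it says exactly which characters may appear inside a tag instead of relying on backtracking.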
    15. References
       • Thank you