Regular Expressions in PHP


From my November 3, 2011 talk at MNPHP. Regular expressions are a powerful tool available in nearly every programming language and platform, including PHP. I cover the history of POSIX vs. PCRE, examples in PHP, and optimizations for writing faster expressions.

  1. Regular Expressions for the Web Application Developer, by Andrew Kandels
  2. Regular Expressions
     Regular expressions provide a concise, flexible means for matching strings of text, such as words or patterns of characters.
     POSIX (Portable Operating System Interface):
     • Traditional Unix regular expression syntax
     • PHP's ereg_* functions
     • Basic and extended versions
     PCRE (Perl Compatible Regular Expressions):
     • Perl 5 extended features
     • Native C extension
     • Generally faster
     • Optimization qualifiers
     Used by:
     • Programming languages
     • Apache and other servers
  3. Why Use Them?
     • Input validation
     • Input filtering
     • Search and replace
     • Parsing and data extraction
     • Dynamic recursion
     • Automation
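  4. In PHP, POSIX = Deprecated
     The ereg_* functions are deprecated as of PHP 5.3 (and were removed entirely in PHP 7). Switching to preg_* is generally painless. Pain points:
     • Different matching criteria (greed): POSIX always returns the longest leftmost match, while PCRE returns the first match its backtracking engine finds
     • preg_* requires delimiters
     • Different characters require escape sequences
     • preg_* favors option modifiers over separate functions (e.g. /i instead of eregi())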
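As a minimal sketch of the switch, the snippet below pairs a few typical ereg-style calls (shown only as comments, since those functions no longer exist in current PHP) with preg_* equivalents. The ZIP code, name, and whitespace patterns are illustrative assumptions, not examples from the talk.

```php
<?php
// Migration sketch: each commented-out ereg*() call is the old POSIX form,
// and the line below it is a preg_* equivalent.

// Old: ereg('^[0-9]{5}(-[0-9]{4})?$', $zip)
// New: same pattern, wrapped in '/' delimiters.
$zip = '55401-1234';
$isZip = preg_match('/^[0-9]{5}(-[0-9]{4})?$/', $zip) === 1;

// Old: eregi('^[a-z]+$', $name)  (a separate function for case-insensitivity)
// New: case-insensitivity becomes the /i modifier.
$name = 'Andrew';
$isAlpha = preg_match('/^[a-z]+$/i', $name) === 1;

// Old: ereg_replace('[[:space:]]+', ' ', $text)
// New: POSIX bracket classes such as [:space:] still work inside PCRE classes.
$text = "too    many\t\tspaces";
$clean = preg_replace('/[[:space:]]+/', ' ', $text);

var_dump($isZip, $isAlpha, $clean);
```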
  5. Anatomy of a PHP Regular Expression: /foo/i
     • Delimiters: the two slashes
     • Pattern to match: foo
     • Options/modifiers: i
     preg_replace('/(href|src)=\'([^\']*)\'/i', '\1="\2"', $str);
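A runnable version of the slide's one-liner, assuming its intent is to rewrite single-quoted href/src attributes as double-quoted ones. The sample $html string is made up for illustration.

```php
<?php
// Convert single-quoted href/src attributes to double-quoted ones.
// Anatomy: '/' delimiters, the pattern between them, and the 'i' modifier.
$html = "<a HREF='/about'>About</a> <img src='logo.png'>";

$fixed = preg_replace(
    '/(href|src)=\'([^\']*)\'/i', // \' escapes the quote inside a PHP single-quoted string
    '\1="\2"',                    // \1 = attribute name (case kept), \2 = attribute value
    $html
);

echo $fixed, PHP_EOL;
```

Note that the quantifier sits inside the group, ([^']*); written as ([^'])* the group would capture only the last character of the value.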
  6. PHP Regular Expressions
     • Must use a delimiter, such as ! @ # or /
     • Use PHP's single quotes, so PHP doesn't interpolate $ or interpret escapes like \n before PCRE sees the pattern
     • preg_match: match against a pattern and extract text
     • preg_replace: like str_replace(), but with a pattern (and sub-patterns)
     • preg_match_all: like preg_match, but fills an array with every match and returns the match count
     • preg_split: like explode(), but with a pattern
     • preg_quote: escapes text for use in a regular expression
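A quick tour of the five functions on a made-up access-log string; the $log contents and the patterns are illustrative assumptions. It also shows a delimiter other than / (#) and how preg_quote() escapes the delimiter you pass it.

```php
<?php
$log = "GET /index.php 200\nPOST /login.php 302\nGET /missing.php 404";

// preg_match: returns 1 for the first match (0 if none) and fills $m.
preg_match('#^(GET|POST) (\S+)#', $log, $m);
// $m[1] is 'GET', $m[2] is '/index.php'

// preg_match_all: returns how many matches there were and fills $all.
$count = preg_match_all('#(\S+\.php) (\d{3})#', $log, $all);
// $count is 3; $all[2] is ['200', '302', '404']

// preg_split: like explode(), but the separator is a pattern.
$colors = preg_split('/[\s,]+/', 'red, green  blue,yellow');
// ['red', 'green', 'blue', 'yellow']

// preg_quote: escapes regex metacharacters in literal text. Passing the
// delimiter as the second argument escapes it as well.
$needle = preg_quote('/login.php', '/');          // '\/login\.php'
$found = preg_match('/' . $needle . '/', $log);   // 1
```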
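  7. Modifiers and Options
     • i (PCRE_CASELESS): ignores case
     • m (PCRE_MULTILINE): ^ and $ match at the start and end of every line, not just of the whole string
     • s (PCRE_DOTALL): the dot (.) also matches newlines
     • U (PCRE_UNGREEDY): quantifiers are non-greedy by default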
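A short sketch of how each modifier changes the result on the same made-up subject text.

```php
<?php
$text = "first line\nSecond line\nthird line";

// m: without it, ^ matches only at the start of the whole string.
$withoutM = preg_match_all('/^\w+/', $text, $a);   // 1 match: 'first'
$withM    = preg_match_all('/^\w+/m', $text, $b);  // 3 matches: one per line

// i: ignore case ('second' matches 'Second'); m lets ^ match at line 2.
$caseless = preg_match('/^second/im', $text);      // 1

// s: without it, the dot stops at a newline; with it, the dot crosses lines.
preg_match('/first.*line/', $text, $c);            // 'first line'
preg_match('/first.*line/s', $text, $d);           // the entire $text

// U: quantifiers become lazy, so .+ stops at the first possible '>'.
preg_match('/<.+>/', '<b>bold</b>', $e);           // '<b>bold</b>'
preg_match('/<.+>/U', '<b>bold</b>', $f);          // '<b>'
```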
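  8. Performance Killers
     Slow-downs in performance generally come from:
     • Alternation, the pipe/OR operator (|): use [abcd] over (a|b|c|d) when possible
     • Multi-line matching with the dot (PCRE_DOTALL or /s), which lets wildcards run across newlines
     • Recursion (backtracking) from overlapping quantifiers such as (\d+)\d*: use explicit lengths like \d{5} when possible
     That said, it's not that slow!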
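A rough micro-benchmark sketch for comparing the rewrites suggested on this slide. The timePattern() helper and the subject string are hypothetical, and absolute timings will vary by machine and PCRE version; each pair returns the same matches on this subject, so only speed differs.

```php
<?php
// Hypothetical micro-benchmark helper: runs preg_match_all() $runs times
// and returns the elapsed wall-clock seconds.
function timePattern(string $pattern, string $subject, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        preg_match_all($pattern, $subject, $matches);
    }
    return microtime(true) - $start;
}

$subject = str_repeat('abcdxyz 5551234 ', 20);

$pairs = [
    // Alternation vs. a character class: same full matches.
    ['/(a|b|c|d)+/', '/[abcd]+/'],
    // Overlapping quantifiers vs. an explicit length: same captured digits here.
    ['/(\d+)\d*/', '/(\d{7})/'],
];

foreach ($pairs as [$broad, $tight]) {
    printf("%-14s %.4f s\n", $broad, timePattern($broad, $subject, 10000));
    printf("%-14s %.4f s\n", $tight, timePattern($tight, $subject, 10000));
}
```

Run it a few times and compare the two lines of each pair within a single run rather than reading much into any one number.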
  9. Sub-Patterns
     Sub-patterns let you extract the relevant text from a match:
     • In preg_replace, use either \1 or $1 in the replacement string
     • Sub-patterns are numbered from left to right by the position of their opening parenthesis "("
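A sketch showing how nested groups are numbered and both back-reference styles in preg_replace; the date string is a made-up example.

```php
<?php
// Groups are numbered by their opening parenthesis, left to right:
// group 1 = (\d{4})-(\d{2}) as a whole, group 2 = year, group 3 = month,
// group 4 = day.
$date = 'Due: 2011-11-03';
preg_match('/((\d{4})-(\d{2}))-(\d{2})/', $date, $m);
// $m[1] is '2011-11', $m[2] is '2011', $m[3] is '11', $m[4] is '03'

// In preg_replace, $1 and \1 both refer to group 1
// (use ${1} when the reference is immediately followed by a digit).
$us  = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', $date);
$alt = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\2/\3/\1', $date);
// both are 'Due: 11/03/2011'
```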
  10. Named Sub-Patterns
      (?P<name>pattern)
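A minimal example of a named sub-pattern, assuming a simple user@domain string. In the matches array, each named group appears under its name and under its number.

```php
<?php
$address = 'jane@example.com';

if (preg_match('/^(?P<user>[^@]+)@(?P<domain>.+)$/', $address, $m)) {
    // Named groups are stored under their name AND their number.
    echo $m['user'], ' at ', $m['domain'], PHP_EOL;
}
```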
  11. Lookaheads
      (?=pattern)
      Lookaheads are zero-width: they don't consume characters or move the cursor, and the text they check is not included in the match. The pattern can be any valid regex; (?!pattern) is the negative form.
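Two common uses of lookaheads, sketched with illustrative patterns: stacking several requirements at the start of a password check (the 8-character, one-digit, one-uppercase rule is an assumption for the example), and matching text only when something specific follows it.

```php
<?php
// Each (?=...) tests a condition from the start of the string without
// consuming anything, so the conditions can be stacked.
// Hypothetical rule: at least 8 characters, one digit, one uppercase letter.
$strong = '/^(?=.*\d)(?=.*[A-Z]).{8,}$/';

$results = [
    'Secret123' => preg_match($strong, 'Secret123'),  // 1
    'secret123' => preg_match($strong, 'secret123'),  // 0: no uppercase letter
    'Secret1'   => preg_match($strong, 'Secret1'),    // 0: too short
];

// The lookahead only checks what follows; it is not part of the match.
preg_match('/foo(?=bar)/', 'foobar', $m);   // $m[0] is 'foo'
```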
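  12. Lookbehinds
      (?<=pattern) positive, (?<!pattern) negative
      Accepts only some basic regex: PCRE has traditionally required each alternative to match a fixed number of characters.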
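A sketch of both lookbehind forms on a made-up price string. The negative version also excludes a preceding digit; without that, the 0 in $40 would match on its own.

```php
<?php
$text = 'Price: $40, qty: 40, discount: $5';

// Positive lookbehind (?<=...): digits preceded by a dollar sign.
preg_match_all('/(?<=\$)\d+/', $text, $dollars);   // ['40', '5']

// Negative lookbehind (?<!...): digits NOT preceded by a dollar sign or a digit.
preg_match_all('/(?<![$\d])\d+/', $text, $plain);  // ['40'] (the qty)
```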
  13. Multi-Line Processing
      /msU: multi-line (^ and $ match at each line), dots include newlines, non-greedy
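A sketch of the /msU combination on a made-up HTML list, where one item spans two lines. Each modifier is doing work here: without m, ^ would only match at the very start of the string (the <ul> line) and nothing would be found.

```php
<?php
$html = "<ul>\n<li>One\n  line two</li>\n<li>Two</li>\n</ul>";

// m: ^ matches at the start of each line.
// s: the dot also matches newlines, so an item can span lines.
// U: quantifiers are lazy, so each match stops at the FIRST </li>.
// '#' is the delimiter so the '/' in </li> needs no escaping.
preg_match_all('#^<li>(.*)</li>#msU', $html, $m);
// $m[1] holds "One\n  line two" and 'Two'
```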
  14. Once-Only Sub-Patterns
      (?>pattern)
      Eliminates slow recursion (backtracking) from wildcard searching: once the group has matched, it never gives characters back. Fewer scans = more speed.
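A small sketch of what "once-only" means in practice, using PCRE's atomic group syntax (?>pattern). The subjects are illustrative.

```php
<?php
// Ordinary group: \d+ first takes all five digits, then gives one back
// so that the trailing \d can match.
preg_match('/(\d+)\d/', '12345', $m);
// $m[1] is '1234'

// Once-only (atomic) group: (?>\d+) keeps every digit it took and never
// gives any back, so the trailing \d has nothing left and the match fails.
$atomic = preg_match('/(?>\d+)\d/', '12345');
// $atomic is 0

// Typical use: when the text after the group can't match, fail immediately
// instead of retrying every shorter length of \w+.
$ok   = preg_match('/^(?>\w+):/', 'key: value');     // 1
$fail = preg_match('/^(?>\w+):/', 'no colon here');  // 0
```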
  15. Greedy
      By default, PCRE returns the biggest match. 100,000 runs took 0.2791 seconds.
  16. Non-Greedy with Modifier
      The /U modifier returns the SMALLEST match. 100,000 runs took 0.2638 seconds (a little better, and it's right).
  17. Restrictive Wild-Carding
      No greedy flag needed, and it's faster without broad wild-cards. 100,000 runs took 0.2271 seconds (fastest yet, no options needed).
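The transcript doesn't include the pattern or subject behind these timings, so the sketch below is a hypothetical reconstruction using href extraction. It shows why the greedy result is wrong and lets you time the three variants yourself; expect different absolute numbers from the slides.

```php
<?php
// Hypothetical example: extract the first href value from a snippet of HTML.
$html = '<a href="/one">1</a> <a href="/two">2</a>';

$patterns = [
    'greedy'      => '/href="(.*)"/',     // .* runs on to the LAST quote
    'non-greedy'  => '/href="(.*)"/U',    // /U: stops at the FIRST quote
    'restrictive' => '/href="([^"]*)"/',  // cannot cross a quote at all
];

$results = [];
foreach ($patterns as $label => $pattern) {
    $start = microtime(true);
    for ($i = 0; $i < 100000; $i++) {
        preg_match($pattern, $html, $m);
    }
    $results[$label] = $m[1];
    printf("%-12s %-30s %.4f s\n", $label, $m[1], microtime(true) - $start);
}
```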
  18. grep
      Use grep -E or egrep for extended regular expressions (+, ?, |) and advanced functionality.
      • -A n: print the next n lines after each match
      • -B n: print the previous n lines before each match
      • -i: ignore case
      • -m n: stop after n matches
      • -r: recursively search the file system
      • -n: show line numbers
      • -v: only show lines that don't match
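PHP has its own line filter in preg_grep(), which behaves roughly like grep over an array; passing PREG_GREP_INVERT gives the equivalent of grep -v. The log lines are made up for illustration.

```php
<?php
$lines = ['INFO started', 'ERROR disk full', 'info done', 'Error: timeout'];

// Roughly `grep -i error`: keep the lines that match (original keys are kept).
$errors = preg_grep('/error/i', $lines);
// [1 => 'ERROR disk full', 3 => 'Error: timeout']

// Roughly `grep -v -i error`: PREG_GREP_INVERT keeps lines that do NOT match.
$rest = preg_grep('/error/i', $lines, PREG_GREP_INVERT);
// [0 => 'INFO started', 2 => 'info done']
```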
  19. sed
      Use -r (-E on OS X / FreeBSD) for extended regular expressions.
  20. The End
      Web:
      Mail: akandels@gmail.com
      Twitter: @andrewkandels