Regular Expressions in PHP

From my November 3, 2011 talk at MNPHP. Regular expressions are a powerful tool available in nearly every programming language or platform, including PHP. I go over the history of POSIX vs. PCRE, examples in PHP, and optimizations on how to write faster expressions.

Published in: Technology

  1. Regular Expressions for the Web Application Developer
     By Andrew Kandels
  2. Regular Expressions
     Regular expressions provide a concise, flexible means for matching strings of text, such as words or patterns of characters.
     POSIX (Portable Operating System Interface):
     • Traditional Unix regular expression syntax
     • PHP's ereg_ functions
     • Basic and extended versions
     PCRE (Perl Compatible Regular Expressions):
     • Perl 5 extended features
     • Native C extension
     • Generally faster
     • Optimization qualifiers
     • Used by programming languages, Apache and other servers
  3. Why Use Them?
     • Input Validation
     • Input Filtering
     • Search and Replace
     • Parsing and Data Extraction
     • Dynamic Recursion
     • Automation
  4. In PHP, POSIX = Deprecated
     The ereg_* functions are deprecated in newer versions of PHP (deprecated as of PHP 5.3 and removed in PHP 7.0). Switching to preg_* is generally pain free. Pain points:
     • Different matching criteria (greed)
     • preg_* requires delimiters
     • Different characters require escape sequences
     • preg favors option modifiers over functions
  5. Anatomy of a PHP Regular Expression
     /foo/i
     • Delimiters (the two slashes)
     • Pattern to match (foo)
     • Options/modifiers (i)
     preg_replace('/(href|src)=\'([^\']*)\'/i', '\1="\2"', $str);
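The preg_replace call on this slide can be tried out with a small script. This is a minimal sketch: the input string is a made-up example, and the pattern assumes the attribute values are wrapped in single quotes, which is how the slide's call reads once its quoting is restored.

```php
<?php
// Hypothetical input: attribute values wrapped in single quotes.
$str = "<a HREF='/home'><img src='logo.png'></a>";

// Delimiters: the two slashes. Pattern: everything between them. Modifier: i.
$pattern = '/(href|src)=\'([^\']*)\'/i';

// \1 is the attribute name exactly as matched, \2 the value between the quotes.
$normalized = preg_replace($pattern, '\1="\2"', $str);

echo $normalized, PHP_EOL;
```

Because the pattern and replacement are in PHP single quotes, the backslashes in \1 and \2 reach PCRE untouched.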
  6. PHP Regular Expressions
     • Must use a delimiter: ! @ # /
     • Use PHP's single quotes (no need to escape backslashes)
     preg_match      Match against a pattern and extract text
     preg_replace    Like str_replace with a pattern (and sub-patterns)
     preg_match_all  Like preg_match, but returns an array and a count covering every match
     preg_split      Like explode() but with a pattern
     preg_quote      Escapes text for use in a regular expression
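A short sketch of the five functions listed on this slide, side by side. The sample strings and variable names are illustrative assumptions, not taken from the talk.

```php
<?php
$text = 'Order 66 shipped 12 boxes on 2011-11-03.';

// preg_match: stops at the first match and stores it in $first.
$found = preg_match('/\d+/', $text, $first);

// preg_match_all: returns how many matches there are and collects all of them.
$count = preg_match_all('/\d+/', $text, $all);

// preg_replace: like str_replace, but the search term is a pattern.
$squeezed = preg_replace('/\s+/', ' ', "a   b\n c");

// preg_split: like explode(), but the separator is a pattern.
$parts = preg_split('/[\s,;]+/', 'a, b;c  d');

// preg_quote: escape literal text (and the delimiter) before embedding it.
$literal = preg_quote('1.5+x?', '/');
$hit = preg_match('/' . $literal . '/', 'cost: 1.5+x?');
```

Passing the delimiter as the second argument of preg_quote matters whenever the literal text could contain it.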
  7. Modifiers and Options
     i  PCRE_CASELESS: ignores case
     m  PCRE_MULTILINE: ^ and $ match at the start and end of every line, not only of the whole string
     s  PCRE_DOTALL: dots (.) also match newlines
     U  PCRE_UNGREEDY: don't be greedy
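The effect of each modifier is easiest to see by running the same pattern with and without it. A minimal sketch, using made-up subject strings:

```php
<?php
$text = "Foo\nbar";

$caseless   = preg_match('/foo/i', $text);      // i: case is ignored
$plainCaret = preg_match('/^bar/', $text);      // no m: ^ only matches at the very start
$multiline  = preg_match('/^bar/m', $text);     // m: ^ also matches after each newline
$plainDot   = preg_match('/Foo.bar/', $text);   // no s: . does not match "\n"
$dotAll     = preg_match('/Foo.bar/s', $text);  // s: . matches "\n" as well

preg_match('/<.+>/', '<b>x</b>', $greedy);      // default: longest possible match
preg_match('/<.+>/U', '<b>x</b>', $lazy);       // U: shortest possible match
```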
  8. Performance Killers
     Slow-downs in performance generally come from:
     • Alternation, the pipe/OR operator (|). Use [abcd] when possible over (a|b|c|d)
     • Multi-line (PCRE_DOTALL or /s)
     • Recursion: (\d+)\d*. Use lengths when possible
     It's not that slow!
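The two rewrites suggested on this slide can be sketched as follows. The subjects are invented, and reading "use lengths" as an explicit {min,max} quantifier is an assumption about what the slide meant. The sketch only shows that the cheaper forms match the same text; it makes no timing claim.

```php
<?php
$subject = str_repeat('xyz', 1000) . 'c';

// Same match either way; the character class avoids trying alternatives one by one.
preg_match('/(a|b|c|d)/', $subject, $viaAlternation);
preg_match('/[abcd]/', $subject, $viaClass);

// (\d+)\d* lets two quantifiers compete for the same digits.
// An explicit length expresses the intent with a single quantifier.
preg_match('/^(\d+)\d*$/', '12345', $loose);
preg_match('/^(\d{1,5})$/', '12345', $bounded);
```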
  9. Sub-Patterns
     Sub-patterns allow you to extract relevant text from searches:
     • For preg_replace, use either \1 or $1 in your replacement string
     • Sub-patterns are numbered from the left, in the order of their opening parenthesis "("
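A small sketch of both points, using a date string as an assumed example: the two back-reference styles in a replacement, and numbering by opening parenthesis when groups are nested.

```php
<?php
$date = '2011-11-03';

// Three sub-patterns, numbered 1 to 3 by the position of their opening "(".
preg_match('/(\d{4})-(\d{2})-(\d{2})/', $date, $m);

// Both reference styles work in the replacement string.
$viaBackslash = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\2/\3/\1', $date);
$viaDollar    = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', $date);

// Nested groups: the outer "(" opens first, so it becomes group 1.
preg_match('/((\d{4})-(\d{2}))-\d{2}/', $date, $nested);
```

Index 0 of the match array always holds the full match; the sub-patterns start at index 1.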
  10. Named Sub-Patterns
      (?P<name>pattern)
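As an illustration of the (?P<name>pattern) syntax, here is a minimal sketch with assumed group names (year, month, day) and an invented subject string:

```php
<?php
$pattern = '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/';

// Each named group appears in $m twice: under its name and under its number.
preg_match($pattern, 'Talk given on 2011-11-03', $m);

echo $m['year'], ' ', $m['month'], ' ', $m['day'], PHP_EOL;
```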
  11. Lookaheads
      Lookaheads are zero-width, so they won't move your cursor or be included in any sub-patterns.
      (?=pattern)  Pattern can be any valid regex
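Two common uses of (?=pattern), sketched with made-up inputs: matching text only when something follows it, and stacking several lookaheads at the start of a pattern for validation.

```php
<?php
// Match an amount only when " USD" follows; " USD" itself is not consumed.
preg_match_all('/\d+(?= USD)/', '10 USD, 20 EUR, 30 USD', $amounts);

// Stacked lookaheads: require a digit and a letter somewhere, then 8+ characters.
$rule   = '/^(?=.*\d)(?=.*[a-z]).{8,}$/i';
$strong = preg_match($rule, 'passw0rdX');
$weak   = preg_match($rule, 'password');
```

Each lookahead starts from the same position, which is why several conditions can be checked before any character is consumed.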
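  12. Lookbehinds
      (?<!pattern)
      Accepts only some basic regex (historically, PCRE required a lookbehind pattern to match a fixed length). (?<!pattern) is the negative form; the positive form is (?<=pattern).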
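A minimal sketch of both lookbehind forms, using an invented price string. Both lookbehind patterns here are a single character long, so they stay within the fixed-length restriction of older PCRE versions.

```php
<?php
$prices = 'cost $40, discount -15, total $25';

// Positive lookbehind: digits directly preceded by "$".
preg_match_all('/(?<=\$)\d+/', $prices, $dollars);

// Negative lookbehind: digits NOT preceded by "$" or by another digit.
preg_match_all('/(?<![\$\d])\d+/', $prices, $others);
```

Excluding a preceding digit in the negative form keeps the engine from matching the tail of a number such as the "0" in "$40".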
  13. Multi-Line Processing
      /msU
      (Multi-line, include newlines with dots, non-greedy)
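The three modifiers combine naturally when pulling blocks out of multi-line text. The HTML fragment below is an assumed example, not the one from the talk.

```php
<?php
$html = "<p>first\nparagraph</p>\n<p>second</p>";

// m: ^ and $ anchor at every line boundary.
// s: . crosses the newline inside the first paragraph.
// U: .* stops at the nearest </p> rather than the last one.
$count = preg_match_all('/^<p>(.*)<\/p>$/msU', $html, $m);
```

Dropping U would merge both paragraphs into one match, and dropping s would lose the first paragraph because of its embedded newline.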
  14. Once-Only Sub-Patterns
      Eliminates slow recursion from wildcard searching. Fewer scans = more speed.
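The slide's own example is not in this transcript. In PCRE, once-only sub-patterns are written (?>pattern); the sketch below, on an invented subject, shows the behavior that saves work: once the group has matched, it never gives characters back.

```php
<?php
// Plain group: \d+ first takes "123", then gives back "3" so the final \d can match.
$plain = preg_match('/^(\d+)\d$/', '123', $m);

// Once-only group: (?>\d+) keeps "123" and refuses to backtrack,
// so the final \d has nothing left and the match fails immediately.
$onceOnly = preg_match('/^(?>\d+)\d$/', '123');
```

The same refusal to backtrack is what cuts the rescanning on long subjects that cannot match, at the cost of rejecting inputs that only match through backtracking.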
  15. Greedy
      By default, PCRE returns the biggest match.
      100,000 runs took 0.2791 seconds
  16. Non-Greedy with Modifier
      The /U modifier returns the SMALLEST match.
      100,000 runs took 0.2638 seconds (a little better, and it's right)
  17. Restrictive Wild-Carding
      No greedy flag needed, faster without broad wild-cards.
      100,000 runs took 0.2271 seconds (fastest yet, no options needed)
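The patterns benchmarked on the last three slides are not preserved in this transcript. The sketch below uses assumed patterns and an invented input to show the three approaches side by side; it compares what each one matches, not how fast it runs.

```php
<?php
$html = '<a href="one.html">One</a> <a href="two.html">Two</a>';

// Greedy: .* runs to the end of the string, then backs up to the LAST quote.
preg_match('/href="(.*)"/', $html, $greedy);

// Non-greedy via /U: .* grows only until the FIRST closing quote.
preg_match('/href="(.*)"/U', $html, $ungreedy);

// Restrictive: [^"]* cannot cross a quote, so no modifier is needed.
preg_match('/href="([^"]*)"/', $html, $restrictive);
```

Only the greedy version returns the wrong value here, which matches the slide's remark that the /U result is "right".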
  18. grep
      Use grep -E or egrep for extended regular expressions (+, ?, |) and advanced functionality.
      -A n  Print the next n lines after each match
      -B n  Print the previous n lines before each match
      -i    Ignore case
      -m n  Stop after n matches
      -r    Recursively search the file system
      -n    Show line numbers
      -v    Only show lines that don't match
  19. sed
      Use -r (-E on OS X / FreeBSD) for extended regular expressions.
  20. The End
      Web: http://andrewkandels.com
      Mail: akandels@gmail.com
      Twitter: @andrewkandels
