Regular Expressions in PHP





From my November 3, 2011 talk at MNPHP. Regular expressions are a powerful tool available in nearly every programming language or platform, including PHP. I go over the history of POSIX vs. PCRE, examples in PHP, and optimizations for writing faster expressions.







Usage Rights: CC Attribution-NonCommercial-ShareAlike License


    Presentation Transcript

    • Regular Expressions for the Web Application Developer, by Andrew Kandels
    • Regular Expressions
      Regular expressions provide a concise, flexible means for matching strings of text, such as words or patterns of characters.
      POSIX (Portable Operating System Interface):
        - Traditional Unix regular expression syntax
        - PHP's ereg_ functions
        - Basic and extended versions
      PCRE (Perl Compatible Regular Expressions):
        - Perl 5 extended features
        - Native C extension
        - Generally faster
        - Optimization qualifiers
      Used by: programming languages, Apache and other servers
    • Why Use Them?
        - Input Validation
        - Input Filtering
        - Search and Replace
        - Parsing and Data Extraction
        - Dynamic Recursion
        - Automation
    • In PHP, POSIX = Deprecated
      The ereg_* functions are deprecated as of PHP 5.3.0 (and were removed entirely in PHP 7.0). Switching to preg_* is generally pain-free. Pain points:
        - Different matching criteria (greediness)
        - preg_* requires delimiters
        - Different characters require escape sequences
        - preg_* favors option modifiers over separate functions (e.g. the /i modifier instead of eregi())
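A minimal sketch of the ereg-to-preg migration described above. The input strings are hypothetical, and the old POSIX calls appear only in comments because they no longer exist in PHP 7 and later.

```php
<?php
// Hypothetical input; the POSIX versions in the comments only ran on PHP < 7.0.
$username = 'Andrew_K';

// POSIX: eregi('^[a-z_]+$', $username)
// PCRE: wrap the pattern in delimiters and use the /i modifier
// instead of a separate case-insensitive function.
$isValid = preg_match('/^[a-z_]+$/i', $username) === 1;

// POSIX: ereg_replace('[0-9]+', '#', $text)
// PCRE: the same pattern, now inside delimiters.
$text = 'Room 101, floor 3';
$masked = preg_replace('/[0-9]+/', '#', $text);

var_dump($isValid);     // bool(true)
echo $masked, PHP_EOL;  // Room #, floor #
```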
    • Anatomy of a PHP Regular Expression: /foo/i
        - Delimiters (/ ... /)
        - Pattern to match (foo)
        - Options/modifiers (i)
      preg_replace('/(href|src)=\'([^\']*)\'/i', '\1="\2"', $str);
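The preg_replace call on this slide can be run as-is; the sketch below applies it to a hypothetical HTML snippet that mixes single- and double-quoted attributes, converting the single-quoted href and src values to double quotes.

```php
<?php
// Hypothetical markup; alt is already double-quoted and should be left alone.
$str = "<a href='/about.html'><img src='logo.png' alt=\"Logo\"></a>";

// Delimiters: / ... /   Pattern: (href|src)='([^']*)'   Modifier: i
// Group 1 captures the attribute name, group 2 the single-quoted value.
// In a single-quoted PHP string, '\1' stays a literal backslash + 1,
// which preg_replace reads as a back-reference.
$fixed = preg_replace('/(href|src)=\'([^\']*)\'/i', '\1="\2"', $str);

echo $fixed, PHP_EOL;
// <a href="/about.html"><img src="logo.png" alt="Logo"></a>
```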
    • PHP Regular Expressions
        - Must use a delimiter, such as ! @ # or /
        - Use PHP's single-quoted strings, so backslashes in the pattern rarely need to be doubled
      preg_match      Match against a pattern and extract text
      preg_replace    Like str_replace, but with a pattern (and sub-patterns)
      preg_match_all  Like preg_match, but returns an array of every match, plus the match count
      preg_split      Like explode(), but with a pattern
      preg_quote      Escapes text for use in a regular expression
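A short tour of the five functions listed on this slide, run against a hypothetical access-log string. It is a sketch for comparison, not a recommended log parser.

```php
<?php
$log = 'GET /index.php 200; GET /missing 404; POST /login 302';

// preg_match: first match only; $m[1] holds the first sub-pattern.
$found = preg_match('/(\d{3})/', $log, $m);

// preg_match_all: every match; the return value is the number of matches.
$count = preg_match_all('/\d{3}/', $log, $all);

// preg_split: like explode(), but the separator is a pattern ("; " with optional spaces).
$requests = preg_split('/;\s*/', $log);

// preg_replace: like str_replace(), but pattern-based.
$redacted = preg_replace('/\d{3}/', 'XXX', $log);

// preg_quote: escape literal text before embedding it in a pattern;
// pass the delimiter as the second argument so it gets escaped too.
$needle = '/index.php';
$pattern = '/' . preg_quote($needle, '/') . '/';   // '/\/index\.php/'
$hasIndex = preg_match($pattern, $log) === 1;
```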
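    • Modifiers and Options
      i  PCRE_CASELESS: ignores case
      m  PCRE_MULTILINE: ^ and $ match at the start and end of each line, not just of the whole string
      s  PCRE_DOTALL: the dot (.) also matches newlines
      U  PCRE_UNGREEDY: don't be greedy (quantifiers match as little as possible by default)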
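A small sketch contrasting the m, s and U modifiers on hypothetical input; each pair of calls differs only by the modifier being demonstrated.

```php
<?php
$text = "Name: Ann\nname: Bob";

// Without /m, ^ only matches at the very start of the subject.
$noMulti = preg_match_all('/^name: (\w+)/i', $text, $a);   // 1 match: Ann

// With /m, ^ also matches right after each newline.
$multi = preg_match_all('/^name: (\w+)/im', $text, $b);    // 2 matches: Ann, Bob

// Without /s, the dot refuses to match "\n"; with /s it matches it.
$noDotAll = preg_match('/Ann.name/i', $text);    // 0
$dotAll   = preg_match('/Ann.name/is', $text);   // 1

// /U inverts greediness: .+ now stops as early as it can.
preg_match('/<(.+)>/', '<b>bold</b>', $greedy);   // $greedy[1] === 'b>bold</b'
preg_match('/<(.+)>/U', '<b>bold</b>', $lazy);    // $lazy[1] === 'b'
```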
    • Performance Killers
      Slow-downs in performance generally come from:
        - Alternation, the pipe/OR operator (|): use [abcd] when possible instead of (a|b|c|d)
        - Multi-line matching with dot-all (PCRE_DOTALL or /s)
        - Recursion (backtracking), as in (\d+)\d*: use explicit lengths such as \d{5} when possible
      It's not that slow!
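A sketch of the first and third tips, assuming a hypothetical 5-digit ZIP code as the "explicit length" case. It checks that the character class and the alternation accept the same inputs, then times them; the timings depend heavily on hardware and on the PHP/PCRE version (including whether the PCRE JIT is enabled), so no particular result is implied.

```php
<?php
// Both patterns accept the same strings. The character class is one set
// lookup per character; the alternation tries each branch in turn.
$classPattern = '/^[abcd]+$/';
$altPattern   = '/^(a|b|c|d)+$/';

$agree = true;
foreach (['abcd', 'dcba', 'abce', ''] as $s) {
    $agree = $agree && (preg_match($classPattern, $s) === preg_match($altPattern, $s));
}

// Explicit lengths: (\d+)\d* can split the digits many ways while
// backtracking; \d{5} (assuming a 5-digit ZIP) has exactly one way to match.
$zip = '55401';
$looseOk  = preg_match('/^(\d+)\d*$/', $zip);
$strictOk = preg_match('/^\d{5}$/', $zip);

// Rough timing helper, mirroring the "100,000 runs" measurements in the talk.
function timeIt(string $pattern, string $subject, int $runs = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        preg_match($pattern, $subject);
    }
    return microtime(true) - $start;
}

printf(
    "class: %.4fs, alternation: %.4fs\n",
    timeIt($classPattern, 'abcdabcdabcd'),
    timeIt($altPattern, 'abcdabcdabcd')
);
```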
    • Sub-Patterns
      Sub-patterns allow you to extract relevant text from searches:
        - For preg_replace, use either \1 or $1 in your replacement string
        - Sub-patterns are numbered from left to right by the position of their opening parenthesis "("
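A sketch of sub-pattern numbering and back-references, using the talk date as a hypothetical input. The nested group shows that numbering follows the opening parentheses, not the closing ones.

```php
<?php
$date = '2011-11-03';

// Opening parentheses, left to right:
// 1 = ((\d{4})-(\d{2})), 2 = (\d{4}), 3 = (\d{2}), 4 = the final (\d{2}).
preg_match('/((\d{4})-(\d{2}))-(\d{2})/', $date, $m);
// $m[0] = '2011-11-03', $m[1] = '2011-11', $m[2] = '2011', $m[3] = '11', $m[4] = '03'

// In preg_replace, $n and \n both refer to group n.
$us  = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', $date);   // '11/03/2011'
$us2 = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\2/\3/\1', $date);   // same result
```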
    • Named Sub-Patterns
      (?P<name>pattern)
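A minimal sketch of the (?P<name>pattern) syntax with preg_match, on a hypothetical string. PHP returns each named group under both its name and its number.

```php
<?php
$subject = 'Talk given 2011-11-03 at MNPHP';

preg_match('/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/', $subject, $m);

// Named groups appear twice in $m: $m['year'] and $m[1] both hold '2011'.
echo "{$m['month']}/{$m['day']}/{$m['year']}", PHP_EOL;   // 11/03/2011
```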
    • Lookaheads
      (?=pattern), where the pattern can be any valid regex; (?!pattern) is the negative form
      Lookaheads are zero-width: they don't move the cursor and aren't included in any sub-pattern.
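A sketch of positive and negative lookaheads. The password rule is a hypothetical example of stacking several (?=...) checks at the same position; the second example shows that lookahead text is not part of the returned match.

```php
<?php
// Hypothetical rule: at least one digit, at least one uppercase letter, 8+ characters.
// Each (?=...) checks from the start of the string without consuming anything.
$strong = '/^(?=.*\d)(?=.*[A-Z]).{8,}$/';
$ok  = preg_match($strong, 'Secret123');   // 1
$bad = preg_match($strong, 'secret123');   // 0, no uppercase letter

// Only the digits are matched; "px" is checked but not included.
preg_match_all('/\d+(?=px)/', 'width: 100px; height: 50px; margin: 2em', $m);
// $m[0] === ['100', '50']

// Negative lookahead: "foo" not followed by "bar".
$neg = preg_match_all('/foo(?!bar)/', 'foobar foobaz foo', $n);   // 2
```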
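    • Lookbehinds
      (?<=pattern) matches when preceded by the pattern; (?<!pattern) matches when not
      Accepts only basic regex: PCRE has traditionally required lookbehind patterns to have a fixed length.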
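A sketch of both lookbehind forms on hypothetical strings. Both lookbehind patterns here are fixed-length (a single character), which every PCRE version accepts.

```php
<?php
// Positive lookbehind: digits preceded by "$"; the "$" is not part of the match.
preg_match_all('/(?<=\$)\d+/', 'Tickets: $25 or $100, 3 max', $prices);
// $prices[0] === ['25', '100']

// Negative lookbehind: "php" not preceded by a dot (i.e. not a file extension).
$count = preg_match_all('/(?<!\.)php/', 'index.php is written in php', $m);   // 1
```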
    • Multi-Line Processing: /msU
      m: multi-line (^ and $ match at each line); s: dots also match newlines; U: non-greedy
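A sketch of multi-line processing on a hypothetical HTML list, where one item spans two lines. /sU extracts each item, and /m counts the lines that open an item.

```php
<?php
$html = "<ul>\n  <li>First\n  item</li>\n  <li>Second</li>\n</ul>";

// s: the dot also matches newlines, so an item may span lines.
// U: quantifiers are lazy, so each match stops at the first </li>.
preg_match_all('/<li>(.*)<\/li>/sU', $html, $items);
// $items[1] === ["First\n  item", 'Second']

// m: ^ matches at the start of every line; count lines that open an item.
// [ \t]* is used instead of \s* so the match cannot run across a newline.
$lines = preg_match_all('/^[ \t]*<li>/m', $html);   // 2
```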
    • Once-Only Sub-Patterns: (?>pattern)
      Eliminates slow recursion (backtracking) from wildcard searching. Fewer scans = more speed.
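A sketch of what "once-only" means in practice: an atomic group keeps everything it matched and never gives characters back, so the engine cannot retry shorter splits. The inputs are hypothetical.

```php
<?php
$subject = '12345';

// Normal group: \d+ grabs "12345", then backtracks one digit so the final 5 can match.
$backtracks = preg_match('/\d+5/', $subject);   // 1

// Once-only group: (?>\d+) keeps all five digits, so the trailing 5 can never match.
$atomic = preg_match('/(?>\d+)5/', $subject);   // 0

// On long non-matching input, the atomic form fails without exploring
// alternative splits; the non-atomic /^(\d+)+$/ is a classic case of
// catastrophic backtracking on input like this.
$long = str_repeat('1', 5000) . 'x';
$fast = preg_match('/^(?>\d+)+$/', $long);      // 0
```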
    • Greedy
      By default, PCRE quantifiers are greedy and return the biggest match. 100,000 runs took 0.2791 seconds.
    • Non-Greedy with Modifier
      The /U modifier returns the SMALLEST match. 100,000 runs took 0.2638 seconds (a little better, and it's right).
    • Restrictive Wild-Carding
      No greedy flag needed; faster without broad wild-cards. 100,000 runs took 0.2271 seconds (fastest yet, no options needed).
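The three approaches from these slides can be compared on a hypothetical anchor tag. This sketch checks only what each pattern captures, not the timings quoted above; those came from the original benchmark and will differ on other machines.

```php
<?php
$tag = '<a href="/talks.html" title="MNPHP">Talks</a>';

// Greedy: .* runs to the LAST quote it can reach, then backs off to it.
preg_match('/href="(.*)"/', $tag, $greedy);
// $greedy[1] === '/talks.html" title="MNPHP'

// Non-greedy with /U: .* stops at the FIRST closing quote.
preg_match('/href="(.*)"/U', $tag, $lazy);
// $lazy[1] === '/talks.html'

// Restrictive wild-card: [^"]* cannot cross a quote, so no modifier is needed.
preg_match('/href="([^"]*)"/', $tag, $strict);
// $strict[1] === '/talks.html'
```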
    • grep
      Use grep -E or egrep for extended regular expressions (+, ?, |) and advanced functionality.
        -A n  Print the next n lines after each match
        -B n  Print the previous n lines before each match
        -i    Ignore case
        -m n  Stop after n matches
        -r    Recursively search the file system
        -n    Show line numbers
        -v    Only show lines that don't match
    • sed
      Use -r (-E on OS X / FreeBSD) for extended regular expressions.
    • The End
      Web:
      Mail: akandels@gmail.com
      Twitter: @andrewkandels