Regular Expressions and You

An introduction to regular expressions.




James I. Armes
Web Developer, AllPlayers.com
@jamesiarmes
Email Validation Examples




 ^[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,4}$
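A quick way to see what the simple pattern accepts is to run it against a few addresses. The sketch below assumes the usual backslash escapes (\w for word characters, \. for a literal dot); the sample addresses are made up for illustration.

```javascript
// Simple email pattern, with \w and \. written out explicitly.
const simpleEmail = /^[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,4}$/;

// Hypothetical sample addresses, chosen to show one accept and two rejects.
const emailSamples = [
  'james@example.com',      // ordinary address
  'no-at-sign.example.com', // missing @
  'user@example.museum',    // valid TLD, but longer than {2,4} allows
];

const emailResults = emailSamples.map((address) => simpleEmail.test(address));
console.log(emailResults);
```

The last sample shows the trade-off of a short pattern: it rejects some addresses that are perfectly valid.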
Email Validation Examples
(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:
[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:
(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[
t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:
(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:
(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:
[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 000-031]+(?:(?:
(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?
=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:
[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?
[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])
+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^
[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:
(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:
(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:
(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)
(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:
[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:
(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:
(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.
(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:
(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:
(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:
[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*))*)?;s*)
Types of Regular Expressions

●   Simple Regular Expressions
●   POSIX Basic Regular Expressions
●   POSIX Extended Regular Expressions
●   Perl Regular Expressions
Simple Regular Expressions
●   Traditional regular expressions.
●   Not a standard.
●   Supported by some applications for backwards
    compatibility.
●   Deprecated.
POSIX Basic Regular Expressions
●   Created to provide a common standard for Unix
    tools.
●   Designed to be backwards compatible with
    traditional regular expressions.
●   Adopted as the default syntax of many Unix
    tools.
●   Some metacharacters must be escaped to take on
    their special meaning, e.g. \( \) and \{ \}.
POSIX Extended Regular Expressions
●   Adds some new metacharacters.
●   Metacharacters do not require escaping.
●   Dropped support for back references (\n).
●   Many Unix tools provide support with a
    command line argument (usually -E).
Perl Regular Expressions

●   Adds lazy quantification, named capture groups
    and recursive patterns.
●   Adopted by many programming languages due
    to its power.
●   Requires non-alphanumeric delimiters around
    expression.
●   Other languages only implement a subset, so
    implementations vary.
Syntax
Basic Metacharacters

.   Matches any single character (by default, except a newline).

^   Matches beginning of a string.

$   Matches end of a string.

|   Matches the expression before or after (think ||).
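As a rough illustration of the four metacharacters, the following sketch filters a small, made-up word list with three patterns.

```javascript
// Hypothetical word list used only for this illustration.
const words = ['cat', 'cut', 'cart', 'dog', 'hotdog'];

const anyMiddle = /^c.t$/;   // c, any one character, t, and nothing else
const endsWithDog = /dog$/;  // "dog" at the end of the string
const catOrDog = /cat|dog/;  // either alternative, anywhere in the string

const middleHits = words.filter((w) => anyMiddle.test(w));
const dogHits = words.filter((w) => endsWithDog.test(w));
const eitherHits = words.filter((w) => catOrDog.test(w));

console.log(middleHits, dogHits, eitherHits);
```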
Character Classes

[]      Match any one character within the set.
[^ ]    Match any one character NOT within the set.
[n-m]   Match any one character within the range.




Examples:
[A-Za-z0-9]
[^G-Zg-z _]
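The two example classes can be tried out as follows. This is a minimal sketch; the anchors and the + quantifier are added here so that the whole string has to consist of characters from the class.

```javascript
// Every character must be a letter or a digit.
const alphanumeric = /^[A-Za-z0-9]+$/;

// Every character must NOT be G-Z, g-z, a space or an underscore.
const notGtoZ = /^[^G-Zg-z _]+$/;

const classResults = {
  alnumPlain: alphanumeric.test('abc123'),   // letters and digits only
  alnumSpace: alphanumeric.test('abc 123'),  // the space is not in the set
  negatedHex: notGtoZ.test('DEADBEEF'),      // A-F are not excluded
  negatedSpace: notGtoZ.test('dead beef'),   // the space is excluded
  negatedPunct: notGtoZ.test('#ff00aa'),     // '#' is not excluded either
};
console.log(classResults);
```

Note that a negated class accepts everything outside the listed characters, including punctuation, which is easy to overlook.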
Shorthand Character Classes

\s          Any whitespace character such as space, tab and newlines.
             Same as [\n\r\t ]
\w          Any word character.
             Same as [A-Za-z0-9_]
\d          Any digit character.
             Same as [0-9]
\S, \W, \D  Negated versions of the above. Can be used inside character
             classes but could be confusing.
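A short sketch of the shorthand classes in use, written with their backslashes (\s, \w, \d, \D). The sample string is invented for the example.

```javascript
// Hypothetical sample text containing a tab and a date.
const orderText = 'Order 66 shipped\ton 2024-05-04';

const digitRuns = orderText.match(/\d+/g);        // runs of digits
const fields = orderText.split(/\s+/);            // split on any whitespace
const wordRuns = orderText.match(/\w+/g);         // runs of word characters
const digitsOnly = orderText.replace(/\D/g, '');  // drop every non-digit

console.log(digitRuns, fields, wordRuns, digitsOnly);
```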
Quantifiers

*       Match the preceding expression 0 or more times.
+       Match the preceding expression 1 or more times.
?       Match the preceding expression 0 or 1 time.
{m,n}   Match the preceding expression at least m times but no more than n times.
{m,}    Match the preceding expression at least m times with no maximum.
{,n}    Match the preceding expression no more than n times with no minimum.
        Not supported by every flavor; {0,n} is the portable spelling.
{n}     Match the preceding expression exactly n times.
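The quantifiers can be compared side by side with a few anchored patterns. This is only a sketch; the test strings are arbitrary, and {0,2} is used in place of {,2} because JavaScript treats {,n} as literal text.

```javascript
const quantifierResults = {
  starZero: /^ab*c$/.test('ac'),        // * allows zero b's
  starMany: /^ab*c$/.test('abbbc'),     // ...or many
  plusZero: /^ab+c$/.test('ac'),        // + needs at least one b
  optional: /^colou?r$/.test('color'),  // ? makes the u optional
  optionalTwice: /^colou?r$/.test('colouur'),
  exact: /^\d{3}$/.test('1234'),        // exactly three digits
  range: /^\d{2,4}$/.test('12'),        // two to four digits
  atLeast: /^\d{2,}$/.test('12345'),    // two or more digits
  atMost: /^\d{0,2}$/.test('123'),      // at most two digits
};
console.log(quantifierResults);
```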
Lazy Quantifiers

Standard Quantifiers are greedy.
Example:
Many programming courses start with a "Hello World" example.
That would be "Hallo Welt" in German.
Pattern: "Hello .*"
Matches (reading the text above as a single line):
"Hello World" example. That would be "Hallo Welt"
Lazy Quantifiers

Use ? to make a quantifier lazy.
Example:
Many programming courses start with a "Hello World" example.
That would be "Hallo Welt" in German.
Pattern: "Hello .*?"
Matches:
"Hello World"
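The difference between the greedy and the lazy form is easiest to see by running both patterns on the same text. The sketch below treats the example text as one line, since . does not match a newline by default.

```javascript
const helloText =
  'Many programming courses start with a "Hello World" example. ' +
  'That would be "Hallo Welt" in German.';

// Greedy: .* runs to the last double quote on the line.
const greedyMatch = helloText.match(/"Hello .*"/)[0];

// Lazy: .*? stops at the first double quote it can.
const lazyMatch = helloText.match(/"Hello .*?"/)[0];

console.log(greedyMatch);
console.log(lazyMatch);
```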
Grouping

()      Group the expression and capture the text.
(?: )   Group the expression but DO NOT capture the text.
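To illustrate the difference, the sketch below matches a date twice: once with every part captured, and once with the year in a non-capturing group. The date string is just an example value.

```javascript
// All three parts are captured and numbered 1, 2, 3.
const capturing = '2024-05-04'.match(/(\d{4})-(\d{2})-(\d{2})/);

// The year is grouped but not captured, so the month becomes group 1.
const nonCapturing = '2024-05-04'.match(/(?:\d{4})-(\d{2})-(\d{2})/);

console.log(capturing[1], capturing[2], capturing[3]);
console.log(nonCapturing[1], nonCapturing[2]);
```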
Backreferences

\1 through \9 reference previously captured text.
Example:
Many programming courses start with a "Hello World"
example. 'Hello World' examples are extremely simple,
especially when they just output "Hello World'.
Pattern: ('|")Hello World(\1)
Matches: "Hello World" and 'Hello World', but not "Hello World'
(its opening and closing quotes differ).
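A minimal sketch of the quote-matching example, with the backreference written as \1 so the closing quote has to be the same character as the opening one.

```javascript
const quotedText =
  'Many programming courses start with a "Hello World" example. ' +
  "'Hello World' examples are extremely simple, " +
  'especially when they just output "Hello World\'.';

// \1 repeats whatever the first group captured: ' or ".
const quotedMatches = quotedText.match(/('|")Hello World\1/g);

console.log(quotedMatches);
```

The third occurrence opens with a double quote and closes with a single quote, so it is skipped.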
Word Boundaries

\b matches the position between a word character
(\w) and a non-word character (\W), or the start or
end of the string next to a word character.
Example:
Hello World
o\b
Hello| World
Word Boundaries

\B matches any position that is not a word boundary,
for example between two word characters (\w\w).
Example:
Hello World
o\B
Hello Wo|rld
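The two word-boundary examples can be checked by looking at where each pattern matches. This sketch writes the assertions with their backslashes (\b and \B); the replace() call at the end uses a made-up sentence.

```javascript
const greeting = 'Hello World';

// "o" followed by a word boundary: the o that ends "Hello".
const atBoundary = /o\b/.exec(greeting);

// "o" NOT followed by a word boundary: the o inside "World".
const insideWord = /o\B/.exec(greeting);

console.log(atBoundary.index, insideWord.index);

// \b on both sides restricts a replacement to whole words.
const wholeWordOnly = 'cat concat category'.replace(/\bcat\b/g, 'dog');
console.log(wholeWordOnly);
```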
Lookaheads

(?= ) matches a position that is directly followed by
the expression, without consuming it.
Example:
Hello World sounds better than "Hello Earth".
Pattern: Hello(?= World)
Matches: the first Hello (the one followed by " World").
Lookbehinds

(?<= ) matches a position that is directly preceded by
the expression, without consuming it.
Example:
Hello World sounds better than "Hello Earth".
Pattern: (?<=")Hello
Matches: the second Hello (the one preceded by a double quote).
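The positive lookahead and lookbehind examples can be sketched like this. Lookbehind needs a reasonably recent JavaScript engine (ES2018 or later), which is an assumption about the runtime.

```javascript
const sentence = 'Hello World sounds better than "Hello Earth".';

// Lookahead: "Hello" only when " World" follows. " World" is not consumed.
const followedByWorld = /Hello(?= World)/.exec(sentence);

// Lookbehind: "Hello" only when a double quote comes right before it.
const precededByQuote = /(?<=")Hello/.exec(sentence);

console.log(followedByWorld.index, followedByWorld[0]);
console.log(precededByQuote.index, precededByQuote[0]);
```

In both cases the matched text is just Hello; the lookaround only checks the surrounding text.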
Lookaheads

(?! ) matches a position that is NOT directly followed
by the expression.
Example:
Hello World sounds better than "Hello Earth".
Pattern: Hello(?! World)
Matches: the second Hello (the one not followed by " World").
Lookbehinds

(?<! ) matches a position that is NOT directly preceded
by the expression.
Example:
Hello World sounds better than "Hello Earth".
Pattern: (?<!")Hello
Matches: the first Hello (the one not preceded by a double quote).
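The negated forms can be tried the same way. As before, this is only a sketch, and the lookbehind assumes an ES2018-capable JavaScript engine.

```javascript
const phrase = 'Hello World sounds better than "Hello Earth".';

// Negative lookahead: "Hello" that is NOT followed by " World".
const notBeforeWorld = /Hello(?! World)/.exec(phrase);

// Negative lookbehind: "Hello" that is NOT preceded by a double quote.
const notAfterQuote = /(?<!")Hello/.exec(phrase);

console.log(notBeforeWorld.index, notAfterQuote.index);

// Replace only the "Hello" that is not followed by " World".
const replaced = phrase.replace(/Hello(?! World)/g, 'Goodbye');
console.log(replaced);
```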
Conditionals

(?(condition)then|else)
●   condition is a lookahead or a lookbehind (Perl and
    PCRE also accept a capture group number).
●   If condition is matched, then must match for the
    expression to pass.
●   If condition is not matched, else must match for
    the expression to pass.
Conditionals

Example:
Hello World sounds better than "Hello Earth".
Pattern: Hello (?(?=World)World|Earth)
Matches: Hello World and Hello Earth.
Pattern: Hello (?(?=People)People|Earth)
Matches: Hello Earth only.
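JavaScript's RegExp has no conditional syntax, so the sketch below does not use (?(condition)then|else). Instead it swaps in an alternation whose branches are guarded by a lookahead and its negation, which gives the same if/else behaviour for a lookahead condition.

```javascript
const comparison = 'Hello World sounds better than "Hello Earth".';

// If "World" follows, match World; otherwise match Earth.
const worldElseEarth = /Hello (?:(?=World)World|(?!World)Earth)/g;

// If "People" follows, match People; otherwise match Earth.
const peopleElseEarth = /Hello (?:(?=People)People|(?!People)Earth)/g;

const worldMatches = comparison.match(worldElseEarth);
const peopleMatches = comparison.match(peopleElseEarth);

console.log(worldMatches, peopleMatches);
```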
Modifiers

i   Case insensitive matching.
s   . matches newline characters.
m   ^ and $ match after and before newlines (respectively).
x   Whitespace within the expression is ignored unless escaped.
g   Match globally.
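In JavaScript these modifiers are written as flags after the closing delimiter. The sketch below covers i, m, s and g; JavaScript has no x flag, and the s flag assumes an ES2018-capable engine.

```javascript
const twoLines = 'hello world\nHello World';

const flagResults = {
  caseSensitive: /WORLD/.test(twoLines),            // no flag: case matters
  caseInsensitive: /WORLD/i.test(twoLines),         // i: case is ignored
  startOfString: twoLines.match(/^hello/gi).length, // ^ only at the very start
  startOfLines: twoLines.match(/^hello/gim).length, // m: ^ also after a newline
  dotPlain: /world.Hello/.test(twoLines),           // . does not match \n
  dotAll: /world.Hello/s.test(twoLines),            // s: . matches \n too
  everyO: twoLines.match(/o/g).length,              // g: all matches, not just one
};
console.log(flagResults);
```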
Modifiers

●   (?a) turns modifier a on for the rest of the expression.
●   (?-a) turns modifier a off again.
Examples:
(?i)WORLD(?-i)
(?i-s)WORLD.(?s-i)
(?i:WORLD)
Language Implementations
JavaScript

●   RegExp object.
        –   var expression = new RegExp('World', 'g');
        –   var expression = /World/g;
●   String.match()
●   String.replace()
●   String.split()
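A brief sketch tying the pieces above together: both ways of creating the expression, followed by the three string methods. The sample strings are invented for the example.

```javascript
// Two equivalent ways to build the same expression.
const fromConstructor = new RegExp('World', 'g');
const fromLiteral = /World/g;

const message = 'Hello World, wonderful World';

const allWorlds = message.match(fromConstructor);     // every match, thanks to g
const swapped = message.replace(fromLiteral, 'Earth'); // replace every match
const parts = 'a, b;c'.split(/[,;]\s*/);               // split on , or ; plus spaces

console.log(allWorlds, swapped, parts);
```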
Perl

●   if ($string =~ /regex/)
●   $string =~ s/regex/replacement/
●   Regexp::Common
        –   http://search.cpan.org/dist/Regexp-Common/
        –   Provides common expressions.
        –   Examples:
                ●   IP Address
                ●   Credit Card Number
                ●   Profanity
PHP

●   ereg vs. preg
       –   preg uses Perl syntax.
       –   ereg uses POSIX Extended syntax.
       –   preg is generally faster.
       –   ereg was deprecated in PHP 5.3 and removed in PHP 7.0.
PHP

●   preg_match()
●   preg_match_all()
●   preg_replace()
●   preg_split()
●   preg_quote()
●   http://www.php.net/manual/en/book.pcre.php
●   http://php.net/manual/reference.pcre.pattern.modifiers.php
Tools and Resources

●   txt2regex - http://aurelio.net/txt2regex/
●   Reggy (mac) - http://reggyapp.com/
●   Patterns (mac) - http://krillapps.com/patterns/
●   Web based - http://regex.larsolavtorvik.com/
●   Regular-Expressions.info (reference) -
    http://www.regular-expressions.info/
Thanks!




http://xkcd.com/208/
