Regular Expressions 101
Raj Kissu
What are Regular Expressions?
They describe patterns in strings
These patterns can be used to modify strings
Invented by Stephen Cole Kleene
Idea of RegEx dates back to the 1950s
Today, they come in different “flavors”
PCRE, POSIX Basic & Extended RegEx, ECMA RegEx and loads more!
NOTE: RegEx “flavors” are not consistent in implementation
Different “flavors” behave differently ...
So know your “flavor” before you use it!
Why Use It?
Cos it’s an important tool in every coder’s arsenal!
Ok, but still ... Why use it???
It’s Short, Sweet & F***ing Fast!
It can match just about anything!
It makes changing large amounts of repetitive text trivial ...
... as long as you can “see” patterns
It Makes You Awesome - in a Geeky way :)
Before RegEx Mastery
After RegEx Mastery
Readily Available

Support in programming languages:
JavaScript, PHP, Perl, C/C++, etc.

Command-line: grep, awk, sed

Text editors: Vim, Emacs, Notepad++

IDEs: Aptana, NetBeans, Visual
Studio .NET
RegEx Basics
NOTE : Using ECMA (JavaScript) RegEx Flavor!
Characters

standard characters

  letters : A to Z, a to z

  numbers : 0 to 9

  symbols : !,@,#,%,& etc

Matched literally!
Meta Characters


Special characters : \ ) ( ] [ ^ $ . | ? * +

To match as literals, escape them
with a backslash
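
A minimal sketch of escaping, using the dot as the metacharacter. The sample strings and variable names are made up for illustration:

```javascript
// Unescaped, the dot is a metacharacter and matches any single character.
const looseDot = /3.14/;
// Escaped with a backslash, the dot only matches a literal ".".
const literalDot = /3\.14/;

console.log(looseDot.test("3x14"));   // true
console.log(literalDot.test("3x14")); // false
console.log(literalDot.test("3.14")); // true
```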
Character Classes
Matches one and ONLY one character in
a set of characters

[Aa]   : matches either ‘A’ or ‘a’

[a-z] : matches any ONE lowercase
letter in the specified range

[^Aa] : matches any single character
except ‘A’ and ‘a’
Character Classes

Metacharacters may behave differently
within character classes

[^red] : matches anything but ‘r’,
‘e’ and ‘d’

[r^ed] : matches only ‘r’, ‘^’, ‘e’
or ‘d’
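
The character class rules above can be tried out roughly like this. The test strings and variable names are illustrative only:

```javascript
const eitherA = /[Aa]/;       // 'A' or 'a'
const lowercase = /[a-z]/;    // one lowercase letter
const notA = /[^Aa]/;         // one character that is neither 'A' nor 'a'
const literalCaret = /[r^ed]/; // '^' is literal when it is not the first character

console.log(eitherA.test("Apple"));      // true
console.log(lowercase.test("ABC"));      // false
console.log(notA.test("Aa"));            // false: every character is 'A' or 'a'
console.log(notA.test("Ab"));            // true: 'b' is outside the set
console.log(literalCaret.test("^"));     // true

// A class matches exactly one character, so this finds both spellings.
const spellings = "gray grey".match(/gr[ae]y/g);
console.log(spellings);                  // ["gray", "grey"]
```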
Shorthand Classes

\d, [0-9] : digits

\w, [\da-zA-Z_] : alphanumeric or _

\s, roughly [ \t\r\n\f\v] : whitespace

\D, \W, \S : the above BUT negated
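
A short sketch of the shorthand classes in JavaScript. The sample string is invented for the example:

```javascript
const sample = "Room 42,\tfloor_3";

const digits = sample.match(/\d/g);            // every single digit
const words = sample.match(/\w+/g);            // runs of letters, digits or _
const fields = sample.split(/\s+/);            // split on any whitespace run
const digitsOnly = sample.replace(/\D/g, "");  // strip everything that is not a digit

console.log(digits);     // ["4", "2", "3"]
console.log(words);      // ["Room", "42", "floor_3"]
console.log(fields);     // ["Room", "42,", "floor_3"]
console.log(digitsOnly); // "423"
```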
The Dot Character

The Dot (.) character matches any
single character BUT the newline

Synonymous with [^\n] (UNIX/Linux/
Mac)

as well as [^\r\n] (Windows)

Use it sparingly - it matches almost
anything, so it over-matches and can
backtrack a lot
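
A small sketch of the dot in JavaScript. The `s` (dotAll) flag on the last pattern is not covered in these slides; it was added in ES2018, so this assumes a reasonably recent engine:

```javascript
const dotPattern = /a.c/;

console.log(dotPattern.test("abc"));   // true: '.' matches 'b'
console.log(dotPattern.test("a\nc"));  // false: '.' does not match a newline
console.log(dotPattern.test("a\rc"));  // false: JavaScript also excludes '\r'

// With the s flag the dot matches line terminators too.
const dotAllPattern = /a.c/s;
console.log(dotAllPattern.test("a\nc")); // true
```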
Alternation

Using a pipe |, match either the left
or right side of the pattern

bear|tiger : matches a string that
contains either “bear” or “tiger”

pedo(bea|tige)r : matches a string
that contains either “pedobear” or
“pedotiger”
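
A quick sketch of alternation, with and without parentheses to limit its scope. The "cub" strings are made up for the example:

```javascript
// Without parentheses, the pipe splits the whole pattern.
const animal = /bear|tiger/;
console.log(animal.test("a polar bear")); // true
console.log(animal.test("a lion"));       // false

// Parentheses restrict the alternation to part of the pattern.
const cubs = "bear cub, tiger cub, lion cub".match(/(bea|tige)r cub/g);
console.log(cubs); // ["bear cub", "tiger cub"]
```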
Quantifiers

{n} : matches exactly n times

{n,} : matches n or more times

{n,m} : matches between n and m times

* : same as {0,}

+ : same as {1,}

? : same as {0,1}
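
The quantifiers above, sketched against a few invented strings:

```javascript
const word = "goooal";

const exactlyTwo = word.match(/o{2}/)[0];  // exactly 2
const twoOrMore = word.match(/o{2,}/)[0];  // 2 or more (takes all three)
const oneToTwo = word.match(/o{1,2}/)[0];  // between 1 and 2

console.log(exactlyTwo); // "oo"
console.log(twoOrMore);  // "ooo"
console.log(oneToTwo);   // "oo"

const zeroOrMore = /go*al/.test("gal");   // * allows zero 'o'
const oneOrMore = /go+al/.test("gal");    // + needs at least one 'o'
const optionalU = /colou?r/.test("color") && /colou?r/.test("colour");

console.log(zeroOrMore); // true
console.log(oneOrMore);  // false
console.log(optionalU);  // true
```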
Quantifiers

Quantifiers are greedy

<.+> : matches “<div>holy RegEx,
Batman!</div>” instead of stopping at
“<div>”

Add a ? to make it lazy

<.+?> : stops at “<div>” in
“<div>holy regex!</div>”
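
Greedy versus lazy matching, sketched on the same string as the slide:

```javascript
const html = "<div>holy regex!</div>";

// Greedy: .+ grabs as much as it can, then backs off to the LAST '>'.
const greedy = html.match(/<.+>/)[0];
// Lazy: .+? grabs as little as it can and stops at the FIRST '>'.
const lazy = html.match(/<.+?>/)[0];

console.log(greedy); // "<div>holy regex!</div>"
console.log(lazy);   // "<div>"
```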
Anchors

Matches positions instead of
characters

^ : matches the beginning of a string

$ : matches the end of a string

\b : matches a word boundary, i.e.
between a \w and a token that’s not
a \w
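
A brief sketch of the three anchors. The test words are arbitrary:

```javascript
const startsWithCat = /^cat/;
const endsWithCat = /cat$/;
const wholeWordCat = /\bcat\b/;

console.log(startsWithCat.test("catalog"));     // true
console.log(startsWithCat.test("bobcat"));      // false
console.log(endsWithCat.test("bobcat"));        // true
console.log(wholeWordCat.test("a cat sat"));    // true
console.log(wholeWordCat.test("concatenate"));  // false: "cat" is inside a word
```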
Groupings
Placing parentheses around tokens
groups them together : /nyan(cat)/

It also provides a
backreference :    /(cat)\1/ matches
“catcat”

OR if you don’t want a group to
capture, use (?: ) : /(?:nyan)(cat)\1/
matches “nyancatcat” and not
“nyancatnyan”
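
Capturing groups, backreferences and non-capturing groups, sketched in JavaScript (variable names are illustrative):

```javascript
// A capturing group is returned alongside the full match.
const captured = "nyancat".match(/nyan(cat)/);
console.log(captured[0]); // "nyancat"
console.log(captured[1]); // "cat"

// \1 refers back to whatever group 1 matched.
const repeated = /(cat)\1/;
console.log(repeated.test("catcat")); // true
console.log(repeated.test("cat"));    // false

// (?:...) groups without capturing, so (cat) is still group 1.
const nonCapturing = /(?:nyan)(cat)\1/;
console.log(nonCapturing.test("nyancatcat"));  // true
console.log(nonCapturing.test("nyancatnyan")); // false
```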
Lookahead

Positive Lookahead

  Iron(?=man) : matches “Iron” only
  if it is followed by “man”

Negative Lookahead

  Iron(?!man) : matches “Iron” only
  if it is not followed by “man”
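
A small sketch of both lookaheads. Note that the lookahead text is checked but is not part of the match ("Ironing" is just an invented counter-example):

```javascript
const positive = /Iron(?=man)/;
const negative = /Iron(?!man)/;

const matched = "Ironman".match(positive)[0];
console.log(matched);                  // "Iron" ("man" is not consumed)
console.log(positive.test("Ironing")); // false
console.log(negative.test("Ironman")); // false
console.log(negative.test("Ironing")); // true
```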
Lookbehind

Positive Lookbehind

  (?<=Iron)man : matches “man” only
  if it is preceded by “Iron”

Negative Lookbehind

  (?<!Iron)man : matches “man” only
  if it is not preceded by “Iron”
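
A sketch of both lookbehinds. In JavaScript, lookbehind only arrived with ES2018, so this assumes a recent engine; older browsers reject these patterns as syntax errors. "Batman" is an invented counter-example:

```javascript
const precededByIron = /(?<=Iron)man/;
const notPrecededByIron = /(?<!Iron)man/;

console.log(precededByIron.test("Ironman"));    // true
console.log(precededByIron.test("Batman"));     // false
console.log(notPrecededByIron.test("Batman"));  // true
console.log(notPrecededByIron.test("Ironman")); // false
```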
Modifiers

alter behavior of the matching mode
(differs between tools)

/i : case-insensitive match

/m : multi-line mode (^ and $ also
match at line breaks)

/g : global, affects all possible
matches, not just the first
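
The three modifiers, sketched as JavaScript flags on a made-up three-line string:

```javascript
const text = "Cat\ncat\nCAT";

const firstExact = text.match(/cat/)[0];     // case-sensitive, first match only
const firstAnyCase = text.match(/cat/i)[0];  // i: ignore case
const allAnyCase = text.match(/cat/gi);      // g: every match
const wholeLines = text.match(/^cat$/gim);   // m: ^ and $ work per line
const noMultiline = text.match(/^cat$/gi);   // without m: whole string only

console.log(firstExact);   // "cat"
console.log(firstAnyCase); // "Cat"
console.log(allAnyCase);   // ["Cat", "cat", "CAT"]
console.log(wholeLines);   // ["Cat", "cat", "CAT"]
console.log(noMultiline);  // null
```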
Q & A
Resources

Mastering Regular Expressions -
Jeffrey E.F. Friedl

http://www.regex.info

http://www.regular-expressions.info

http://www.rubular.com
Thank You!
