3. A regular expression is a special sequence of
characters that helps you match or find other
strings or sets of strings, using a specialized
syntax held in a pattern. Regular expressions can be
used to search, edit, or manipulate text and data.
9. Readily Available
Support in programming languages: JavaScript,
Java, PHP, Perl, C/C++, etc.
Command-line: grep, awk, sed
Text editors: Vim, Emacs, Notepad++
IDEs: Eclipse, NetBeans, Visual Studio .NET
10. Literal Characters
letters : A to Z, a to z
numbers : 0 to 9
symbols : ! @ # % &
Matched literally!
Matched anywhere, even in the middle of words
Case-sensitive
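A minimal sketch using Python's standard `re` module (chosen only for illustration; the slides are not tied to one language). It shows a literal pattern matching inside a word and failing on different capitalization:

```python
import re

text = "The cat concatenated a Cat."

# A literal pattern matches that exact character sequence, even inside a word
literal_hits = re.findall(r"cat", text)
print(literal_hits)  # ['cat', 'cat']  ("cat" and the "cat" inside "concatenated")

# Matching is case-sensitive, so the lowercase pattern never finds "Cat"
case_miss = re.search(r"cat", "Cat")
print(case_miss)  # None
```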
13. The Dot Character
The dot (.) matches any single character except a
newline
Roughly equivalent to [^\n] (UNIX/Linux/Mac line endings)
or [^\r\n] (Windows line endings)
Use it sparingly: it matches almost anything, so it can cause costly backtracking!
Regex     Target String
a.boy     Jack is a boy
.a        aac bac cac dac eac fac
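The two dot examples can be reproduced with Python's `re` module (an illustrative choice). The last two lines also show the newline exception, and Python's `re.DOTALL` flag, which lifts it:

```python
import re

# "." stands for exactly one character: here, the space between "a" and "boy"
boy_hits = re.findall(r"a.boy", "Jack is a boy")           # ['a boy']

# Each match is "any one character" followed by a literal "a"
a_hits = re.findall(r".a", "aac bac cac dac eac fac")      # ['aa', 'ba', 'ca', 'da', 'ea', 'fa']

# By default "." does not match a newline...
no_newline = re.search(r"a.b", "a\nb")                     # None
# ...unless the DOTALL flag is set
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL)         # matches "a", newline, "b"

print(boy_hits, a_hits, no_newline, with_dotall is not None)
```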
14. Character Classes
Indicated by [ ] and matches one and ONLY one character from
a set of characters
[Aa] : matches either ‘A’ or ‘a’
Regex        Target String
[Gg]r[ae]y   Grayson drives a grey sedan.
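A short Python `re` sketch of the same table row (Python is an illustrative choice). The second check shows that a class consumes exactly one character:

```python
import re

text = "Grayson drives a grey sedan."

# [Gg] and [ae] each stand for exactly one character from their set
hits = re.findall(r"[Gg]r[ae]y", text)
print(hits)  # ['Gray', 'grey']

# A class matches ONE character only, so "graey" (two vowels) is not matched
two_vowels = re.search(r"[Gg]r[ae]y", "graey")
print(two_vowels)  # None
```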
15. Character Classes
Caret (^) inside a character class negates the match.
[^Aa] : matches any single character except ‘A’ or ‘a’
Regex    Target String
q[^u]    Qatar is home to quite a lot of Iraqui/Iraqi citizens, but is not a city in Iraq
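Running the negated-class example in Python's `re` (an illustrative choice) gives a single match. "Qatar" is skipped because of its capital Q, and the final "Iraq" is skipped because a negated class still has to consume one character, and nothing follows that "q":

```python
import re

text = ("Qatar is home to quite a lot of Iraqui/Iraqi citizens, "
        "but is not a city in Iraq")

# "q" followed by exactly one character that is NOT "u"
q_hits = re.findall(r"q[^u]", text)
print(q_hits)  # ['qi']

# [^Aa] matches any single character except "A" or "a"
not_a = re.findall(r"[^Aa]", "Aha!")
print(not_a)  # ['h', '!']
```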
16. Shorthand classes
Shortcut   Name                Equivalent class (ASCII)
\d         Digit               [0-9]
\D         Not digit           [^0-9]
\w         Word                [a-zA-Z0-9_]
\W         Not word            [^a-zA-Z0-9_]
\s         Space (separator)   [ \t\n\r\f\v]
\S         Not space           [^ \t\n\r\f\v]
.          Everything          [^\n] (depends on mode)
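A sketch of the shorthand classes in Python's `re` (illustrative choice). It also shows a Python-specific detail: `str` patterns are Unicode-aware, so `\d` matches non-ASCII digits unless `re.ASCII` is passed. Other engines may behave differently:

```python
import re

text = "Order_42 shipped\t3 items"

digits = re.findall(r"\d", text)    # ['4', '2', '3']
words = re.findall(r"\w+", text)    # ['Order_42', 'shipped', '3', 'items'] (underscore counts)
spaces = re.findall(r"\s", text)    # [' ', '\t', ' ']

# Python 3 str patterns are Unicode-aware: \d also matches non-ASCII digits,
# such as U+0663 (ARABIC-INDIC DIGIT THREE), unless re.ASCII is passed
unicode_digit = re.findall(r"\d", "\u0663")            # one match: the digit itself
ascii_digit = re.findall(r"\d", "\u0663", re.ASCII)    # []

print(digits, words, spaces, unicode_digit, ascii_digit)
```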
17. Repeaters
Symbols indicating that the preceding element of the
pattern can repeat
Repeater Count
? Zero or one
+ One or more
* Zero or more
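Each repeater applies to the single element just before it. A minimal Python `re` sketch (illustrative choice) with one pattern per repeater:

```python
import re

# ?  zero or one: the "u" is optional
optional = re.findall(r"colou?r", "color colour colouur")   # ['color', 'colour']

# +  one or more: at least one "o" is required
one_plus = re.findall(r"go+al", "gal goal gooal")           # ['goal', 'gooal']

# *  zero or more: any number of "o"s, including none
zero_plus = re.findall(r"go*al", "gal goal gooal")          # ['gal', 'goal', 'gooal']

print(optional, one_plus, zero_plus)
```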
18. Quantifiers
{n} : matches exactly n times
{n,} : matches n or more times
{n,m} : matches between n and m times (inclusive)
* : same as {0,}
+ : same as {1,}
? : same as {0,1}
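A Python `re` sketch of the counted quantifiers (illustrative choice). It uses `\b` (word boundary), which these slides do not cover, so that each count applies to a whole number rather than a slice of a longer one:

```python
import re

text = "7 42 123 2024 98765"

# \b (word boundary) keeps each count applied to a whole number
exactly_3 = re.findall(r"\b\d{3}\b", text)      # ['123']
at_least_4 = re.findall(r"\b\d{4,}\b", text)    # ['2024', '98765']
two_to_4 = re.findall(r"\b\d{2,4}\b", text)     # ['42', '123', '2024']

# * is shorthand for {0,}, so both patterns find the same matches
same = re.findall(r"go{0,}al", "gal goal") == re.findall(r"go*al", "gal goal")

print(exactly_3, at_least_4, two_to_4, same)
```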
21. Making quantifiers lazy
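Append ? to a quantifier to make it lazy: it matches as few characters as possible
Regex    Target String
<.+?>    <div>holy RegEx, Batman!</div>
Matches <div> and </div> separately; the greedy <.+> would match the whole string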
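Comparing the greedy and lazy forms side by side in Python's `re` (an illustrative choice) makes the difference concrete:

```python
import re

html = "<div>holy RegEx, Batman!</div>"

# Greedy: ".+" runs as far as possible, then backtracks to the LAST ">"
greedy = re.findall(r"<.+>", html)    # ['<div>holy RegEx, Batman!</div>']

# Lazy: ".+?" stops at the FIRST ">" that lets the match succeed
lazy = re.findall(r"<.+?>", html)     # ['<div>', '</div>']

print(greedy, lazy)
```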
22. Groupings 1
Everything within ( … ) is grouped into a single element for the
purpose of repetition and alternation
Regex          Target String
(la)+?         la lala lalala all ala
schema(ta)?    schema or schemata or schematic
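A Python `re` sketch of the grouping examples (illustrative choice). It also shows the greedy `(la)+` for contrast. `finditer` with `group(0)` is used because, in Python, `findall` returns only the captured group when a pattern contains one:

```python
import re

text = "la lala lalala all ala"

# The group makes "la" one unit that + repeats as a whole
greedy = [m.group(0) for m in re.finditer(r"(la)+", text)]    # ['la', 'lala', 'lalala', 'la']
# The lazy +? stops after a single "la" each time
lazy = [m.group(0) for m in re.finditer(r"(la)+?", text)]     # seven separate 'la' matches

# (ta)? makes the whole "ta" optional; "schematic" still yields "schema"
optional = [m.group(0) for m in re.finditer(r"schema(ta)?", "schema or schemata or schematic")]

print(greedy, lazy, optional)   # optional: ['schema', 'schemata', 'schema']
```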
23. Back references
Groups bundle part of a regex together so repetition can be applied to it.
A group also creates a backreference (\1, \2, …) that refers to the text it matched, for use later in the pattern.
(abc){3} matches abcabcabc. The first group matches abc.
Regex           Target String
([ai]).\1.\1    The magician said abracadabra!
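In Python's `re` (an illustrative choice), `\1` must repeat the exact character the group captured. In this sentence, the first place where that happens is inside "abracadabra":

```python
import re

# \1 must repeat whatever character group 1 captured ("a" or "i")
m = re.search(r"([ai]).\1.\1", "The magician said abracadabra!")
print(m.group(0), m.group(1))   # acada a

# A repeated group keeps only its last iteration
r = re.fullmatch(r"(abc){3}", "abcabcabc")
print(r.group(0), r.group(1))   # abcabcabc abc
```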
24. Capture
During searches, patterns in ( … ) groups can be
captured for replacement.
Special variables $1, $2, $3 etc. or \1, \2, \3 etc.
contain the captures.
Regex                 Target String
(\d\d\d)-(\d\d\d\d)   123-4567 is my number
$1 contains 123
$2 contains 4567
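Python's `re` (an illustrative choice) has no `$1` variables. Instead, the match object exposes the captures by group number:

```python
import re

m = re.search(r"(\d\d\d)-(\d\d\d\d)", "123-4567 is my number")

# m.group(1) and m.group(2) play the role of $1 and $2
first, second = m.group(1), m.group(2)
print(first, second)   # 123 4567
print(m.groups())      # ('123', '4567')
```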
25. Replacement
Regexes are most often used for search and replace
Syntax varies; tools such as sed, Perl, and Vim use
s/pattern/replacement/
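In Python (an illustrative choice), `re.sub(pattern, replacement, string)` plays the role of `s/pattern/replacement/`, and `\1`, `\2`, … in the replacement refer to the captures. The date format here is an assumed example:

```python
import re

text = "Due 2024-05-17, paid 2024-06-01"

# Like s/pattern/replacement/: capture year, month, day and reorder them as DD/MM/YYYY
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(swapped)   # Due 17/05/2024, paid 01/06/2024
```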
26. Lookahead
• Positive Lookahead
– Iron(?=man) : matches “Iron” only if it is followed by
“man”
• Negative Lookahead
– Iron(?!man) : matches “Iron” only if it is not followed
by “man”
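A Python `re` sketch of both lookaheads (illustrative choice). Printing match spans shows that the lookahead only checks "man" and does not include it in the match:

```python
import re

text = "Ironman met Ironside and Iron Man"

# Positive lookahead: "Iron" only when "man" follows; the span covers "Iron" alone
pos = [m.span() for m in re.finditer(r"Iron(?=man)", text)]   # [(0, 4)]

# Negative lookahead: "Iron" when "man" does NOT follow ("Ironside", "Iron Man")
neg = [m.span() for m in re.finditer(r"Iron(?!man)", text)]   # [(12, 16), (25, 29)]

print(pos, neg)
```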
27. Lookbehind
• Positive Lookbehind
– (?<=Iron)man : matches “man” only if it is preceded by
“Iron”
• Negative Lookbehind
– (?<!Iron)man : matches “man” only if it is not
preceded by “Iron”
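The lookbehind counterpart in Python's `re` (an illustrative choice). Note that Python requires lookbehind patterns to have a fixed width, which "Iron" does:

```python
import re

text = "Ironman, Batman, Superman"

# Positive lookbehind: "man" only when "Iron" precedes it; "Iron" is not part of the match
pos = [m.span() for m in re.finditer(r"(?<=Iron)man", text)]   # [(4, 7)]

# Negative lookbehind: "man" when "Iron" does NOT precede it
neg = [m.span() for m in re.finditer(r"(?<!Iron)man", text)]   # [(12, 15), (22, 25)]

print(pos, neg)
```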
28. Modifiers
Modifiers alter the matching mode (syntax differs
between tools)
/i : case-insensitive match
/m : multi-line mode (^ and $ match at the start and end of each line)
/g : global; affects all possible matches, not just the first
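Python's `re` (an illustrative choice) passes these modes as flags rather than as trailing letters. It has no `/g` flag: `findall` and `sub` already act on every match, and `count=1` limits `sub` to the first one:

```python
import re

text = "Regex rocks\nREGEX rules\nregex rolls"

# /i corresponds to re.IGNORECASE
ci = re.findall(r"regex", text, re.IGNORECASE)          # ['Regex', 'REGEX', 'regex']

# /m corresponds to re.MULTILINE: $ matches at the end of every line
line_ends = re.findall(r"\w+$", text, re.MULTILINE)     # ['rocks', 'rules', 'rolls']
no_m = re.findall(r"\w+$", text)                        # ['rolls']

# No /g in Python: re.sub replaces all matches unless count limits it
all_replaced = re.sub(r"regex", "RE", text, flags=re.IGNORECASE)
first_only = re.sub(r"regex", "RE", text, count=1, flags=re.IGNORECASE)

print(ci, line_ends, no_m)
```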