Introductory talk on regular expressions for developers who'd like to get their hands dirty. The full tutorial is documented here:
http://tech.bluesmoon.info/2006/04/beginning-regular-expressions.html
Regular Expressions Demystified
1. Introduction
Diving In
Building up
/regular expressions/demystified
From deckhand to pirate in 30 minutes
Philip Tellis / philip@bluesmoon.info
Yahoo!
Outline
1 Introduction
Who’s playing?
Conventions
2 Diving In
Starting Small
Getting meta
3 Building up
More or less
Alternation
Groups
$ whoami?
Philip Tellis
philip@bluesmoon.info
@bluesmoon
yahoo
geek
Who are you?
Developer
Curious
Interested in regular expressions
You may or may not have used them before
What is a regular expression?
A pattern that can match multiple strings
A pattern matching language
A Finite Automaton
But this is a hacker session, so let’s forget the theory.
(You can read the book later.)
Conventions used in this talk
Text in 'single quotes' denotes a literal string
Text in /forward slashes/ denotes a regular expression
The operator =~ indicates that the string on the left matches the pattern on the right
The operator !~ indicates that the string on the left does not match the pattern on the right
$string denotes a variable containing a string
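The =~ and !~ notation is shorthand, not code in any one language. As a minimal sketch, the two operators could be modeled in Python like this, assuming that a pattern may match anywhere in the string. The helper names `matches` and `not_matches` are made up for illustration.

```python
import re

def matches(string, pattern):
    """Model of: 'string' =~ /pattern/ (the pattern may match anywhere)."""
    return re.search(pattern, string) is not None

def not_matches(string, pattern):
    """Model of: 'string' !~ /pattern/."""
    return not matches(string, pattern)

print(matches("cat", r"a"))      # True
print(not_matches("dog", r"a"))  # True
```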
Match a single character
'a' =~ /a/
Let’s try a different character
't' =~ /t/
Building up
Combine the previous two into a single regular expression
'at' =~ /at/
You now know regular expressions
To build a regular expression, break the pattern into small
manageable pieces and incrementally combine them.
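The "small pieces first" advice could look like this in Python. This is only a sketch: the variable names are invented here, and `re.search` stands in for the =~ operator.

```python
import re

# Start with the smallest pieces...
first = r"a"
second = r"t"
# ...then combine them into a larger pattern.
combined = first + second  # the pattern 'at'

for text in ["a", "t", "at", "cat"]:
    print(text, bool(re.search(combined, text)))
```

Only 'at' and 'cat' contain both pieces in order, so only those two match the combined pattern.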
Metacharacters
The regex language has its own syntax characters to do funky things
Some of these act as wild cards
Others act as modifiers to whatever comes before them
And some of them make your brain explode
We won’t be blowing up brains today
The . metacharacter
Matches ONE and ONLY ONE character (in most flavors, any character except a newline by default)
'a' =~ /./
'b' =~ /./
'c' =~ /./
'' !~ /./
The empty string has fewer than ONE character
'abc' has ONE character... three times
'abc' =~ /./
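A quick way to see the dot in action, sketched with Python's `re` module. In Python, `.` does not match a newline unless the `re.DOTALL` flag is given.

```python
import re

dot = re.compile(r".")

print(bool(dot.search("a")))  # True
print(bool(dot.search("")))   # False: there is no character to match
print(dot.findall("abc"))     # ['a', 'b', 'c']: one character, three times
```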
The fate of gate hate date
/.ate/
Matches:
aate bate cate date ...
crates abates dates elated ...
@ate 9ate ' ate'
Does not match:
ate ates ated
Character classes
/[a-z]ate/
Matches:
aate bate cate date ...
crates abates dates elated ...
Does not match:
ate ates ated
@ate 9ate ' ate'
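To compare /.ate/ with /[a-z]ate/ on some of the sample words, a small Python sketch like the following may help. The word list is a subset chosen for illustration.

```python
import re

any_char = re.compile(r".ate")
lowercase = re.compile(r"[a-z]ate")

# Columns: word, matches /.ate/, matches /[a-z]ate/
for word in ["date", "crates", "@ate", "9ate", " ate", "ate", "ates"]:
    print(repr(word), bool(any_char.search(word)), bool(lowercase.search(word)))
```

The dot accepts any character before 'ate', while the class accepts only a lowercase letter. Neither matches a bare 'ate', because both still require one character in front.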
To match a literal '-', make it the first or last character in the class:
/[+-*/]/ # Incorrect: '+-*' is read as a range
/[+*/-]/ # Correct
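The difference shows up as soon as the patterns are compiled. In this Python sketch the first class is rejected, because '+' sorts after '*' and the range is therefore reversed; the second compiles and matches all four operators. Other engines may report the error differently.

```python
import re

try:
    re.compile(r"[+-*/]")  # '+-*' is parsed as a (reversed) range
    range_error = None
except re.error as exc:
    range_error = exc

operators = re.compile(r"[+*/-]")  # '-' comes last, so it is a literal hyphen

print(range_error is not None)         # True
print(operators.findall("1+2-3*4/5"))  # ['+', '-', '*', '/']
```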
Negated character classes
/[^a-z]ate/
Matches Does not match
@ate 9ate ’ ate’ ate ates ated
g@ate e9ated aate bate cate date . . .
crates abates dates elated
...
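A negated class still has to match a character; it just must not be one of the listed ones. A short Python sketch of the /[^a-z]ate/ examples, using a subset of the words above:

```python
import re

not_lowercase = re.compile(r"[^a-z]ate")

for word in ["@ate", "9ate", " ate", "g@ate", "e9ated", "ate", "date", "crates"]:
    print(repr(word), bool(not_lowercase.search(word)))
```

'ate' fails because there is no character at all before it, and 'date' fails because the character before 'ate' is a lowercase letter.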
The late fate of gate hate date rate
/[df-hlr]ate/
Matches:
date fate gate hate late rate
dates fated billgates hated ...
Does not match:
ate aate bate cate eate iate jate kate ...
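A class can mix single characters and ranges. One way to see exactly which letters [df-hlr] accepts is to test every lowercase letter against it, as in this Python sketch:

```python
import re
import string

some_letters = re.compile(r"[df-hlr]")

# Test each lowercase letter on its own against the class.
accepted = [c for c in string.ascii_lowercase if some_letters.fullmatch(c)]
print(accepted)  # ['d', 'f', 'g', 'h', 'l', 'r']

# Without anchors, the pattern can match inside a longer word.
print(bool(re.search(r"[df-hlr]ate", "billgates")))  # True
```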
Anchors
/^[df-hlr]ate$/
Matches:
date fate gate hate late rate
Does not match:
ate aate bate ...
dates gated berate elated ...
^ matches the start of the string
$ matches the end of the string
Both are zero-width matches, i.e., they match a position and do not consume any character
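The effect of the anchors is easiest to see side by side. This is a Python sketch; note that in Python `$` also matches just before a trailing newline, a detail that varies between flavors.

```python
import re

unanchored = re.compile(r"[df-hlr]ate")
anchored = re.compile(r"^[df-hlr]ate$")

# Columns: word, unanchored result, anchored result
for word in ["date", "dates", "berate", "elated"]:
    print(word, bool(unanchored.search(word)), bool(anchored.search(word)))

# Anchors are zero-width: the match for 'date' still spans exactly four characters.
print(anchored.search("date").span())  # (0, 4)
```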
Matching more than one of something
? – matches 0 or 1 of what comes before it
* – matches 0 or more of what comes before it
+ – matches 1 or more of what comes before it
{n,m} – matches between n and m of what comes before it
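Each quantifier can be tried out on a tiny pattern. The patterns and words in this Python sketch are illustrative choices, not examples from the talk, and `re.fullmatch` is used so that the whole word has to match.

```python
import re

examples = [
    (r"colou?r",  ["color", "colour", "colouur"]),         # ? : 0 or 1 'u'
    (r"ab*c",     ["ac", "abc", "abbbc"]),                 # * : 0 or more 'b'
    (r"ab+c",     ["ac", "abc", "abbbc"]),                 # + : 1 or more 'b'
    (r"ab{2,3}c", ["abc", "abbc", "abbbc", "abbbbc"]),     # {2,3} : 2 to 3 'b'
]

for pattern, words in examples:
    for word in words:
        print(pattern, word, bool(re.fullmatch(pattern, word)))
```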
Aaargh!
Everyone shout “Aaarrrgh!”
How many ways can you say Aargh!?
argh
aaaaaargh
aaaarrrrghhh
aaaaarrrrrggggghhhh
aaarrrrggggg
aaaaarrrrrhhhh
Match ’em all
/a+r+g+h+/ # aarrrrgggghhhh
/a+r+g+h*/ # aarrgghh & aarrgg
/a+r+g*h+/ # aarrgghh & aarrhh
/a+r+g*h*/ # argh & arg & arh
That last one also matches 'ar', which we don’t want
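The four variants can be compared on a handful of screams. This Python sketch uses `re.fullmatch` so that each pattern must cover the whole string; the list of screams is a small sample chosen for illustration.

```python
import re

patterns = [r"a+r+g+h+", r"a+r+g+h*", r"a+r+g*h+", r"a+r+g*h*"]
screams = ["argh", "aaaaaargh", "aaarrrrggggg", "aaaaarrrrrhhhh", "ar"]

for pattern in patterns:
    accepted = [s for s in screams if re.fullmatch(pattern, s)]
    print(pattern, accepted)
```

Only the last pattern accepts every scream, and it is also the only one that lets 'ar' through.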
Alternation: Match all this or all that
/ab|cd/
Matches either 'ab' or 'cd'
From here to eternity
| matches either everything on its left or everything on its right
(That’s a pipe character, not the letter I)
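A small Python sketch of /ab|cd/ shows that the pipe chooses between the two whole alternatives, not between the single characters next to it:

```python
import re

either = re.compile(r"ab|cd")

for text in ["ab", "cd", "ad", "abcd"]:
    m = either.search(text)
    print(text, m.group() if m else None)
```

'ad' does not match, because the choice is between 'ab' and 'cd' as units. For 'abcd' the leftmost alternative that matches, 'ab', is reported.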
Back to aaargh
/g*h+|g+h*/
This matches all the endings we want:
ggggghhhhhh
ggggg
hhhhh
/a+r+g*h+|g+h*/
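This doesn’t quite work: | splits the whole pattern, so it reads as /a+r+g*h+/ or /g+h*/
Matches (as a whole string):
aaarrrhhh
aaarrrrggghhh
gggg
gggghhhh
Does not match (as a whole string):
aaarrrggg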
Group the subexpression
/a+r+(g*h+|g+h*)/
Matches
aaarrrhhh
aaarrrggg
aaarrrrggghhh
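The grouped pattern can be checked against the earlier ungrouped attempt, /a+r+g*h+|g+h*/. This Python sketch assumes whole-string matching via `fullmatch`.

```python
import re

ungrouped = re.compile(r"a+r+g*h+|g+h*")
grouped = re.compile(r"a+r+(g*h+|g+h*)")

# Columns: scream, ungrouped result, grouped result
for scream in ["aaarrrhhh", "aaarrrggg", "aaarrrrggghhh", "gggg", "gggghhhh"]:
    print(scream, bool(ungrouped.fullmatch(scream)), bool(grouped.fullmatch(scream)))
```

With the parentheses in place, the alternation applies only to the ending, so 'aaarrrggg' is accepted and the bare 'gggg' is rejected.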
Grouping parentheses
( and ) mark a group
| alternates within a group
Groups may be nested - it’s like a new regex inside
+, *, ? and {n,m} may apply to an entire group
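Two of these points, quantifiers on a whole group and nested groups, can be sketched in Python as follows. The patterns are made up for illustration.

```python
import re

# A quantifier after a group repeats the whole group, not just the last character.
print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True
print(bool(re.fullmatch(r"ab+", "ababab")))    # False: only 'b' is repeated

# Groups can nest, and | only alternates within its enclosing group.
nested = re.compile(r"(a(b|c))+d")
print(bool(nested.fullmatch("abacd")))  # True
print(bool(nested.fullmatch("abcd")))   # False
```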
Stop
Summary
Start small, match the parts you understand
Build up to more complex patterns
Not all problems should be solved by regular expressions
More Info...
“Mastering Regular Expressions” – Jeffrey Friedl
http://tech.bluesmoon.info/search/label/regex
Contact me
Philip Tellis
philip@bluesmoon.info
@bluesmoon
bluesmoon.info
Image credits
http://flickr.com/photos/practicalowl/3933514241/
http://flickr.com/photos/loozrboy/3908830690/
http://flickr.com/photos/thetruthabout/2680546103/
http://flickr.com/photos/donsolo/2136923757/
Thank You
Aargh with class
/a+r+g*[gh]h*/
Matches
aaarrrhhh
aaarrrggg
aaarrrrggghhh
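The character-class version should accept the same strings as the grouped alternation /a+r+(g*h+|g+h*)/. One rough way to gain confidence is a brute-force comparison over short strings, sketched here in Python. The length limit of 6 is an arbitrary choice.

```python
import re
from itertools import product

grouped = re.compile(r"a+r+(g*h+|g+h*)")
with_class = re.compile(r"a+r+g*[gh]h*")

# Every string of 1 to 6 letters drawn from 'a', 'r', 'g', 'h'.
candidates = ("".join(chars) for n in range(1, 7) for chars in product("argh", repeat=n))

disagreements = [s for s in candidates
                 if bool(grouped.fullmatch(s)) != bool(with_class.fullmatch(s))]
print(disagreements)  # []
```

An empty list means the two patterns agree on every candidate. That is evidence rather than a proof, but here it lines up with the observation that both endings describe "some g's followed by some h's, with at least one letter in total".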
Matching meta characters in a character class
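/[a-zA-Z0-9_-]/ # '-' is literal when it comes last
/[a-z^]/ # '^' is literal when it is not first
/[][]/ # ']' is literal when it comes first (in Perl and POSIX flavors)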
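These three classes can be tried out in Python, which follows the same placement rules. Some flavors differ: JavaScript, for instance, treats a leading `[]` as an empty class, so treat this as a sketch for one engine only. The variable names are made up here.

```python
import re

word_or_hyphen = re.compile(r"[a-zA-Z0-9_-]")  # '-' last: literal hyphen
letter_or_caret = re.compile(r"[a-z^]")        # '^' not first: literal caret
brackets = re.compile(r"[][]")                 # ']' first: literal ']', then '['

print(word_or_hyphen.findall("a-b c_d!"))  # ['a', '-', 'b', 'c', '_', 'd']
print(letter_or_caret.findall("x^2"))      # ['x', '^']
print(brackets.findall("a[0]"))            # ['[', ']']
```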
Alternating multiple items
/apples|oranges|bananas/
/buy some (apples|oranges|ba(na){2}s)/
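The second pattern nests a repeated group inside an alternation. A short Python sketch shows which fruit, if any, is captured; the test sentences are invented for illustration.

```python
import re

shopping = re.compile(r"buy some (apples|oranges|ba(na){2}s)")

for text in ["buy some apples", "buy some bananas", "buy some banas", "buy some pears"]:
    m = shopping.search(text)
    print(text, "->", m.group(1) if m else None)
```

'bananas' matches because (na){2} requires exactly two repetitions of 'na'; 'banas' has only one and is rejected.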