Regular Expressions

21 January 2014
Sido Grond
Exception Twente

Index
• Introduction
• Applications
• Constructs
• Regular expressions in C#
• Demos
• Pros and cons
• Conclusions

Introduction
• A regular expression (RE or regex) is a text string that describes a search pattern
• Some similarity to wildcards (e.g. *.txt)
• Platform- and language-independent, with minor syntax differences between implementations
• Available in, among others, C#, Perl, Python, Java, JavaScript, Visual Studio, Notepad++ and Linux command-line tools
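
To make the wildcard comparison concrete, here is a minimal sketch in Python (chosen only for illustration; the file names are made up). It filters the same list once with a shell-style wildcard and once with a regex that is assumed to express the same intent.

```python
import fnmatch
import re

# Hypothetical file names, used only for this illustration.
names = ["notes.txt", "notes.txt.bak", "readme.md"]

# Shell-style wildcard: * stands for any run of characters.
by_wildcard = [n for n in names if fnmatch.fnmatch(n, "*.txt")]

# Regex with the same intent: the dot must be escaped to be literal,
# and $ anchors the match at the end of the string.
by_regex = [n for n in names if re.search(r"\.txt$", n)]

print(by_wildcard)
print(by_regex)
```

Both lists contain only notes.txt. The regex version is longer, but the same notation can also express things a wildcard cannot, such as "ends in .txt or .md".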

Applications
• Syntax highlighting
• Find and replace
– Visual Studio (as usual: different syntax)
– Notepad++

• Text searching
– Unix tools: grep, sed, find

• Programming
– Pattern matching, filtering, replacing
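
As a rough illustration of the programming use case, the following Python sketch filters lines the way grep would and then rewrites them the way a sed substitution would. The log lines are invented sample data.

```python
import re

# Invented sample data for the illustration.
log_lines = [
    "2014-01-21 INFO  service started",
    "2014-01-21 ERROR disk full",
    "2014-01-22 ERROR timeout",
]

# Filtering: keep only the lines that contain the word ERROR.
errors = [line for line in log_lines if re.search(r"\bERROR\b", line)]

# Replacing: rewrite yyyy-mm-dd dates as dd/mm/yyyy.
rewritten = [
    re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", line) for line in errors
]

for line in rewritten:
    print(line)
```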

Constructs: single character
Regex            Strings that match           Strings that don’t match
a                “abc”, “bad”                 “xyz”
^a               “abc”                        “bad”, “xyz”, “^a”
a$               “cba”                        “abc”, “bad”, “xyz”
^[a-z]$          “a”, “x”                     “aa”, “A”
^[a-z0-9+-]$     “s”, “3”, “+”, “-”           “12”, “Q”, “a1”
^[^a-zA-Z]$      “5”, “+”                     “p”, “R”, “abc”, “15”
^[a-zA-Z]        “b”, “C123”, “dd”            “1cba”, “ f” (note the space)
[^a-zA-Z]        “1cba”, “ f”                 “b”, “dd”
^.$              “m”, “3”, “%”                “13”, “%a”, “xyz”
.                “%”, “e4”, “xyz”, “.”        “”
\.               “.”                          “%”, “e4”, “xyz”
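
A few of the single-character constructs can be checked with a short Python sketch. The helper name `found` is hypothetical; it simply reports whether the pattern occurs anywhere in the string, which is how the match columns are assumed to be read.

```python
import re

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

checks = [
    (r"a", "bad"),            # plain character, anywhere in the string
    (r"^a", "bad"),           # ^ anchors at the start, so this fails
    (r"a$", "cba"),           # $ anchors at the end
    (r"^[^a-zA-Z]$", "+"),    # ^ inside [] negates the character class
    (r"[^a-zA-Z]", " f"),     # the leading space is a non-letter
    (r".", "%"),              # an unescaped dot matches any character
    (r"\.", "%"),             # an escaped dot matches only a literal dot
]

for pattern, text in checks:
    print(f"{pattern:14} {text!r:8} {found(pattern, text)}")
```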

Constructs: word groups
• \d is shorthand for Digits [0-9]
• \w is shorthand for Word characters [a-zA-Z0-9_]
• \s is shorthand for whiteSpace [ \t\r\n]
• \D is shorthand for non-Digits [^0-9]
• \W is shorthand for non-Word characters [^a-zA-Z0-9_]
• \S is shorthand for non-whiteSpace [^ \t\r\n]
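
The shorthand classes can be tried out with a small Python sketch on an invented sample string. One caveat: for Python `str` patterns these classes are Unicode-aware by default, so the ASCII ranges above are an approximation unless the `re.ASCII` flag is passed.

```python
import re

# Invented sample text; \t is a real tab character.
text = "Room 21, Jan 2014:\tok_1"

digits = re.findall(r"\d+", text)      # runs of digits
words = re.findall(r"\w+", text)       # runs of word characters (includes _)
non_space = re.findall(r"\S+", text)   # runs of non-whitespace

print(digits)
print(words)
print(non_space)
```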

Constructs: multiple character
Regex          Strings that match         Strings that don’t match
abc            “abcde”, “rabco”           “bac”, “ABC”, “bcde”
^abc           “abc”, “abcde”             “rabco”, “ABC”, “bcde”
E\d            “ME3”, “E85”, “EBE5”       “ACME”, “E”, “E 8”
E\d+$          “ME35”, “EBE5”             “E85x”, “E3EB”, “E 8”
foo|bar        “foot”, “bart”             “ooba”
a(b|c)d        “Tabd”, “acdc”             “abcd”, “aed”, “ad”
^foo|bar       “foo1”, “Harbar”           “toofoo”
^(foo|bar)     “foo1”, “bart”             “toofoo”, “Harbar”
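
The difference between the last two rows is a matter of precedence: alternation binds more loosely than the anchor, so ^foo|bar means "foo at the start, or bar anywhere". A minimal Python sketch, with hypothetical variable names, makes this visible:

```python
import re

loose = re.compile(r"^foo|bar")      # (^foo) or (bar anywhere)
strict = re.compile(r"^(foo|bar)")   # foo or bar, both only at the start

samples = ["foo1", "bart", "Harbar", "toofoo"]
results = {
    s: (loose.search(s) is not None, strict.search(s) is not None)
    for s in samples
}

for s, (is_loose, is_strict) in results.items():
    print(f"{s:8} loose={is_loose} strict={is_strict}")
```

Only “Harbar” separates the two patterns: it contains bar, but not at the start.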

Constructs: quantifiers
Regex            Strings that match           Strings that don’t match
^a?$             “”, “a”                      “aa”, “aaaaa”
^a*$             “a”, “aaaaa”, “”             “aaab”, “c”, “ba”
^a+$             “a”, “aaaaa”                 “aaab”, “c”, “ba”, “”
^a{4}$           “aaaa”                       “a”, “aaaaa”, “”
^a{2,6}$         “aa”, “aaa”, “aaaaaa”        “a”, “”, “aaaaaaa”
(e|o){2}         “koe”, “booom”, “veel”       “kol”, “omo”, “vele”
e|(o{2})         “nest”, “booom”, “koe”       “kol”
(a[0-9]*){2}     “aa”, “ba45a”, “a3a0”        “abacus”, “a3b8a0”
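
As a quick check of the counted quantifier and of how grouping changes what a quantifier applies to, here is a small Python sketch using the words from the table:

```python
import re

# {2,6}: between two and six repetitions, anchored to the whole string.
lengths = [n for n in range(9) if re.search(r"^a{2,6}$", "a" * n)]

words = ["koe", "kol", "veel", "vele"]

# (e|o){2}: two consecutive characters, each of which is e or o.
two_vowels = [w for w in words if re.search(r"(e|o){2}", w)]

# e|(o{2}): either a single e, or a double o.
e_or_double_o = [w for w in words if re.search(r"e|(o{2})", w)]

print(lengths)
print(two_vowels)
print(e_or_double_o)
```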

Constructs: advanced

Greedy/lazy repetition
          Regex     String                               First match
Greedy    <.+>      “This is a <EM>first</EM> test”      “<EM>first</EM>”
Lazy      <.+?>     “This is a <EM>first</EM> test”      “<EM>”
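
Greedy versus lazy repetition is easiest to see on a string with markup in it. The sketch below uses Python and assumes a sample sentence containing <EM> tags; the exact tag name is an assumption made for this illustration.

```python
import re

# Assumed sample text: a sentence with one pair of <EM> tags.
text = "This is a <EM>first</EM> test"

# Greedy: .+ grabs as much as possible, so the match runs to the last >.
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? grabs as little as possible, so the match stops at the first >.
lazy = re.search(r"<.+?>", text).group()

print(greedy)
print(lazy)
```

The greedy pattern returns the whole tagged span, the lazy one only the opening tag.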
Grouping
Regex                   Strings that match     Groups
^(a|b)c(d)$             “acd”, “bcd”           0:“acd” 1:“a” 2:“d”
^(?:a|b)c(d)$           same as above          0:“acd” 1:“d”
^(?<name>a|b)c(d)$      same as above          0:“acd” 1:“d” name:“a”
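
The three kinds of group can be sketched in Python as follows. Two differences from the C# flavour should be kept in mind: Python spells a named group (?P<name>...), and it numbers named groups by position, so the group numbers in the last case will not line up with a .NET result.

```python
import re

# Capturing groups: numbered from 1, group 0 is the whole match.
m = re.search(r"^(a|b)c(d)$", "acd")
plain = (m.group(0), m.group(1), m.group(2))

# Non-capturing group (?:...): groups the alternation without numbering it.
m = re.search(r"^(?:a|b)c(d)$", "acd")
non_capturing = (m.group(0), m.group(1))

# Named group: Python syntax is (?P<name>...), read back by name.
m = re.search(r"^(?P<name>a|b)c(d)$", "bcd")
named = (m.group(0), m.group("name"))

print(plain)
print(non_capturing)
print(named)
```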

Backreferences: inside regex
Regex                         Strings that match           Strings that don’t match
^([a-c])x\1x\1                “axaxa”, “bxbxbyyyy”         “axaxb”, “bxaxc”
<(b)><(i)>.*?</\2></\1>       “<b><i>bla</i></b>”          “bla”
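
A backreference such as \1 matches whatever text the first group actually captured, not the group's pattern. The Python sketch below illustrates this; the mis-nested tag string is an invented counterexample.

```python
import re

# \1 must repeat the exact character captured by ([a-c]).
repeat = re.compile(r"^([a-c])x\1x\1")
repeat_results = {
    s: repeat.search(s) is not None
    for s in ["axaxa", "bxbxbyyyy", "axaxb", "bxaxc"]
}

# \2 and \1 force the closing tags to mirror the opening tags.
tags = re.compile(r"<(b)><(i)>.*?</\2></\1>")
well_nested = tags.search("<b><i>bla</i></b>") is not None
mis_nested = tags.search("<b><i>bla</b></i>") is not None  # invented example

print(repeat_results)
print(well_nested, mis_nested)
```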
Backreferences: find and replace
Regex             Strings that match     Replace pattern     Result strings
^(var)(1|2)$      “var1”, “var2”         \1iable             “variable”
^(a|b)c(d|e)      “acd”, “bcd”           \2XXX\1             “dXXXa”, “dXXXb”
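
In a replacement pattern, a backreference inserts the text captured by a group. A minimal Python sketch is shown here; Python's `re.sub` accepts the \1 notation, while other tools may spell it differently (C# replacement strings use $1).

```python
import re

# \1 reinserts the text captured by the first group ("var").
expanded = [re.sub(r"^(var)(1|2)$", r"\1iable", s) for s in ["var1", "var2"]]

# \2 and \1 swap the two captured characters around a fixed middle part.
swapped = [re.sub(r"^(a|b)c(d|e)", r"\2XXX\1", s) for s in ["acd", "bcd"]]

print(expanded)
print(swapped)
```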

Look ahead/behind
Regex              Strings that match          Strings that don’t match
([a-b](?=x))       “blaax”, “bxa”              “ab”, “bacx”
((?<=x)[a-b])      “yyxa”, “bxb”               “ab”, “aax”
([a-b](?!x))       “bla”, “a”, “bxa”           “bxc”, “ax”
((?<!x)[a-b])      “ral”, “dbx”, “bxa”         “xa”, “lxb”
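
Lookahead and lookbehind check the surrounding text without making it part of the match. The following Python sketch shows which character each pattern actually returns on strings taken from the table:

```python
import re

# Positive lookahead: b is matched only because x follows; x is not consumed.
ahead = re.search(r"[a-b](?=x)", "bxa")
ahead_result = (ahead.group(), ahead.span())

# Positive lookbehind: a is matched only because x precedes it.
behind = re.search(r"(?<=x)[a-b]", "yyxa")
behind_result = (behind.group(), behind.start())

# Negative lookahead: skips the b that is followed by x, keeps the final a.
not_ahead = re.findall(r"[a-b](?!x)", "bxa")

# Negative lookbehind: skips the a that is preceded by x, keeps the first b.
not_behind = re.findall(r"(?<!x)[a-b]", "bxa")

print(ahead_result, behind_result, not_ahead, not_behind)
```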

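Regexes in C#
• using System.Text.RegularExpressions;
• Regex reg = new Regex("a(b|c)d");
– reg.IsMatch("abd");                 true
– reg.Matches("abdEacd");             2 matches
– reg.Match("abdEabdC").Groups;       2 groups per match (0: whole match, 1: “b”)
– reg.Split("abdEabdC");              5 strings: “”, “b”, “E”, “b”, “C” (the captured group and the empty string before the first match are included)
– reg.Replace("abdEabdC", "X");       “XEXC”

• Singleline/Multiline options: control how a line break (\n) is treated. Singleline lets . match \n; Multiline lets ^ and $ match at every line break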
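
For readers without a C# environment at hand, the same operations can be approximated with Python's `re` module. This is a stand-in rather than the .NET API: the method names differ, and `re.DOTALL` and `re.MULTILINE` are assumed here to play the roles of the Singleline and Multiline options.

```python
import re

reg = re.compile(r"a(b|c)d")

is_match = reg.search("abd") is not None                 # like IsMatch
matches = [m.group() for m in reg.finditer("abdEacd")]   # like Matches
captured = reg.search("abdEabdC").groups()               # captures only, no group 0
parts = reg.split("abdEabdC")                            # like Split
replaced = reg.sub("X", "abdEabdC")                      # like Replace

# Line-break handling on a two-line string.
text = "one\ntwo"
dot_default = re.search(r"one.two", text) is not None            # . skips \n
dot_all = re.search(r"one.two", text, re.DOTALL) is not None     # . matches \n
first_only = re.findall(r"^\w+", text)                           # ^ at string start
every_line = re.findall(r"^\w+", text, re.MULTILINE)             # ^ at each line

print(is_match, matches, captured, parts, replaced)
print(dot_default, dot_all, first_only, every_line)
```

Splitting on a pattern that contains a capturing group also returns the captured text, and a match at the very start of the input produces a leading empty string.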

Regexes in editors
• Notepad++


Regexes in Linux
• grep, sed, find, vi


Pros and cons
• Advantages
– Very flexible
– Fast processing
– Language independent
– A lot of work in a single line of code
– Often simpler than a ‘substring + indexes’ approach

• Disadvantages
– Hard to read; for example, ‘?’ has three meanings depending on context
– Hard to debug: no information is given when there is no match
– Compilation only at runtime
– Typos are very easily made (e.g. forgetting an escape character)
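
Some of these disadvantages can be softened in practice. The Python sketch below is one possible illustration, not a recommendation from the slides: it shows a forgotten escape, a pattern error that only surfaces at runtime, and the `re.VERBOSE` flag as a way to make a pattern more readable.

```python
import re

# Forgotten escape: the bare dot matches any character, so "3x14" slips through.
sloppy = re.search(r"^3.14$", "3x14") is not None
exact = re.search(r"^3\.14$", "3x14") is not None

# re.escape produces the escaped form of a literal string.
literal = re.escape("3.14")

# A malformed pattern is only reported when it is compiled at runtime.
try:
    re.compile(r"a(b|c")
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

# re.VERBOSE allows whitespace and comments inside the pattern.
date = re.compile(
    r"""
    ^(\d{4})    # year
    -(\d{2})    # month
    -(\d{2})$   # day
    """,
    re.VERBOSE,
)
date_parts = date.search("2014-01-21").groups()

print(sloppy, exact, literal, compile_error, date_parts)
```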

Conclusions
“Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.”
Jamie Zawinski
Don’t overuse them!

Conclusions
• Very handy tool for string matching and replacing
• Built-in support in most programming languages
• Support in/for multiple applications
• More info
– http://www.regular-expressions.info/
– http://msdn.microsoft.com/en-us/library/az24scfc%28v=vs.110%29.aspx

• Fun
– http://regex.alf.nu/
– http://www.i-programmer.info/news/144-graphics-and-games/5450can-you-do-the-regular-expression-crossword.html
