Regular Expressions:
JavaScript And Beyond

Max Shirshin

Frontend Team Lead
deltamethod
Introduction
Types of regular expressions
• POSIX (BRE, ERE)
• PCRE = Perl-Compatible Regular Expressions
From the JavaScript language specification:

"The form and functionality of regular expressions
is modelled after the regular expression facility in
the Perl 5 programming language".

JS syntax (overview only)
var re = /^foo/;
// boolean
re.test('string');
// null or Array
re.exec('string');
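A small sketch of how the two methods differ, using sample strings made up for illustration: test() only answers yes or no, while exec() returns the match details or null.

```javascript
// Sample inputs are hypothetical, chosen only to show both outcomes.
const startsWithFoo = /^foo/;

const matched = startsWithFoo.test('foobar');  // boolean
const details = startsWithFoo.exec('foobar');  // array with the match and its index
const missed = startsWithFoo.exec('barfoo');   // null, 'foo' is not at the start

console.log(matched, details[0], details.index, missed);
```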

Regular expressions consist of...
• Tokens
  - common characters
  - special characters (metacharacters)
• Operations
  - quantification
  - enumeration
  - grouping
Tokens and metacharacters
Any character
/./.test('foo');    // true
/./.test('\r\n')    // false: . does not match line terminators

What you need instead:
/[\s\S]/ for JavaScript
or
/./s (Perl/PCRE; in JavaScript only since ES2018)
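A quick check of the difference, assuming a two-line sample string invented for this sketch:

```javascript
// Hypothetical input with a line break in the middle.
const twoLines = 'first\nsecond';

// The dot refuses to cross the line terminator...
const dotOnly = /first.second/.test(twoLines);

// ...while [\s\S] (whitespace or non-whitespace) accepts any character.
const anyChar = /first[\s\S]second/.test(twoLines);

console.log(dotOnly, anyChar);
```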

String boundaries
>>> /^something$/.test('something')
true
>>> /^something$/.test('something\nbad')
false
>>> /^something$/m.test('something\nbad')
true
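The effect of the m flag can be sketched on a hypothetical three-line string: without it, ^ and $ anchor to the whole string; with it, they anchor to every line.

```javascript
// Sample text is made up for illustration.
const threeLines = 'alpha\nbeta\ngamma';

// No m flag: the pattern would have to cover the entire string, so nothing matches.
const wholeString = threeLines.match(/^\w+$/g);

// With m: ^ and $ also match around each line terminator.
const perLine = threeLines.match(/^\w+$/gm);

console.log(wholeString, perLine);
```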

Word boundaries
>>> /\ba/.test('alabama')
true
>>> /a\b/.test('alabama')
true
>>> /a\b/.test('naïve')
true
(ï is not a \w character, so a boundary follows the "a")
not a word boundary
/\Ba/.test('alabama');
Character classes
Whitespace
/\s/ (inverted version: /\S/)

FF:
\t \n \v \f \r \u0020 \u00a0 \u1680 \u180e
\u2000 \u2001 \u2002 \u2003 \u2004 \u2005 \u2006 \u2007 \u2008 \u2009 \u200a
\u2028 \u2029 \u202f \u205f \u3000

Chrome, IE 9:
as in FF plus \ufeff

IE 7, 8 :-(
only: \t \n \v \f \r \u0020
Alphanumeric characters
/\d/ ~ digits from 0 to 9
/\w/ ~ Latin letters, digits, underscore
Does not work for Cyrillic, Greek etc.

Inverted forms:
/\D/ ~ anything but digits
/\W/ ~ anything but alphanumeric characters and underscore
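To illustrate the Latin-only limitation, here is a minimal sketch with a Russian word as sample input. The explicit character range is an assumption that only lowercase Russian letters matter; it is not a general Unicode solution.

```javascript
// \w covers only [A-Za-z0-9_], so Cyrillic letters count as "non-word" characters.
const latinWord = /^\w+$/.test('privet');
const cyrillicWord = /^\w+$/.test('привет');
const seenAsNonWord = /^\W+$/.test('привет');

// A hand-written class for lowercase Russian letters (ё lies outside а-я, so it is listed separately).
const explicitRange = /^[а-яё]+$/.test('привет');

console.log(latinWord, cyrillicWord, seenAsNonWord, explicitRange);
```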

Custom character classes
Example:
/[abc123]/
Metacharacters and ranges supported:
/[A-F\d]/
More than one range is okay:
/[a-cG-M0-7]/
IMPORTANT: ranges come from Unicode code point order, not from national alphabets!
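A classic consequence of code-point ranges, sketched with the Russian letter ё (U+0451), which sits outside the а-я block (U+0430 to U+044F). Escapes are used instead of literal letters so the code points are explicit; the sample words are only illustrative.

```javascript
// [а-я] written with explicit code points: U+0430 .. U+044F.
const basicRange = /^[\u0430-\u044f]+$/;

const elka = '\u0435\u043b\u043a\u0430';   // 'елка'
const yolka = '\u0451\u043b\u043a\u0430';  // 'ёлка', starts with ё (U+0451)

const plainOk = basicRange.test(elka);
const yoRejected = basicRange.test(yolka);

// Adding ё explicitly fixes the class.
const withYo = /^[\u0430-\u044f\u0451]+$/.test(yolka);

console.log(plainOk, yoRejected, withYo);
```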
"dot" means just dot!
/[.]/.test('anything') // false
adding \ and ] needs escaping; a trailing - is literal:
/[\\\]-]/

Inverted character classes
anything except a, b, c:
/[^abc]/
^ as a character:
/[abc^]/

/[^]/
matches ANY character;
could be a nice alternative to /[\s\S]/
Chrome, FF:
>>> /([^])/.exec('a');
['a', 'a']
IE:
>>> /([^])/.exec('a');
['a', '']
>>> /([\s\S])/.exec('a');
['a', 'a']
Quantifiers
Zero or more, one or more
/bo*/.test('b') // true
/.*/.test('')   // true
/bo+/.test('b') // false

Zero or one
/colou?r/.test('color');  // true
/colou?r/.test('colour'); // true

How many?
/bo{7}/      exactly 7
/bo{2,5}/    from 2 to 5 ({x,y} requires x <= y)
/bo{5,}/     5 or more

This does not work in JS:
/b{,5}/.test('bbbbb')   // false: {,5} is read as literal text
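A short sketch of counted quantifiers in practice. The hex-color validator is a hypothetical example, not part of the talk; the last lines show the portable spelling of "up to 5" and what /b{,5}/ actually matches in JavaScript.

```javascript
// Hypothetical validator: '#' followed by 3 or 6 hex digits.
const hexColor = /^#(?:[0-9a-f]{3}){1,2}$/i;

const shortForm = hexColor.test('#fff');
const longForm = hexColor.test('#FFCC00');
const wrongLength = hexColor.test('#ffcc');   // 4 digits: neither 3 nor 6

// "Up to 5" must be written with an explicit lower bound.
const upToFive = /^b{0,5}$/.test('bbbbb');

// Without the lower bound, the braces are matched literally.
const literalMiss = /b{,5}/.test('bbbbb');
const literalHit = /b{,5}/.test('b{,5}');

console.log(shortForm, longForm, wrongLength, upToFive, literalMiss, literalHit);
```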
Greedy quantifiers
var r = /a+/.exec('aaaaa');
>>> r[0]
"aaaaa"

Lazy quantifiers
var r = /a+?/.exec('aaaaa');
>>> r[0]
"a"
r = /a*?/.exec('aaaaa');
>>> r[0]
""
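Where greedy versus lazy matters in practice can be sketched with a made-up HTML fragment: the greedy version runs to the last closing tag, the lazy one stops at the first.

```javascript
// Sample markup is hypothetical.
const fragment = '<b>one</b> and <b>two</b>';

// Greedy .* grabs as much as possible, then backtracks to the LAST </b>.
const greedyMatch = fragment.match(/<b>.*<\/b>/)[0];

// Lazy .*? grabs as little as possible and stops at the FIRST </b>.
const lazyMatch = fragment.match(/<b>.*?<\/b>/)[0];

console.log(greedyMatch);
console.log(lazyMatch);
```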
Groups
capturing
/(boo)/.test("boo");
non-capturing
/(?:boo)/.test("boo");

Grouping and the RegExp constructor
var result = /(bo)o+(b)/.exec('the booooob');
>>> RegExp.$1
"bo"
>>> RegExp.$2
"b"
>>> RegExp.$9
""
>>> RegExp.$10
undefined
>>> RegExp.$0
undefined
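The static RegExp.$1 to RegExp.$9 properties are legacy global state that every successful match overwrites. The same captures are available on the array returned by exec(), as this sketch on the string from the slide shows:

```javascript
const groupResult = /(bo)o+(b)/.exec('the booooob');

const wholeMatch = groupResult[0];     // the entire matched substring
const firstGroup = groupResult[1];     // same value as RegExp.$1
const secondGroup = groupResult[2];    // same value as RegExp.$2
const matchIndex = groupResult.index;  // where the match starts in the input

console.log(wholeMatch, firstGroup, secondGroup, matchIndex);
```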
Numbering of capturing groups
/((foo) (b(a)r))/
$1   ((foo) (b(a)r))   foo bar
$2   (foo)             foo
$3   (b(a)r)           bar
$4   (a)               a
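The numbering rule (groups are counted by their opening parenthesis, left to right) can be checked directly with exec():

```javascript
const nested = /((foo) (b(a)r))/.exec('foo bar');

// Index 0 is the whole match; indexes 1..4 follow the opening parentheses.
const captured = nested.slice(1);

console.log(captured); // four captures, outermost group first
```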
Lookahead
var r = /best(?= match)/.exec('best match');
>>> !!r
true
>>> r[0]
"best"
>>> /best(?! match)/.test('best match')
false
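Because a lookahead checks what follows without consuming it, it is handy for inserting text at a position. A minimal sketch, with a hypothetical helper name, that adds thousands separators to a string of digits:

```javascript
// Hypothetical helper, for illustration only.
function addThousandsSeparators(digits) {
  // Match every position inside the number (\B) that is followed by
  // one or more complete groups of three digits and then no more digits.
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

const million = addThousandsSeparators('1234567');
const thousand = addThousandsSeparators('1000');
const small = addThousandsSeparators('123');

console.log(million, thousand, small);
```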

Lookbehind
NOT supported in JavaScript before ES2018 (older engines reject the syntax)

/(?<=text)match/
positive lookbehind

/(?<!text)match/
negative lookbehind
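Where lookbehind is unavailable, a capturing group can often stand in: match the prefix too, then read only the captured part. A sketch on a made-up input string:

```javascript
// Goal: the digits that come right after a dollar sign.
// Instead of /(?<=\$)\d+/, match the '$' as well and capture the digits.
const priceMatch = /\$(\d+)/.exec('total: $42');

const amount = priceMatch[1];    // only the captured digits
const withPrefix = priceMatch[0]; // the full match still includes '$'

console.log(amount, withPrefix);
```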

Enumerations
Logical "or"
/red|green|blue light/
/(red|green|blue) light/
>>> /var a(;|$)/.test('var a')
true

Backreferences
/(red|green) apple is \1/.test('red apple is red')     // true
/(red|green) apple is \1/.test('green apple is green') // true
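A common use of backreferences is spotting accidentally doubled words. A minimal sketch with a made-up sentence; it assumes words are separated by a single space:

```javascript
// \1 must repeat exactly what group 1 captured.
const doubledWord = /\b(\w+) \1\b/;

const found = doubledWord.exec('this is is a test');
const clean = doubledWord.test('this is a test');

console.log(found[0], found[1], found.index, clean);
```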

Alternative character representations
Representing a character
\x09 === \t (not Unicode but ASCII/ANSI)
\u20AC === € (in Unicode)
backslash takes away special character meaning:
/\(\)/.test('()')   // true
/\\n/.test('\\n')   // true

...or vice versa!
/\f/.test('f') // false!
Flags
Regular expression flags
g i m s x y
g: global match
i: ignore case
m: multiline matching for ^ and $
s: string as single line (in JavaScript only since ES2018)
x: extend pattern (NOT supported in JavaScript)
y: sticky (originally Mozilla-only and non-standard; standard since ES2015)
Match only at the .lastIndex index (a regexp instance property), so a match can be pinned to a predefined position. ^ itself still matches only at the start of the input (or of a line with m).
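How sticky matching depends on .lastIndex can be sketched on a made-up string, with the g flag shown for contrast:

```javascript
const sample = 'barfoo';

// Sticky: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAtThree = sticky.test(sample);  // 'foo' starts at index 3
const indexAfterHit = sticky.lastIndex;     // moved past the match

sticky.lastIndex = 2;
const stickyAtTwo = sticky.test(sample);    // nothing starts at index 2
const indexAfterMiss = sticky.lastIndex;    // reset after a failed match

// Global: lastIndex is only where the search begins.
const globalRe = /foo/g;
globalRe.lastIndex = 2;
const globalFromTwo = globalRe.test(sample);

console.log(stickyAtThree, indexAfterHit, stickyAtTwo, indexAfterMiss, globalFromTwo);
```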
Alternative syntax for flags
/(?i)foo/
/(?i-m)bar$/
/(?i-sm).x$/
/(?i)foo(?-i)bar/
Some implementations do NOT support flag
switching on-the-go.
In JS, flags are set for the whole regexp
instance and you can't change them.
RegExp in JavaScript
Methods
RegExp instances:
/regexp/.exec('string')
null or array ['whole match', $1, $2, ...]
/regexp/.test('string')
false or true
String instances:
'str'.match(/regexp/)
'str'.match('\\w{1,3}')
- same as /regexp/.exec if no 'g' flag used;
- array of all matches if 'g' flag used (internal capturing groups ignored)
'str'.search(/regexp/)
'str'.search('\\w{1,3}')
first match index, or -1
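The difference the g flag makes to match(), and the return values of search(), in a small sketch on a made-up string:

```javascript
const rhymes = 'cat bat rat';

// No g flag: same shape as exec(), including capturing groups and index.
const firstOnly = rhymes.match(/(\w)at/);

// With g: every whole match, capturing groups dropped.
const allMatches = rhymes.match(/(\w)at/g);

// search() returns the index of the first match, or -1.
const foundAt = rhymes.search(/bat/);
const notFound = rhymes.search(/dog/);

console.log(firstOnly[0], firstOnly[1], allMatches, foundAt, notFound);
```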

Methods
String instances:
'str'.replace(/old/, 'new');
WARNING: special magic supported in the replacement string:
$$               inserts a dollar sign "$"
$&               substring that matches the regexp
$`               substring before $&
$'               substring after $&
$1, $2, $3 etc.  string that matches the n-th capturing group
'str'.replace(/(r)(e)gexp/g,
    function(matched, $1, $2, offset, sourceString) {
        // what should replace the matched part on this iteration?
        return 'replacement';
    });
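A few concrete replace() calls may make the replacement patterns easier to remember. The inputs below are invented for this sketch:

```javascript
// $1 and $2 reorder the captured groups.
const swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1');

// $$ produces a literal dollar sign, $& re-inserts the whole match.
const priced = '10'.replace(/\d+/, '$$$&');

// A function computes the replacement for each match separately.
const doubled = 'a1b22'.replace(/\d+/g, function (matched) {
  return String(Number(matched) * 2);
});

console.log(swapped, priced, doubled);
```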

RegExp injection
// BAD CODE
var userInput = '[abc]'; // oops! Treated as a character class, not as literal text
var re = new RegExp('^' + userInput + '$');

// GOOD, DO IT AT HOME
RegExp.escape = function(text) {
    // Prefix every regexp metacharacter with a backslash
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};
var re = new RegExp('^' + RegExp.escape(userInput) + '$');
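The effect of escaping can be verified with a short sketch. The helper name escapeRegExp is hypothetical (a standalone function is used here to avoid touching the RegExp object), and the set of escaped characters is one common choice rather than the only possible one.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regexp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const untrusted = '[abc]';

// Unescaped: the input is interpreted as a character class.
const unsafe = new RegExp('^' + untrusted + '$');
const unsafeMatchesA = unsafe.test('a');
const unsafeMatchesLiteral = unsafe.test('[abc]');

// Escaped: the input is matched as literal text.
const safe = new RegExp('^' + escapeRegExp(untrusted) + '$');
const safeMatchesA = safe.test('a');
const safeMatchesLiteral = safe.test('[abc]');

console.log(unsafeMatchesA, unsafeMatchesLiteral, safeMatchesA, safeMatchesLiteral);
```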

Recommended reading
Online, just google it:
MDN Guide on Regular Expressions
The Book:

Mastering Regular Expressions
O'Reilly Media
Thank you!
