Regular Expressions: JavaScript And Beyond

Regular expressions are a powerful tool for text and data processing. What kind of support do browsers provide for them? What common misconceptions prevent people from using regular expressions effectively?

The talk gives an overview of the regular expression syntax and typical usage examples.

Transcript

  • 1. Regular Expressions: JavaScript And Beyond. Max Shirshin, Frontend Team Lead, deltamethod
  • 2. Introduction
  • 3. Types of regular expressions: POSIX (BRE, ERE); PCRE = Perl-Compatible Regular Expressions. From the JavaScript language specification: "The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language".
  • 6. JS syntax (overview only): var re = /^foo/; re.test('string') returns a boolean; re.exec('string') returns null or an Array.
  • 7. Regular expressions consist of tokens (common characters and special characters, i.e. metacharacters) and operations (quantification, enumeration, grouping).
  • 8. Tokens and metacharacters
  • 10. Any character: /./.test('foo') // true, /./.test('\r\n') // false. What you need instead: /[\s\S]/ for JavaScript, or /./s (works in Perl/PCRE; not in JS at the time of this talk, though ES2018 later added the s "dotAll" flag).
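A minimal sketch of the dot's behaviour around line breaks, assuming a modern engine (a current browser or Node.js) that supports the ES2018 s flag; the sample text is made up for illustration:

```javascript
// "." matches any character except line terminators (\n, \r, \u2028, \u2029).
const text = 'line one\nline two';

const dotOnly = /one.line/.test(text);      // false: "." does not match "\n"
const anyChar = /one[\s\S]line/.test(text); // true: "whitespace or non-whitespace" covers everything
const dotAll = /one.line/s.test(text);      // true: ES2018 s (dotAll) flag

console.log(dotOnly, anyChar, dotAll); // false true true
```

The [\s\S] form works in every engine; the s flag reads better where it is available.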
  • 13. String boundaries: >>> /^something$/.test('something') true >>> /^something$/.test('something\nbad') false >>> /^something$/m.test('something\nbad') true
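A small sketch of the same idea, adding one common use of the m flag (collecting whole lines together with g); the input string is illustrative only:

```javascript
const input = 'something\nbad';

// Without m, ^ and $ anchor to the start and end of the whole string.
const whole = /^something$/.test(input);    // false
// With m, they also anchor right after and right before a line break.
const perLine = /^something$/m.test(input); // true
// Combined with g, m makes it easy to pick out whole lines.
const lines = input.match(/^\w+$/gm);       // ['something', 'bad']

console.log(whole, perLine, lines);
```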
  • 17. Word boundaries: >>> /\ba/.test('alabama') true >>> /a\b/.test('alabama') true >>> /a\b/.test('naïve') true (ï is not a \w character, so \b finds a boundary inside the word). Not a word boundary: /\Ba/.test('alabama');
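A sketch showing why \b surprises people with non-ASCII text: \b only knows the ASCII \w set, so accented letters split a word in two. The sample strings are written with \u escapes to avoid encoding issues:

```javascript
// \b sits between a \w character ([A-Za-z0-9_]) and a non-\w character or a string edge.
console.log(/\ba/.test('alabama'));    // true: "a" at the start of the string
console.log(/a\b/.test('alabama'));    // true: "a" at the end of the string
console.log(/a\b/.test('na\u00efve')); // true: "ï" (U+00EF) is not \w, so there is a boundary after "a"
console.log(/\Ba/.test('alabama'));    // true: e.g. the "a" right after "l"

// The same rule splits non-ASCII words apart ("naïve café"):
const words = 'na\u00efve caf\u00e9'.match(/\b\w+\b/g);
console.log(words); // ['na', 've', 'caf']
```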
  • 18. Character classes
  • 20. Whitespace: /\s/ (inverted version: /\S/). FF: \t \u00a0 \u2003 \u2009 \n \v \u1680 \u180e \u2004 \u2005 \u200a \u2028 \u0020 \u2002 \u2008 \u205f \u3000. Chrome, IE 9: as in FF plus \ufeff \f \r \u2000 \u2001 \u2006 \u2007 \u2029 \u202f. IE 7, 8 :-( only: \t \n \v \f \r \u0020.
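Because the set matched by \s differed between browsers, probing the running engine is more reliable than memorizing a list. The sketch below prints which of the slide's characters (plus a few others) the current engine treats as whitespace. The output depends on the engine and its Unicode version; for example, U+180E stopped being a space separator in Unicode 6.3, so current engines typically report false for it:

```javascript
// The object keys are just readable labels; the values are the actual characters.
const candidates = {
  '\\t': '\t', '\\n': '\n', '\\v': '\v', '\\f': '\f', '\\r': '\r',
  '\\u0020': '\u0020', '\\u00a0': '\u00a0', '\\u1680': '\u1680', '\\u180e': '\u180e',
  '\\u2000': '\u2000', '\\u2028': '\u2028', '\\u2029': '\u2029', '\\u202f': '\u202f',
  '\\u205f': '\u205f', '\\u3000': '\u3000', '\\ufeff': '\ufeff',
};

const report = {};
for (const [label, ch] of Object.entries(candidates)) {
  report[label] = /\s/.test(ch); // true if this engine's \s covers the character
}
console.log(report);
```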
  • 21. Alphanumeric characters: /\d/ ~ digits from 0 to 9; /\w/ ~ Latin letters, digits, underscore (does not work for Cyrillic, Greek etc.). Inverted forms: /\D/ ~ anything but digits; /\W/ ~ anything but alphanumeric characters.
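Current engines offer a way around the Cyrillic/Greek gap: Unicode property escapes, added in ES2018 and usable only with the u flag. A sketch assuming such an engine; the sample words are illustrative:

```javascript
const greeting = 'привет'; // Cyrillic "hello"

// \w is [A-Za-z0-9_] and \d is [0-9], regardless of the text's script.
const asciiWord = /^\w+$/.test(greeting); // false
const asciiDigit = /\d/.test('\u0663');   // false: U+0663 is an Arabic-Indic digit three

// ES2018 Unicode property escapes (require the u flag) understand other scripts.
const anyLetter = /^\p{L}+$/u.test(greeting); // true: \p{L} is "any letter"
const anyDigit = /^\p{Nd}$/u.test('\u0663');  // true: \p{Nd} is "any decimal digit"

console.log(asciiWord, asciiDigit, anyLetter, anyDigit); // false false true true
```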
  • 25. Custom character classes. Example: /[abc123]/. Metacharacters and ranges are supported: /[A-F\d]/. More than one range is okay: /[a-cG-M0-7]/. IMPORTANT: ranges come from Unicode, not from national alphabets!
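The Russian alphabet gives a concrete case for "ranges come from Unicode": the letter ё is not between а and я in code point order. A short sketch, with a sample word chosen for illustration:

```javascript
// Ranges compare code points. Cyrillic "а".."я" is U+0430..U+044F,
// but "ё" is U+0451, so it falls outside the range.
const withoutYo = /^[а-я]+$/.test('ёлка'); // false
const withYo = /^[а-яё]+$/.test('ёлка');   // true

// Likewise, [A-z] is not "all Latin letters": U+005B..U+0060 ([ \ ] ^ _ `) sit between Z and a.
const underscore = /^[A-z]$/.test('_');    // true

console.log(withoutYo, withYo, underscore); // false true true
```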
  • 27. Custom character classes: inside brackets, "dot" means just a dot! /[.]/.test('anything') // false. Adding ] to a class: /[\]-]/
  • 28. Inverted character classes: anything except a, b, c: /[^abc]/; ^ as a literal character: /[abc^]/
  • 33. Inverted character classes: /[^]/ matches ANY character; it could be a nice alternative to /[\s\S]/, but IE behaves differently. Chrome, FF: >>> /([^])/.exec('a'); ['a', 'a'] IE: >>> /([^])/.exec('a'); ['a', ''] IE: >>> /([\s\S])/.exec('a'); ['a', 'a']
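The IE quirk above belongs to old versions. A quick sketch to confirm how a current engine treats the two "any character" classes:

```javascript
// In current engines [^] and [\s\S] behave identically: one character of any kind.
const viaEmptyNegation = /([^])/.exec('a');
const viaClassPair = /([\s\S])/.exec('a');
console.log(viaEmptyNegation[1], viaClassPair[1]); // a a (old IE gave "" for the first)

// Both also match line terminators, unlike ".".
console.log(/^[^]$/.test('\n'), /^[\s\S]$/.test('\n'), /^.$/.test('\n')); // true true false
```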
  • 34. Quantifiers
  • 37. Zero or more, one or more: /bo*/.test('b') // true, /.*/.test('') // true, /bo+/.test('b') // false
  • 38. Zero or one: /colou?r/.test('color'); /colou?r/.test('colour');
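The three basic quantifiers side by side. This sketch also quantifies a group, which the slides imply but do not show:

```javascript
// Each quantifier applies to the token (or group) right before it.
const cases = [
  [/bo*/, 'b'],          // * zero or more: zero "o" is fine
  [/bo+/, 'b'],          // + one or more: needs at least one "o"
  [/colou?r/, 'color'],  // ? zero or one
  [/colou?r/, 'colour'],
  [/^(ha)+$/, 'hahaha'], // quantifying a group repeats the whole group
];
const outcomes = cases.map(([re, str]) => re.test(str));
console.log(outcomes); // [true, false, true, true, true]
```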
  • 42. How many? /bo{7}/ exactly 7; /bo{2,5}/ from 2 to 5 (in {x,y}, x must not be greater than y); /bo{5,}/ 5 or more. This does not work in JS: /b{,5}/.test('bbbbb') returns false, because {,5} is not a quantifier there and is matched as literal text; write {0,5} instead.
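A sketch of the counted quantifiers, including the {,5} pitfall. The oRun helper is a hypothetical convenience for building test strings, not part of any API:

```javascript
// Hypothetical helper: "b" followed by n copies of "o".
const oRun = (n) => 'b' + 'o'.repeat(n);

const exactly7 = /^bo{7}$/.test(oRun(7));     // true
const twoToFive = /^bo{2,5}$/.test(oRun(1));  // false: only one "o"
const fiveOrMore = /^bo{5,}$/.test(oRun(12)); // true

// {,5} is not a quantifier in JS (without the u flag), so the braces are literal text.
const literalBraces = /b{,5}/.test('b{,5}');  // true
const noUpperOnly = /b{,5}/.test('bbbbb');    // false
// Spell the lower bound out instead:
const zeroToFive = /^b{0,5}$/.test('bbbbb');  // true

console.log(exactly7, twoToFive, fiveOrMore, literalBraces, noUpperOnly, zeroToFive);
```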
  • 45. Greedy quantifiers: var r = /a+/.exec('aaaaa'); >>> r[0] "aaaaa"
  • 51. Lazy quantifiers: var r = /a+?/.exec('aaaaa'); >>> r[0] "a" r = /a*?/.exec('aaaaa'); >>> r[0] ""
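Greedy versus lazy matters most when the same delimiter appears several times. A sketch on a toy markup string (regular expressions are no substitute for an HTML parser); the negated-class variant is a common alternative, not something the slides prescribe:

```javascript
const html = '<b>bold</b> and <i>italic</i>';

const greedy = html.match(/<.+>/)[0];   // ".+" grabs as much as possible, then backs off to the last ">"
const lazy = html.match(/<.+?>/)[0];    // ".+?" stops at the first ">"
const tags = html.match(/<.+?>/g);      // every tag
const negated = html.match(/<[^>]+>/g); // same tags, expressed with a negated class

console.log(greedy); // <b>bold</b> and <i>italic</i>
console.log(lazy);   // <b>
console.log(tags);   // ['<b>', '</b>', '<i>', '</i>']
console.log(negated);
```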
  • 52. Groups
  • 54. Groups: capturing /(boo)/.test("boo"); non-capturing /(?:boo)/.test("boo");
  • 60. Grouping and the RegExp constructor: var result = /(bo)o+(b)/.exec('the booooob'); >>> RegExp.$1 "bo" >>> RegExp.$2 "b" >>> RegExp.$9 "" >>> RegExp.$10 undefined >>> RegExp.$0 undefined
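RegExp.$1 through RegExp.$9 are legacy static properties shared by every regexp in the same realm, so any later match overwrites them. A sketch comparing them with the array returned by exec(). The legacy values are read into variables immediately, before anything else (including console.log internals) can run another regexp:

```javascript
const result = /(bo)o+(b)/.exec('the booooob');
const legacyFirst = RegExp.$1; // "bo": read right away
/(x)/.test('x');               // any later successful match...
const legacyAfter = RegExp.$1; // ..."x": the static value was overwritten

// The exec() result keeps its values: match, groups, and position.
console.log(result[0], result[1], result[2], result.index); // booooob bo b 4
console.log(legacyFirst, legacyAfter);                      // bo x
```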
  • 65. Numbering of capturing groups: /((foo) (b(a)r))/ matched against "foo bar" gives $1 = "foo bar", $2 = "foo", $3 = "bar", $4 = "a" (groups are numbered by the position of their opening parenthesis).
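The numbering rule, checked in code. The second half uses ES2018 named groups, which the talk predates; the group names are illustrative:

```javascript
// Groups are numbered by the position of their opening parenthesis, left to right.
const m = /((foo) (b(a)r))/.exec('foo bar');
console.log(m.slice(1)); // ['foo bar', 'foo', 'bar', 'a']

// ES2018 named groups spare you the counting:
const named = /(?<first>foo) (?<second>bar)/.exec('foo bar');
console.log(named.groups.first, named.groups.second); // foo bar
```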
  • 69. Lookahead: var r = /best(?= match)/.exec('best match'); >>> !!r true >>> r[0] "best" >>> /best(?! match)/.test('best match') false
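Lookahead checks what follows without consuming it. A sketch of both forms, plus the common trick of stacking lookaheads to require several conditions at once (the "digit and letter" rule is just an example):

```javascript
const r = /best(?= match)/.exec('best match');
console.log(r[0], r.index); // best 0: " match" was checked but not consumed

const negativeHit = /best(?! match)/.test('best match');  // false
const negativeMiss = /best(?! match)/.test('best guess'); // true

// Several lookaheads at one position act like "AND": at least one digit and one letter.
const hasDigitAndLetter = /^(?=.*\d)(?=.*[a-z]).+$/i;
const mixed = hasDigitAndLetter.test('abc123');       // true
const lettersOnly = hasDigitAndLetter.test('abcdef'); // false

console.log(negativeHit, negativeMiss, mixed, lettersOnly);
```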
  • 70. Lookbehind: not supported in JavaScript at the time of this talk (ES2018 later added both forms): /(?<=text)match/ positive lookbehind; /(?<!text)match/ negative lookbehind
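Since ES2018, lookbehind works in current engines. A sketch assuming such an engine; the price string is made up:

```javascript
// Positive lookbehind: digits preceded by "$"; the "$" itself is not part of the match.
const price = /(?<=\$)\d+/.exec('cost: $42');
console.log(price[0]); // 42

// Negative lookbehind: "match" not preceded by "text".
const blocked = /(?<!text)match/.test('textmatch'); // false
const allowed = /(?<!text)match/.test('a match');   // true
console.log(blocked, allowed);
```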
  • 71. Enumerations
  • 72. Logical "or": /red|green|blue light/ vs. /(red|green|blue) light/ >>> /var a(;|$)/.test('var a') true
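The two patterns on the slide differ because "|" has the lowest precedence of all operators. A sketch that makes the difference visible:

```javascript
// "|" has the lowest precedence: this is "red" OR "green" OR "blue light".
const loose = /red|green|blue light/;
// The group limits the alternation: "red light", "green light" or "blue light".
const grouped = /(red|green|blue) light/;

const looseRed = loose.test('red');                // true
const groupedRed = grouped.test('red');            // false
const groupedRedLight = grouped.test('red light'); // true

// Alternation with an anchor: ";" or the end of the string.
const endsStatement = /var a(;|$)/;
const bare = endsStatement.test('var a');    // true
const longer = endsStatement.test('var ab'); // false

console.log(looseRed, groupedRed, groupedRedLight, bare, longer);
```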
  • 73. Backreferences: >>> /(red|green) apple is \1/.test('red apple is red') true >>> /(red|green) apple is \1/.test('green apple is green') true
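A backreference repeats the exact captured text, not the whole alternation. A sketch of the negative case and a classic practical use:

```javascript
// \1 repeats the exact text captured by group 1, not "any of the alternatives again".
const sameColor = /(red|green) apple is \1/;
const redRed = sameColor.test('red apple is red');     // true
const redGreen = sameColor.test('red apple is green'); // false

// A classic use: spotting an accidentally doubled word.
const doubled = 'this is is a test'.match(/\b(\w+) \1\b/);
console.log(redRed, redGreen, doubled[0]); // true false is is
```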
  • 74. Alternative character representations
  • 77. Representing a character: \x09 === \t (not Unicode but ASCII/ANSI); \u20AC === € (in Unicode). A backslash takes away a special character's meaning: /\(\)/.test('()') // true, /\\n/.test('\\n') // true. ...or vice versa! /\f/.test('f') // false!
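The escapes from the slide, checked in code. The last two lines add a pitfall the slide does not cover: with the RegExp constructor the pattern is first a string literal, so every backslash must be doubled:

```javascript
const tabByHex = '\x09' === '\t';             // true
const euroByUnicode = '\u20AC' === '€';       // true
const literalParens = /\(\)/.test('()');      // true: "\(" and "\)" are plain parentheses
const literalBackslash = /\\n/.test('\\n');   // true: "\\" is one literal backslash, then "n"
const formFeed = /\f/.test('f');              // false: "\f" is a form feed, not the letter f

// With the constructor, '\\d+' is the pattern \d+, while '\d+' is just "d+".
const doubledOk = new RegExp('\\d+').test('42'); // true
const singleLost = new RegExp('\d+').test('42'); // false

console.log(tabByHex, euroByUnicode, literalParens, literalBackslash, formFeed, doubledOk, singleLost);
```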
  • 78. Flags
  • 84. Regular expression flags: g i m s x y. g: global match; i: ignore case; m: multiline matching for ^ and $. JavaScript does NOT provide support for s (string as single line) and x (extended pattern); s was later added in ES2018 as the "dotAll" flag, while x is still unsupported. y (sticky), Mozilla-only and non-standard at the time: match only from the .lastIndex index (a regexp instance property); Firefox's implementation even let ^ match at that predefined position. y became standard in ES2015, where ^ still matches only at the start of the input (or of a line, with m).
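A sketch of g, i and the standardized y flag in a current engine. The last case shows the ES2015 rule that ^ keeps meaning "start of input" even with y:

```javascript
// g: exec() resumes at lastIndex, so a loop visits every match.
const digits = /\d+/g;
const found = [];
let match;
while ((match = digits.exec('a1 b22 c333')) !== null) {
  found.push(match[0] + '@' + match.index);
}
console.log(found); // ['1@1', '22@4', '333@8']

// i: ignore case.
const caseless = /abc/i.test('ABC'); // true

// y (sticky): the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test('barfoo'); // true
sticky.lastIndex = 2;
const atTwo = sticky.test('barfoo');   // false: "foo" does not start at index 2

// ^ with y still anchors to the start of the input.
const anchored = /^foo/y;
anchored.lastIndex = 3;
const anchoredAtThree = anchored.test('barfoo'); // false

console.log(caseless, atThree, atTwo, anchoredAtThree);
```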
  • 85. Alternative syntax for flags: /(?i)foo/ /(?i-m)bar$/ /(?i-sm).x$/ /(?i)foo(?-i)bar/. Some implementations do NOT support switching flags on the fly. In JS, flags are set for the whole regexp instance and you can't change them.
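Because a JS regexp's flags are fixed, the usual workaround is to build a new instance. A sketch of two ways to do it; the second relies on the ES2015 constructor accepting a RegExp together with a new flags string:

```javascript
const base = /foo/;

// Rebuild from the source text with an extra flag.
const insensitive = new RegExp(base.source, base.flags + 'i');
console.log(base.test('FOO'), insensitive.test('FOO')); // false true

// ES2015+: pass the RegExp itself plus a replacement flags string.
const globalCopy = new RegExp(base, 'g');
console.log(globalCopy.source, globalCopy.flags); // foo g
```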
  • 86. RegExp in JavaScript
  • 87. Methods. RegExp instances: /regexp/.exec('string') returns null or an array ['whole match', $1, $2, ...]; /regexp/.test('string') returns false or true. String instances: 'str'.match(/regexp/) and 'str'.match('\\w{1,3}') return the same as /regexp/.exec if the 'g' flag is not used, or an array of all matches if the 'g' flag is used (internal capturing groups are ignored; null if nothing matches). 'str'.search(/regexp/) and 'str'.search('\\w{1,3}') return the first match index, or -1.
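The methods above on one sample string. The final line uses ES2020 matchAll, which postdates the talk and keeps the groups that match with g drops:

```javascript
const str = 'cat bat rat';

// RegExp methods
const execResult = /(b)at/.exec(str); // ['bat', 'b'], with execResult.index === 4
const found = /(b)at/.test(str);      // true

// String methods
const firstMatch = str.match(/(\w)at/);  // no g: same shape as exec, ['cat', 'c']
const allMatches = str.match(/(\w)at/g); // g: ['cat', 'bat', 'rat'], groups dropped
const none = str.match(/dog/g);          // null when nothing matches, even with g
const ratAt = str.search(/rat/);         // 8
const fromString = str.search('\\bbat'); // 4: a string argument is turned into a RegExp
const missing = str.search(/dog/);       // -1

// ES2020 matchAll keeps the groups of every match of a g regexp.
const initials = [...str.matchAll(/(\w)at/g)].map((m) => m[1]); // ['c', 'b', 'r']

console.log(execResult, found, firstMatch, allMatches, none, ratAt, fromString, missing, initials);
```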
  • 88. Methods. String instances: 'str'.replace(/old/, 'new'); WARNING: special patterns are recognized in the replacement string: $$ inserts a dollar sign "$"; $& inserts the substring that matches the regexp; $` inserts the substring before $&; $' inserts the substring after $&; $1, $2, $3 etc. insert the string matched by the n-th capturing group. A function can be passed instead of the replacement string:
      'str'.replace(/(r)(e)gexp/g, function (matched, $1, $2, offset, sourceString) {
          // what should replace the matched part on this iteration?
          return 'replacement';
      });
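Each replacement pattern, plus the function form, applied to one sample string. Note that "$$$1" reads as "$$" (a literal dollar sign) followed by "$1":

```javascript
const s = 'price: 10 USD';

const dollars = s.replace(/(\d+) USD/, '$$$1'); // "$$" -> "$", "$1" -> "10"
const bracketed = s.replace(/\d+/, '[$&]');     // "$&" is the whole match
const context = s.replace(/\d+/, "<$`|$'>");    // text before and after the match

// With a function, its return value replaces each match; the arguments are
// the match, each capturing group, the match offset and the whole input string.
const doubledPrice = s.replace(/(\d+) (USD)/g, function (matched, amount, currency, offset, source) {
  return Number(amount) * 2 + ' ' + currency + ' (at ' + offset + ' of ' + source.length + ')';
});

console.log(dollars);      // price: $10
console.log(bracketed);    // price: [10] USD
console.log(context);      // price: <price: | USD> USD
console.log(doubledPrice); // price: 20 USD (at 7 of 13)
```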
  • 89. RegExp injection:
      // BAD CODE
      var re = new RegExp('^' + userInput + '$');
      // ...
      var userInput = '[abc]'; // oops!
      // GOOD, DO IT AT HOME
      RegExp.escape = function (text) {
          return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
      };
      var re = new RegExp('^' + RegExp.escape(userInput) + '$');
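A runnable version of the injection example. It uses a standalone helper with the hypothetical name escapeRegExp instead of assigning to RegExp.escape. Recent engines ship a built-in RegExp.escape (added in ES2025) whose output differs, so overwriting it is best avoided. The escaped output is meant for patterns without the u flag, where identity escapes such as "\#" are allowed:

```javascript
// Hypothetical helper name: prefixes every pattern-special character with a backslash.
function escapeRegExp(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

const userInput = '[abc]';
const unsafe = new RegExp('^' + userInput + '$');             // becomes /^[abc]$/
const safe = new RegExp('^' + escapeRegExp(userInput) + '$'); // becomes /^\[abc\]$/

console.log(unsafe.test('a'), unsafe.test('[abc]')); // true false: the input turned into a class
console.log(safe.test('a'), safe.test('[abc]'));     // false true
```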
  • 90. Recommended reading
  • 91. Online (just google it): MDN Guide on Regular Expressions. The book: Mastering Regular Expressions, O'Reilly Media.
  • 92. Thank you!