@dankogai
my$talk=qr{((?:ir)?reg(?:ular )?exp(?:ressions?)?)}i;
Table of Contents
• regexp? what is it?


• $supported_by ~~ @most_major_languages;


• but how (much)??


• Unicode support?


• assertions?


• modifiers?


• Irregular expressions


• qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}


• use CPAN;


• Regexp::Assemble;


• Regexp::Common;


• (ir)?regular questions (?:from|by) the audience
regexp? what is it?
Mathematically speaking[*]
• The empty language Ø is a regular language.

• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular
language.

• If A is a regular language, A* (Kleene star) is a regular language. Consequently
the empty string language {ε} is also regular, because Ø* = {ε}.

• If A and B are regular languages, then A ∪ B (union) and A • B (concatenation)
are regular languages.

• No other languages over Σ are regular.
regexp? what is it?
In our language
• 0 or more of… (quantifier)


• '' # empty string


• 'string' # any string


• '(?:string|文字列)' # any alternation of strings


• That's it!


• ? # {0,1}


• + # {1,}


• [0-9] # (?:0|1|2|3|4|5|6|7|8|9)
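The point that ?, + and [0-9] are only shorthand can be spot-checked. The following is a minimal sketch in JavaScript; the names `pairs` and `samples` are made up for this illustration, and agreement on a handful of strings is evidence, not a proof.

```javascript
// Each pair holds a shorthand form and its spelled-out equivalent.
const pairs = [
  [/^a?$/, /^a{0,1}$/],
  [/^a*$/, /^a{0,}$/],
  [/^a+$/, /^a{1,}$/],
  [/^[0-9]$/, /^(?:0|1|2|3|4|5|6|7|8|9)$/],
];

// A few hand-picked inputs, including the empty string and non-matches.
const samples = ["", "a", "aa", "aaa", "0", "7", "42", "b"];

// true only if both forms accept and reject exactly the same samples.
const agree = pairs.every(([sugar, plain]) =>
  samples.every((s) => sugar.test(s) === plain.test(s))
);

console.log(agree); // true
```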
regexp? what is it?
((?:ir)?reg(?:ular )?exp(?:ressions?)?)
Visualized by: regexper.com
regexp? what is it?
(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})
Excerpt from: https://www.w3.org/International/questions/qa-forms-utf-8
Visualized by: regexper.com
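To see the pattern at work, it can be anchored and repeated so that it has to account for every byte of the input. This is a rough sketch for node, not part of the W3C page: `isUtf8` is a hypothetical helper, and decoding the bytes as latin1 is an assumption made here so that each byte becomes one character in the \x00-\xFF range.

```javascript
// The pattern from the slide, wrapped in ^(?:...)*$ to cover the whole input.
const utf8 = /^(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*$/;

// Hypothetical helper: one latin1 character per byte, then test the pattern.
const isUtf8 = (bytes) => utf8.test(Buffer.from(bytes).toString("latin1"));

console.log(isUtf8(Buffer.from("regexp 文字列 🦏", "utf8"))); // true
console.log(isUtf8([0xc0, 0x80]));       // false: overlong encoding of U+0000
console.log(isUtf8([0xed, 0xa0, 0x80])); // false: encoded surrogate U+D800
```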
regexp? what is it?
(?:[+-]?)(?:0x[0-9a-fA-F]+(?:\.[0-9a-fA-F]+)?(?:[pP][+-]?[0-9]+)|(?:[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?|0(?:\.0+|(?:\.0+)?(?:[eE][+-]?[0-9]+))|(?:[Nn]a[Nn]|[Ii]nf(?:inity)?))
Excerpt from: https://github.com/dankogai/js-sion/blob/main/sion.ts
Visualized by: regexper.com
Irregular expressions
/^(11+?)\1+$/ # is this a regular expression?
$ seq 2 100 | perl -nlE 'say $_ if (1x$_) !~ /^(11+?)\1+$/'
2
3
5
7
…
79
83
89
97
Irregular expressions
/^(11+?)\1+$/ # is NOT EXACTLY a regular expression!
• The problem is \1


• It is the result of the preceding capture


• In other words, this expression is self-modifying.


• So it is not mathematically a regular expression


• Regexp ≠ Regular Expression


• Regexp ⊇ Regular Expression
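The same backreference trick carries over to JavaScript, whose RegExp also supports \1. A minimal sketch, with `isPrime` as a name invented for this illustration: the pattern matches a run of n ones exactly when n can be split into two or more equal blocks of at least two ones, that is, when n is composite.

```javascript
// (11+?) captures a candidate divisor d >= 2 in unary;
// \1+ demands that the rest of the string is one or more further copies of it.
const composite = /^(11+?)\1+$/;

// Hypothetical helper: n is prime if its unary form is NOT composite.
const isPrime = (n) => n >= 2 && !composite.test("1".repeat(n));

const primes = [];
for (let n = 2; n <= 30; n++) {
  if (isPrime(n)) primes.push(n);
}

console.log(primes.join(" ")); // 2 3 5 7 11 13 17 19 23 29
```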
Irregular expressions
qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
• Q: Can a regular expression match nested parentheses?


• A: No. But some regex engines allow you to do that.
Irregular expressions
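qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
my $re = qr{
  (                     # group 1: the whole call
    [A-Za-z_]\w*\s*     # function name
    (                   # group 2: the parenthesized part
      \(
      (                 # group 3: contents of the parentheses
        (?:
          (?>[^()]+)    # run of non-parens, no backtracking
          |
          (?2)          # recurse into group 2
        )*
      )
      \)
    )
  )
}x;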
Irregular expressions
qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))}
#!/usr/bin/env perl
use strict;
use warnings;
use feature ':all';
my $re = qr{([A-Za-z_]\w*\s*(\(((?:(?>[^()]+)|(?2))*)\)))};
my $str = '$result = a(b(c),d(e,f(g,g,g)))';
$str =~ $re;
say $1; # a(b(c),d(e,f(g,g,g)))
say $2; # (b(c),d(e,f(g,g,g)))
say $3; # b(c),d(e,f(g,g,g))
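JavaScript's RegExp has no (?2)-style recursion, so there the nesting has to be tracked outside the pattern. The sketch below is a swapped-in technique, not a translation of the Perl regexp: a plain regexp finds the function name and opening parenthesis, and a depth counter finds the matching close. `firstCall` is a hypothetical helper, and it ignores details such as parentheses inside string literals.

```javascript
// Returns [whole call, parenthesized part, contents], like $1, $2, $3 above,
// or null when no balanced call is found.
function firstCall(str) {
  const head = /[A-Za-z_]\w*\s*\(/g; // name followed by an opening paren
  let m;
  while ((m = head.exec(str)) !== null) {
    const open = head.lastIndex - 1; // index of the opening paren
    let depth = 1;
    let i = head.lastIndex;
    while (i < str.length && depth > 0) {
      if (str[i] === "(") depth++;
      else if (str[i] === ")") depth--;
      i++;
    }
    if (depth === 0) {
      const parens = str.slice(open, i);
      return [str.slice(m.index, i), parens, parens.slice(1, -1)];
    }
    head.lastIndex = m.index + 1; // unbalanced here: retry one character later
  }
  return null;
}

const result = firstCall("$result = a(b(c),d(e,f(g,g,g)))");
console.log(result.join("\n"));
// a(b(c),d(e,f(g,g,g)))
// (b(c),d(e,f(g,g,g)))
// b(c),d(e,f(g,g,g))
```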
Unicode Support
What is a character?
• A string is /.*/, but what does . match?
• [\x00-\xff] # legacy world of bytes: Perl < 5.7
• [\u0000-\uFFFF] # prematurely modern: Java(Script)?, Python2, …
• [\u{0000}-\u{10FFFF}] # correctly modern: Perl, Ruby, Python3, …
Unicode Support?
What will the following say?
$ perl -Mutf8 -MData::Dumper -E \
'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
$VAR1 = [
          "\x{1f98f}",
          "\x{1f42a}",
          "\x{1f418}",
          "\x{1f40d}",
          "\x{1f48e}",
          "\x{2699}"
        ];
Unicode Support?
What will the following say?
$ node -e \
'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
[
  '', '', '', '',
  '', '', '', '',
  '', '', '⚙'
]
• Each '' stands for an unpaired surrogate: without the u flag, . matches one UTF-16 code unit, so the five emoji become ten halves.
Unicode Support?
What will the following say?
$ node -e \
'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/ug))'
[ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]
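The difference between the two node runs comes down to what . counts as one character. A small sketch that makes the counts explicit; the string is built from the code points shown in the Perl output, and the variable names are made up for this illustration.

```javascript
// Same six symbols as on the slide, built from their code points.
const zoo = String.fromCodePoint(0x1f98f, 0x1f42a, 0x1f418, 0x1f40d, 0x1f48e, 0x2699);

const units = zoo.match(/./g);   // no u flag: one match per UTF-16 code unit
const points = zoo.match(/./gu); // u flag: one match per code point

console.log(zoo.length, units.length, points.length); // 11 11 6
console.log(points.map((c) => c.codePointAt(0).toString(16)).join(" "));
// 1f98f 1f42a 1f418 1f40d 1f48e 2699
```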
Unicode Support?
What will the following say?
$ perl -Mutf8 -MData::Dumper -E \
'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
$VAR1 = [
          "\x{1f1ef}", # REGIONAL INDICATOR SYMBOL LETTER J
          "\x{1f1f5}", # REGIONAL INDICATOR SYMBOL LETTER P
          "\x{1f1fa}", # REGIONAL INDICATOR SYMBOL LETTER U
          "\x{1f1e6}"  # REGIONAL INDICATOR SYMBOL LETTER A
        ];
Unicode Support?
What will the following say?
$ perl -Mutf8 -MData::Dumper -E \
'my@m=("🇯🇵🇺🇦" =~ /(\X)/g); say Dumper([@m])'
$VAR1 = [
          "\x{1f1ef}\x{1f1f5}",
          "\x{1f1fa}\x{1f1e6}"
        ];
Unicode Support?
What will the following say?
$ node -e \
'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
[ '🇯', '🇵', '🇺', '🇦' ]
Unicode Support?
What will the following say?
$ node -e \
'console.log("🇯🇵🇺🇦".match(/(\X)/ug))'
🙅 [ '🇯🇵','🇺🇦' ]
🙆 SyntaxError: Invalid regular expression: /(\X)/: Invalid escape
    at [eval]:1:24
    at Script.runInThisContext (node:vm:129:12)
    at Object.runInThisContext (node:vm:305:38)
    at node:internal/process/execution:75:19
    at [eval]-wrapper:6:22
    at evalScript (node:internal/process/execution:74:60)
    at node:internal/main/eval_string:27:3
🤦
Unicode Support?
Grapheme Cluster
• Defined in:
  • https://unicode.org/reports/tr29/
• \X is supported by:
  • 🐘 PHP
  • 🐪 Perl
  • 💎 Ruby
• Not yet supported by:
  • 🦏 JavaScript
  • 🐍 Python
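Although \X is missing from JavaScript regexps, grapheme clusters can still be obtained outside the regexp engine. A minimal sketch, assuming a node build that ships `Intl.Segmenter` with ICU data (recent releases do); this is a workaround, not a regexp feature, and the variable names are made up for this illustration.

```javascript
// The two flags from the slides: regional indicators J P U A.
const flags = String.fromCodePoint(0x1f1ef, 0x1f1f5, 0x1f1fa, 0x1f1e6);

// Split on extended grapheme cluster boundaries (UAX #29).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const clusters = Array.from(segmenter.segment(flags), (s) => s.segment);

console.log(flags.match(/./gu).length); // 4 code points
console.log(clusters.length);           // 2 grapheme clusters, one per flag
```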
use CPAN
Regexp::Common
$ perl -MRegexp::Common -E 'say $RE{net}{IPv6}'
(?:(?|(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-
fA-F]{1,4})|(?::(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:):
(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]
{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:)(?:):(?:
[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?::(?:)(?:)(?:)(?:)(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:
[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]
{1,4}):(?:)(?:)(?:)(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]
{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-
F]{1,4}):(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-
F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)
(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-
fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):
(?:[0-9a-fA-F]{1,4}))|(?:(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:)(?:):)|(?:(?:[0-9a-fA-F]
{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:[0-9a-fA-F]{1,4}):(?:)(?:):)))
use CPAN
Regexp::Assemble
$ egrep '^.{5}$' /usr/share/dict/words \
  | perl -MRegexp::Assemble -nl \
  -E 'BEGIN{$ra=Regexp::Assemble->new}' \
  -E '$ra->add($_);' \
  -E 'END{say $ra->re}'
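For a feel of what assembling many strings into one pattern involves, here is a toy version in JavaScript. It is only a sketch of the general idea (merge words that share a prefix by walking a trie) and makes no claim to reproduce Regexp::Assemble's algorithm or output; `assemble` is a hypothetical helper that handles literal words only.

```javascript
function assemble(words) {
  // Build a trie; the empty key marks "a word may end here".
  const trie = {};
  for (const word of words) {
    let node = trie;
    for (const ch of word) node = node[ch] || (node[ch] = {});
    node[""] = true;
  }
  // Escape regexp metacharacters so every word is matched literally.
  const escape = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Turn a trie node into a pattern for everything below it.
  const build = (node) => {
    const alts = Object.keys(node)
      .filter((key) => key !== "")
      .map((key) => escape(key) + build(node[key]));
    if (alts.length === 0) return "";
    const optional = "" in node; // a shorter word ends at this node
    if (alts.length === 1 && !optional) return alts[0];
    return "(?:" + alts.join("|") + ")" + (optional ? "?" : "");
  };
  return new RegExp("^" + build(trie) + "$");
}

const re = assemble(["apple", "apply", "angle"]);
console.log(re.source);        // ^a(?:ppl(?:e|y)|ngle)$
console.log(re.test("apply")); // true
console.log(re.test("ankle")); // false
```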
Wrap up
• (?:ir)?regular expressions
  • Regexp ≠ Regular Expression
  • Regexp ⊇ Regular Expression
• Definition of characters
  • [\x00-\xff]
  • [\u0000-\uFFFF]
  • [\u{0000}-\u{10FFFF}]
• (?:un)?availability of \X
• Using perl? use CPAN!
BTW
The bible, now obsolete
• 3rd edition: August 8, 2006
• Too old, especially for JS
Thank you
🙇
Questions and answers
answer($_) foreach (/($questions)/sg);
