SlideShare a Scribd company logo
Regular Expressions
      Redux
Scope

• medium to advanced
• 30 minutes
• performance / backtracking irrelevant
• no compatibility charts (yet)
TOC

• basic matching, quantifiers
• character classes, types, properties, anchors
• groups, options, replace string
• look-ahead/behind
• subexpressions
RE overview
RE overview

              match “foo”           replace with “bar”
  Perl        /foo/     (on $_)        s/foo/bar/ (on $_)

Javascript            /foo/       “foolish”.replace(/foo/, “bar”)

   Vi                 /foo/                 :s/foo/bar/

TextMate      ⌘-F, Find: foo       ⌘-F Find: foo, Replace: bar
RE overview

              match “foo”           replace with “bar”
  Perl        /foo/     (on $_)        s/foo/bar/ (on $_)

Javascript            /foo/       “foolish”.replace(/foo/, “bar”)

   Vi                 /foo/                 :s/foo/bar/

TextMate      ⌘-F, Find: foo       ⌘-F Find: foo, Replace: bar
RE overview

              match “foo”           replace with “bar”
  Perl        /foo/     (on $_)        s/foo/bar/ (on $_)

Javascript            /foo/       “foolish”.replace(/foo/, “bar”)

   Vi                 /foo/                 :s/foo/bar/

TextMate      ⌘-F, Find: foo       ⌘-F Find: foo, Replace: bar
Quantifiers
Quantifiers
• classic greedy: ?, *, +
Quantifiers
• classic greedy: ?, *, +
• specific:{1,5}, {,5}
Quantifiers
• classic greedy: ?, *, +
• specific:{1,5}, {,5}
  •   ? == {0,1}
Quantifiers
• classic greedy: ?, *, +
• specific:{1,5}, {,5}
  •   ? == {0,1}

  •   * == {0,}
Quantifiers
• classic greedy: ?, *, +
• specific:{1,5}, {,5}
  •   ? == {0,1}

  •   * == {0,}

  •   + == {1,}
Quantifiers
• classic greedy: ?, *, +
• specific:{1,5}, {,5}
  •   ? == {0,1}

  •   * == {0,}

  •   + == {1,}

• non-greedy: ??, *?, +?, {5,7}?
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
Character Classes /
    Properties
Character Classes /
      Properties
• [0-9a-z]   (classes)
Character Classes /
      Properties
• [0-9a-z]     (classes)
 •   +420[0-9]{9} = simplified czech phone nr.
Character Classes /
      Properties
• [0-9a-z]      (classes)
 •   +420[0-9]{9} = simplified czech phone nr.

 •   don’t: [A-z0-]
Character Classes /
      Properties
• [0-9a-z]       (classes)
  •   +420[0-9]{9} = simplified czech phone nr.

  •   don’t: [A-z0-]

• [a-z&&[^j-n]] == [a-io-z]
Character Classes /
      Properties
• [0-9a-z]       (classes)
  •   +420[0-9]{9} = simplified czech phone nr.

  •   don’t: [A-z0-]

• [a-z&&[^j-n]] == [a-io-z]
• p{Upper} (properties)
Character Classes /
      Properties
• [0-9a-z]       (classes)
  •   +420[0-9]{9} = simplified czech phone nr.

  •   don’t: [A-z0-]

• [a-z&&[^j-n]] == [a-io-z]
• p{Upper} (properties)
  •   works great on Unicode text (Latin,Katakana)
Character Classes /
      Properties
• [0-9a-z]       (classes)
  •   +420[0-9]{9} = simplified czech phone nr.

  •   don’t: [A-z0-]

• [a-z&&[^j-n]] == [a-io-z]
• p{Upper} (properties)
  •   works great on Unicode text (Latin,Katakana)

• [:alnum:], [:^space:] (POSIX bracket)
Character Types
Character Types
• . == anything (apart from newline)
Character Types
• . == anything (apart from newline)
• s == space == [tnvfr ]
  •   more in unicode
Character Types
• . == anything (apart from newline)
• s == space == [tnvfr ]
  •   more in unicode

• w == word char == cca [0-9a-zA-Z_]
  •   is complicated in unicode
Character Types
• . == anything (apart from newline)
• s == space == [tnvfr ]
  •   more in unicode

• w == word char == cca [0-9a-zA-Z_]
  •   is complicated in unicode

• d == digit == [0-9]
  •   h == hexadecimal digit == [0-9a-fA-F]
Character Types
• . == anything (apart from newline)
• s == space == [tnvfr ]
  •   more in unicode

• w == word char == cca [0-9a-zA-Z_]
  •   is complicated in unicode

• d == digit == [0-9]
  •   h == hexadecimal digit == [0-9a-fA-F]

• SWD == [^s][^w][^d]
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

           /b[w&&[^aA]]+b/
              /W{2,}w+b/
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

           /b[w&&[^aA]]+b/
              /W{2,}w+b/
Anchors
Anchors

• ^ - begining (line, string)
Anchors

• ^ - begining (line, string)
• $ - end (line, string)
Anchors

• ^ - begining (line, string)
• $ - end (line, string)
• b - word boundary ~ wW (almost)
 •   b.{5}b != Ww{5}W
Anchors

• ^ - begining (line, string)
• $ - end (line, string)
• b - word boundary ~ wW (almost)
 •   b.{5}b != Ww{5}W

• zero width!
Options
Options
• /foo/imsx
 •   i - case insensitive

 •   m - multiline (^,$ represent start of string/file)

 •   s - single line (. matches newlines)

 •   x - extended!

 •   g - global
Options
• /foo/imsx
  •   i - case insensitive

  •   m - multiline (^,$ represent start of string/file)

  •   s - single line (. matches newlines)

  •   x - extended!

  •   g - global

• can be written inline
  •   (?imsx-imsx)

  •   (?imsx-imsx:...)
Options
• /foo/imsx
  •   i - case insensitive

  •   m - multiline (^,$ represent start of string/file)

  •   s - single line (. matches newlines)

  •   x - extended!

  •   g - global                      (?x-i)
                                         #this is cool
• can be written inline                  (
                                            foo #my important value
  •                                         | #don't forget the alternative
      (?imsx-imsx)
                                            bar
  •                                      ) # result equals to (foo|bar)
      (?imsx-imsx:...)
Groups/Replacing
Groups/Replacing
• (...) - matched group
Groups/Replacing
• (...) - matched group
• $1 - $9
  •   alternatively 1 - 9 (not recommended)
Groups/Replacing
• (...) - matched group
• $1 - $9
  •   alternatively 1 - 9 (not recommended)

• nested groups ordered by left bracket
Groups/Replacing
• (...) - matched group
• $1 - $9
  •   alternatively 1 - 9 (not recommended)

• nested groups ordered by left bracket
• (?:...) - non-captured group
  •   useful for (?:foo)+ or (?:foo|bar)
Example
quot;foobarmanquot;.replace(
  /(?:f)((o)+)(bar)|(baz|man)/g,
  '$1, $2, $3, $4, $5')
Example
quot;foobarmanquot;.replace(
  /(?:f)((o)+)(bar)|(baz|man)/g,
  '$1, $2, $3, $4, $5')

    • foobar
      •   1 -- oo

      •   2 -- o

      •   3 -- bar

      •   4 --
Example
quot;foobarmanquot;.replace(
  /(?:f)((o)+)(bar)|(baz|man)/g,
  '$1, $2, $3, $4, $5')

    • foobar                       • man
      •                             •
          1 -- oo                       1 --

      •                             •
          2 -- o                        2 --

      •                             •
          3 -- bar                      3 --

      •                             •
          4 --                          4 -- man
Look-ahead/behind
• defines custom zero-width anchors
Look-ahead/behind
• defines custom zero-width anchors
                   positive negative

          ahead     (?=...)   (?!...)

          behind   (?<=...)   (?<!...)
Example

zdenek@gooddata.com
   /.*?@gooddata/


zdenek@gooddata.com
 /.*?(?=@gooddata)/
Recursive RE

• very important!
 •   quote & bracket matching

 •   technically not part of regular grammar

• two styles
 •   g<name> or g<n> - TextMate

 •   (?R) - Perl
Example
(?x:

 ( # match the initial opening parenthesis

 # Now make a named group 'balanced' which
     # matches a balanced substring.

 (?<balanced>

 
 [^()] # A balanced substring is either something
             # that is not a parenthesis:

 
 | # …or a parenthesised string:

 
 ( # A parenthesised string begins with an opening parenthesis

 
 
 g<balanced>* # …followed by a sequence of balanced substrings

 
 ) # …and ends with a closing parenthesis

 )* # Look for a sequence of balanced substrings

 ) # Finally, the outer closing parenthesis
)
Example
(?x:

 ( # match the initial opening parenthesis

 # Now make a named group 'balanced' which
     # matches a balanced substring.

 (?<balanced>

 
 [^()] # A balanced substring is either something
             # that is not a parenthesis:

 
 | # …or a parenthesised string:

 
 ( # A parenthesised string begins with an opening parenthesis

 
 
 g<balanced>* # …followed by a sequence of balanced substrings

 
 ) # …and ends with a closing parenthesis

 )* # Look for a sequence of balanced substrings

 ) # Finally, the outer closing parenthesis
)

or: (([^()]|(?R))*)

More Related Content

Viewers also liked

Brno Perl Mongers 28.5.2015 - Perl family by mj41
Brno Perl Mongers 28.5.2015 - Perl family by mj41Brno Perl Mongers 28.5.2015 - Perl family by mj41
Brno Perl Mongers 28.5.2015 - Perl family by mj41
Michal Jurosz
 
Budoucnost Web Aplikaci
Budoucnost Web AplikaciBudoucnost Web Aplikaci
Budoucnost Web AplikaciJakub Nesetril
 
Startup Accelerators
Startup AcceleratorsStartup Accelerators
Startup Accelerators
Jakub Nesetril
 
Harmony in API Design
Harmony in API DesignHarmony in API Design
Harmony in API Design
Jakub Nesetril
 
Avoiding API Waterfalls
Avoiding API WaterfallsAvoiding API Waterfalls
Avoiding API WaterfallsJakub Nesetril
 
Consuming API description languages - Refract & Minim
Consuming API description languages - Refract & MinimConsuming API description languages - Refract & Minim
Consuming API description languages - Refract & Minim
Jakub Nesetril
 
NodeJS, CoffeeScript & Real-time Web
NodeJS, CoffeeScript & Real-time WebNodeJS, CoffeeScript & Real-time Web
NodeJS, CoffeeScript & Real-time WebJakub Nesetril
 
Introduction to GoodData BI PaaS
Introduction to GoodData BI PaaSIntroduction to GoodData BI PaaS
Introduction to GoodData BI PaaSJakub Nesetril
 
Art of Building APIs
Art of Building APIsArt of Building APIs
Art of Building APIs
Jakub Nesetril
 
REST API tools
REST API toolsREST API tools
REST API tools
Jakub Nesetril
 
Introduction to node.js
Introduction to node.jsIntroduction to node.js
Introduction to node.js
Jakub Nesetril
 
GoodData: One Stop Shop for Analytics
GoodData: One Stop Shop for AnalyticsGoodData: One Stop Shop for Analytics
GoodData: One Stop Shop for Analytics
Jakub Nesetril
 
Let's Have a Cup of CoffeeScript
Let's Have a Cup of CoffeeScriptLet's Have a Cup of CoffeeScript
Let's Have a Cup of CoffeeScript
Nicolás Sanguinetti
 
Node at Apiary.io
Node at Apiary.ioNode at Apiary.io
Node at Apiary.io
Jakub Nesetril
 
API Design Workflows
API Design WorkflowsAPI Design Workflows
API Design Workflows
Jakub Nesetril
 
Pda
PdaPda
Apiary
ApiaryApiary
Apiary
Suresh B
 
Real-time Web a NodeJS
Real-time Web a NodeJSReal-time Web a NodeJS
Real-time Web a NodeJSJakub Nesetril
 

Viewers also liked (20)

Brno Perl Mongers 28.5.2015 - Perl family by mj41
Brno Perl Mongers 28.5.2015 - Perl family by mj41Brno Perl Mongers 28.5.2015 - Perl family by mj41
Brno Perl Mongers 28.5.2015 - Perl family by mj41
 
Budoucnost Web Aplikaci
Budoucnost Web AplikaciBudoucnost Web Aplikaci
Budoucnost Web Aplikaci
 
Startup Accelerators
Startup AcceleratorsStartup Accelerators
Startup Accelerators
 
Harmony in API Design
Harmony in API DesignHarmony in API Design
Harmony in API Design
 
Avoiding API Waterfalls
Avoiding API WaterfallsAvoiding API Waterfalls
Avoiding API Waterfalls
 
Consuming API description languages - Refract & Minim
Consuming API description languages - Refract & MinimConsuming API description languages - Refract & Minim
Consuming API description languages - Refract & Minim
 
NodeJS, CoffeeScript & Real-time Web
NodeJS, CoffeeScript & Real-time WebNodeJS, CoffeeScript & Real-time Web
NodeJS, CoffeeScript & Real-time Web
 
Post-REST Manifesto
Post-REST ManifestoPost-REST Manifesto
Post-REST Manifesto
 
Introduction to GoodData BI PaaS
Introduction to GoodData BI PaaSIntroduction to GoodData BI PaaS
Introduction to GoodData BI PaaS
 
Art of Building APIs
Art of Building APIsArt of Building APIs
Art of Building APIs
 
REST API tools
REST API toolsREST API tools
REST API tools
 
Introduction to node.js
Introduction to node.jsIntroduction to node.js
Introduction to node.js
 
GoodData: One Stop Shop for Analytics
GoodData: One Stop Shop for AnalyticsGoodData: One Stop Shop for Analytics
GoodData: One Stop Shop for Analytics
 
Pushdown autometa
Pushdown autometaPushdown autometa
Pushdown autometa
 
Let's Have a Cup of CoffeeScript
Let's Have a Cup of CoffeeScriptLet's Have a Cup of CoffeeScript
Let's Have a Cup of CoffeeScript
 
Node at Apiary.io
Node at Apiary.ioNode at Apiary.io
Node at Apiary.io
 
API Design Workflows
API Design WorkflowsAPI Design Workflows
API Design Workflows
 
Pda
PdaPda
Pda
 
Apiary
ApiaryApiary
Apiary
 
Real-time Web a NodeJS
Real-time Web a NodeJSReal-time Web a NodeJS
Real-time Web a NodeJS
 

Similar to Advanced Regular Expressions Redux

Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to PerlSway Wang
 
Regular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And BeyondRegular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And Beyond
Max Shirshin
 
Perl Presentation
Perl PresentationPerl Presentation
Perl Presentation
Sopan Shewale
 
Regexp secrets
Regexp secretsRegexp secrets
Regexp secrets
Hiro Asari
 
Perl 5.10 on OSDC.tw 2009
Perl 5.10 on OSDC.tw 2009Perl 5.10 on OSDC.tw 2009
Perl 5.10 on OSDC.tw 2009
scweng
 
Introduction to regular expressions
Introduction to regular expressionsIntroduction to regular expressions
Introduction to regular expressions
Ben Brumfield
 
Out with Regex, In with Tokens
Out with Regex, In with TokensOut with Regex, In with Tokens
Out with Regex, In with Tokens
scoates
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Ben Brumfield
 
Regular expressions-ada-2018
Regular expressions-ada-2018Regular expressions-ada-2018
Regular expressions-ada-2018
Emma Burrows
 
Erlang with Regexp Perl And Port
Erlang with Regexp Perl And PortErlang with Regexp Perl And Port
Erlang with Regexp Perl And Port
Keiichi Daiba
 
[Erlang LT] Regexp Perl And Port
[Erlang LT] Regexp Perl And Port[Erlang LT] Regexp Perl And Port
[Erlang LT] Regexp Perl And Port
Keiichi Daiba
 
Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009
Aslak Hellesøy
 
Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009
Aslak Hellesøy
 
Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009
Aslak Hellesøy
 
And now you have two problems. Ruby regular expressions for fun and profit by...
And now you have two problems. Ruby regular expressions for fun and profit by...And now you have two problems. Ruby regular expressions for fun and profit by...
And now you have two problems. Ruby regular expressions for fun and profit by...
Codemotion
 

Similar to Advanced Regular Expressions Redux (20)

Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 
perl-pocket
perl-pocketperl-pocket
perl-pocket
 
perl-pocket
perl-pocketperl-pocket
perl-pocket
 
perl-pocket
perl-pocketperl-pocket
perl-pocket
 
perl-pocket
perl-pocketperl-pocket
perl-pocket
 
Regular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And BeyondRegular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And Beyond
 
Perl Presentation
Perl PresentationPerl Presentation
Perl Presentation
 
Lecture2 B
Lecture2 BLecture2 B
Lecture2 B
 
Regexp secrets
Regexp secretsRegexp secrets
Regexp secrets
 
Perl 5.10 on OSDC.tw 2009
Perl 5.10 on OSDC.tw 2009Perl 5.10 on OSDC.tw 2009
Perl 5.10 on OSDC.tw 2009
 
Introduction to regular expressions
Introduction to regular expressionsIntroduction to regular expressions
Introduction to regular expressions
 
Out with Regex, In with Tokens
Out with Regex, In with TokensOut with Regex, In with Tokens
Out with Regex, In with Tokens
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013
 
Regular expressions-ada-2018
Regular expressions-ada-2018Regular expressions-ada-2018
Regular expressions-ada-2018
 
Erlang with Regexp Perl And Port
Erlang with Regexp Perl And PortErlang with Regexp Perl And Port
Erlang with Regexp Perl And Port
 
[Erlang LT] Regexp Perl And Port
[Erlang LT] Regexp Perl And Port[Erlang LT] Regexp Perl And Port
[Erlang LT] Regexp Perl And Port
 
Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009
 
Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009
 
Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009Ruby presentasjon på NTNU 22 april 2009
Ruby presentasjon på NTNU 22 april 2009
 
And now you have two problems. Ruby regular expressions for fun and profit by...
And now you have two problems. Ruby regular expressions for fun and profit by...And now you have two problems. Ruby regular expressions for fun and profit by...
And now you have two problems. Ruby regular expressions for fun and profit by...
 

Recently uploaded

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Advanced Regular Expressions Redux

  • 2. Scope • medium to advanced • 30 minutes • performance / backtracking irrelevant • no compatibility charts (yet)
  • 3. TOC • basic matching, quantifiers • character classes, types, properties, anchors • groups, options, replace string • look-ahead/behind • subexpressions
  • 5. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 6. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 7. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 10. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5}
  • 11. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1}
  • 12. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,}
  • 13. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,} • + == {1,}
  • 14. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,} • + == {1,} • non-greedy: ??, *?, +?, {5,7}?
  • 15. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 16. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 17. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 18. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 19. Character Classes / Properties
  • 20. Character Classes / Properties • [0-9a-z] (classes)
  • 21. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr.
  • 22. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-]
  • 23. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z]
  • 24. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties)
  • 25. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties) • works great on Unicode text (Latin,Katakana)
  • 26. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties) • works great on Unicode text (Latin,Katakana) • [:alnum:], [:^space:] (POSIX bracket)
  • 28. Character Types • . == anything (apart from newline)
  • 29. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode
  • 30. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode • w == word char == cca [0-9a-zA-Z_] • is complicated in unicode
  • 31. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode • w == word char == cca [0-9a-zA-Z_] • is complicated in unicode • d == digit == [0-9] • h == hexadecimal digit == [0-9a-fA-F]
  • 32. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode • w == word char == cca [0-9a-zA-Z_] • is complicated in unicode • d == digit == [0-9] • h == hexadecimal digit == [0-9a-fA-F] • SWD == [^s][^w][^d]
  • 33. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /b[w&&[^aA]]+b/ /W{2,}w+b/
  • 34. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /b[w&&[^aA]]+b/ /W{2,}w+b/
  • 36. Anchors • ^ - begining (line, string)
  • 37. Anchors • ^ - begining (line, string) • $ - end (line, string)
  • 38. Anchors • ^ - begining (line, string) • $ - end (line, string) • b - word boundary ~ wW (almost) • b.{5}b != Ww{5}W
  • 39. Anchors • ^ - begining (line, string) • $ - end (line, string) • b - word boundary ~ wW (almost) • b.{5}b != Ww{5}W • zero width!
  • 41. Options • /foo/imsx • i - case insensitive • m - multiline (^,$ represent start of string/file) • s - single line (. matches newlines) • x - extended! • g - global
  • 42. Options • /foo/imsx • i - case insensitive • m - multiline (^,$ represent start of string/file) • s - single line (. matches newlines) • x - extended! • g - global • can be written inline • (?imsx-imsx) • (?imsx-imsx:...)
  • 43. Options • /foo/imsx • i - case insensitive • m - multiline (^,$ represent start of string/file) • s - single line (. matches newlines) • x - extended! • g - global (?x-i) #this is cool • can be written inline ( foo #my important value • | #don't forget the alternative (?imsx-imsx) bar • ) # result equals to (foo|bar) (?imsx-imsx:...)
  • 46. Groups/Replacing • (...) - matched group • $1 - $9 • alternatively 1 - 9 (not recommended)
  • 47. Groups/Replacing • (...) - matched group • $1 - $9 • alternatively 1 - 9 (not recommended) • nested groups ordered by left bracket
  • 48. Groups/Replacing • (...) - matched group • $1 - $9 • alternatively 1 - 9 (not recommended) • nested groups ordered by left bracket • (?:...) - non-captured group • useful for (?:foo)+ or (?:foo|bar)
  • 50. Example quot;foobarmanquot;.replace( /(?:f)((o)+)(bar)|(baz|man)/g, '$1, $2, $3, $4, $5') • foobar • 1 -- oo • 2 -- o • 3 -- bar • 4 --
  • 51. Example quot;foobarmanquot;.replace( /(?:f)((o)+)(bar)|(baz|man)/g, '$1, $2, $3, $4, $5') • foobar • man • • 1 -- oo 1 -- • • 2 -- o 2 -- • • 3 -- bar 3 -- • • 4 -- 4 -- man
  • 53. Look-ahead/behind • defines custom zero-width anchors positive negative ahead (?=...) (?!...) behind (?<=...) (?<!...)
  • 54. Example zdenek@gooddata.com /.*?@gooddata/ zdenek@gooddata.com /.*?(?=@gooddata)/
  • 55. Recursive RE • very important! • quote & bracket matching • technically not part of regular grammar • two styles • g<name> or g<n> - TextMate • (?R) - Perl
  • 56. Example (?x: ( # match the initial opening parenthesis # Now make a named group 'balanced' which # matches a balanced substring. (?<balanced> [^()] # A balanced substring is either something # that is not a parenthesis: | # …or a parenthesised string: ( # A parenthesised string begins with an opening parenthesis g<balanced>* # …followed by a sequence of balanced substrings ) # …and ends with a closing parenthesis )* # Look for a sequence of balanced substrings ) # Finally, the outer closing parenthesis )
  • 57. Example (?x: ( # match the initial opening parenthesis # Now make a named group 'balanced' which # matches a balanced substring. (?<balanced> [^()] # A balanced substring is either something # that is not a parenthesis: | # …or a parenthesised string: ( # A parenthesised string begins with an opening parenthesis g<balanced>* # …followed by a sequence of balanced substrings ) # …and ends with a closing parenthesis )* # Look for a sequence of balanced substrings ) # Finally, the outer closing parenthesis ) or: (([^()]|(?R))*)

Editor's Notes

  1. escaping???
  2. escaping???
  3. escaping???
  4. examples! possessive (?+, *+, ++)
  5. examples! possessive (?+, *+, ++)
  6. examples! possessive (?+, *+, ++)
  7. examples! possessive (?+, *+, ++)
  8. examples! possessive (?+, *+, ++)
  9. examples! possessive (?+, *+, ++)
  10. unicode compat table!
  11. unicode compat table!
  12. unicode compat table!
  13. unicode compat table!
  14. unicode compat table!
  15. unicode compat table!
  16. unicode compat table!
  17. notice the space at the end, capital reverses
  18. notice the space at the end, capital reverses
  19. notice the space at the end, capital reverses
  20. notice the space at the end, capital reverses
  21. notice the space at the end, capital reverses
  22. how about /g??
  23. how about /g??
  24. how about /g??