Regular Expressions
 The Black Magic of Programming
The Basics
The Fear Factor!

For unknown reasons regular expressions
are deeply shrouded in mystery
Many programmers outright fear them
  I stumped a room full of programmers
  in Tulsa by shouting out a two
  character expression
I have no idea why this is
What is a Regex?

Regular expressions are a very small
language for describing text
You can use them to dissect and change
textual data
I think of them as a DSL for find and
replace operations
Why Learn Regular Expressions?

 Ruby leans heavily on regular expressions:
   Many text operations in Ruby are
   easiest with the right regex
   Regular expressions are fast
   Regular expressions are encoding aware
 You can be the one scaring all the other
 programmers
Basic Regex Usage

String has methods supporting:

   Find/Find All

       if "100" =~ /\A\d+\z/
         puts "This is a number."
       end

       "Find all, words.".scan(/\w+/) do |word|
         puts word.downcase
       end

       year, month, day = "2008-09-04".scan(/\d+/)

   Replace/Replace All

       csv = "C, S, V".sub(/,\s+/, ",")
       cap = "one two".sub(/\w+/) { |n| n.capitalize }

       csv  = "C, S, V".gsub(/,\s+/, ",")
       caps = "one two".gsub(/\w+/) { |n| n.capitalize }

Use sub!()/gsub!() to modify a String in place
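
A minimal sketch of the difference between the copying and in-place versions. The sample strings are invented for illustration, and `dup` is used only so the literals are safe to modify.

```ruby
line = "C, S, V".dup            # dup so the String is safe to modify

copy      = line.gsub(/,\s+/, ",")   # returns a new String
untouched = line.dup                 # line still has its spaces here

line.gsub!(/,\s+/, ",")              # rewrites line itself

# The bang methods return nil when nothing was replaced.
result = "no commas here".dup.sub!(/,\s+/, ",")

p copy, untouched, line, result
```

Because the bang methods can return nil, it is usually safer not to chain further calls onto them.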
Literal Characters

Most characters in a regex match
themselves literally
  The only special characters are:
  [].^$?*+{}|()
  You can precede a special character
  with \ to make it literal
The regex /James Gray/ matches my name
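
As a small illustration of escaping, the sketch below compares an unescaped and an escaped period. The test strings are made up for the example.

```ruby
loose  = /3.14/    # . matches any character except a newline
strict = /3\.14/   # \. matches only a literal period

loose_hit   = "3x14" =~ loose    # matches at index 0
strict_miss = "3x14" =~ strict   # nil, there is no literal period
strict_hit  = "3.14" =~ strict   # matches at index 0

p loose_hit, strict_miss, strict_hit
```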
Character Classes

Characters in [ … ] are choices for a
single character match
A leading ^ negates the class, so [^ … ]
matches what is not listed
You can use ranges like a-z or 0-9
The expression /[bcr]at/ will match
“bat,” “cat,” or “rat”
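
A short sketch of the three ideas above (choices, negation, and ranges), using invented sample strings:

```ruby
animals    = "bat cat hat rat".scan(/[bcr]at/)        # "hat" is skipped, h is not listed
non_digits = "a1b2c3".scan(/[^0-9]/)                  # everything that is not 0-9
hex_pairs  = "ff 0a zz".scan(/[0-9a-f][0-9a-f]/)      # two ranges in one class

p animals, non_digits, hex_pairs
```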
Shortcut Character Classes

  Shortcut   Actual Character Class
     .       [^\n]
     \s      [ \t\n\r\f\v]
     \S      [^ \t\n\r\f\v]
     \w      [a-zA-Z0-9_]
     \W      [^a-zA-Z0-9_]
     \d      [0-9]
     \D      [^0-9]
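
To see a few of these shortcuts at work, here is a brief sketch. The `text` value is a made-up sample.

```ruby
text = "Room 101,\tFloor 3"

numbers = text.scan(/\d+/)   # runs of digits
words   = text.scan(/\w+/)   # runs of letters, digits, and underscores
chunks  = text.scan(/\S+/)   # runs of anything that is not whitespace

p numbers, words, chunks
```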
Anchors

Anchors match between characters

They are used to assert that the content you want must appear in a certain place

Thus /^Totals/ searches for a line starting with “Totals”

  Anchor   Matches
    \A     Start of the String
    \Z     End of the String or before trailing newline
    \z     End of the String
    ^      Start of a line
    $      End of a line
    \b     Between \w\W or \W\w, and at \A or \z beside a \w
    \B     Between \w\w or \W\W
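
The difference between the line anchors and the String anchors is easiest to see on a multi-line String. This is a sketch with an invented `report` value:

```ruby
report = "Subtotals\nTotals: 42\n"

line_start   = report =~ /^Totals/    # ^ matches at the start of the second line
string_start = report =~ /\ATotals/   # \A only matches at the very start
before_nl    = report =~ /42\Z/       # \Z tolerates the trailing newline
very_end     = report =~ /42\z/       # \z does not

whole_words = "cat concat cats".scan(/\bcat\b/)   # only the standalone word

p line_start, string_start, before_nl, very_end, whole_words
```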
Repetition

You can tack symbols onto an element of a regex to indicate that element can repeat

The expression /ab+c?/ matches an a, followed by one or more b’s, and optionally followed by a c

  Repeater   Allowed Count
     ?       Zero or one
     +       One or more
     *       Zero or more
    {n}      Exactly n
    {n,}     At least n
    {,m}     No more than m
    {n,m}    Between n and m
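
A quick sketch of the repeaters in use, with sample strings chosen for illustration:

```ruby
re = /ab+c?/

full = "abbbc"[re]   # a, several b's, and the optional c
no_c = "xabx"[re]    # the c is optional
no_b = "ac"[re]      # nil, at least one b is required

date_ok = "2008-09-04" =~ /\A\d{4}-\d{2}-\d{2}\z/   # exact counts with {n}
capped  = "aaaaa"[/a{2,3}/]                         # takes at most three a's

p full, no_c, no_b, date_ok, capped
```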
Some Examples

  if var =~ /\A\s*\z/
    puts "Variable is blank."
  end

  if var !~ /\S/
    puts "Variable is blank."
  end

From TopCoder.com, SRM 216 “CultureShock”:

Bob and Doug have recently moved from Canada to the United States, and they are confused
by this strange letter, "ZEE". They need your assistance. Given a String text, replace every
occurrence of the word, "ZEE", with the word, "ZED", and return the result.

Note that if "ZEE" is just part of a larger word (for example, "ZEES"), it should not be altered.

  solution = text.gsub(/\bZEE\b/, "ZED")
Greedy Versus Non-Greedy

By default repetition will always be greedy, consuming as many characters as possible

   The match will backtrack, giving up characters, if it helps it succeed

You can reverse this by appending ? to the repeater, matching as few characters as possible

  Greedy   Non-Greedy
    ?         ??
    +         +?
    *         *?
   {n}        N/A
   {n,}      {n,}?
   {,m}      {,m}?
   {n,m}     {n,m}?
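
The classic place this matters is matching between delimiters. A minimal sketch, with an invented `html` sample:

```ruby
html = "<b>one</b> and <b>two</b>"

greedy = html[/<b>.+<\/b>/]    # .+ runs to the last </b>
lazy   = html[/<b>.+?<\/b>/]   # .+? stops at the first </b>

p greedy, lazy
```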
Alternation

In a regex, | means “or”
You can put a full expression on the left
and another full expression on the right
Either can match
The expression /James|words?/ will
match “James,” “word,” or “words”
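
For instance, scanning an invented sentence with that expression picks out all three forms:

```ruby
sentence = "James likes words, not a word of Perl"

hits = sentence.scan(/James|words?/)

p hits
```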
Grouping

Everything in ( … ) is grouped into a
single element for the purposes of
repetition and alternation
The expression /(ha)+/ matches “ha,”
“haha,” “hahaha,” etc.
The expression /Greg(ory)?/ matches
“Greg” and “Gregory”
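
A small sketch of grouping for repetition and for alternation. The /gr(a|e)y/ pattern is an extra example, not from the slides.

```ruby
laugh = "hahaha!"[/(ha)+/]          # the whole group repeats
short = "Greg"[/Greg(ory)?/]        # the optional group is absent
long  = "Gregory"[/Greg(ory)?/]     # the optional group is present

# Parentheses also limit how far | reaches.
gray = "gray" =~ /gr(a|e)y/
grey = "grey" =~ /gr(a|e)y/

p laugh, short, long, gray, grey
```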
Captures

( … ) also capture what they match

After a match, you can access these captures in the variables $1, $2, etc., from left to right

   "$99.95" =~ /\$(\d+(\.\d+)?)/
   # $1 is "99.95" (the outer group), $2 is ".95" (the inner group)

Use \1, \2, etc. in String replacements
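
A brief sketch of both ways to reach captures. The date rearrangement is an invented example, and the replacement is single-quoted so the backslashes reach sub() intact.

```ruby
if "$99.95" =~ /\$(\d+(\.\d+)?)/
  amount = $1   # outer group
  cents  = $2   # inner group
end

# \1, \2, and \3 refer to the captures inside the replacement String.
us_date = "2008-09-04".sub(/(\d+)-(\d+)-(\d+)/, '\2/\3/\1')

p amount, cents, us_date
```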
Modes

Regular expressions have modes
  End an expression with /i to make the
  expression case insensitive
  End with /m for “multi-line” mode
  where . will also match newlines
  Use /x to add space and comments
  You can combine modes: /mi
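
A sketch of each mode in turn, with sample strings made up for the purpose. Note that under /x the whitespace and # comments inside the expression are ignored.

```ruby
case_hit   = "RUBY" =~ /ruby/i    # /i ignores case
dot_miss   = "a\nb" =~ /a.b/      # . will not cross a newline by default
dot_hit    = "a\nb" =~ /a.b/m     # /m lets . match the newline

date_re = /
  (\d{4}) -   # year
  (\d{2}) -   # month
  (\d{2})     # day
/x

date_hit = "2008-09-04" =~ date_re
year     = $1

p case_hit, dot_miss, dot_hit, date_hit, year
```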
More Examples

  if ip =~ /\A\d{1,3}(\.\d{1,3}){3}\z/
    puts "IP address is well formed."
  end

  if text =~ /\b(at|for|in)[.?!]/
    puts "You have bad grammar."
  end

  james_gray = "Gray, James".sub(/(\S+),\s*(.+)/, '\2 \1')
Other Tricks

There are other special variables for regexen including $`, $&, and $'

  "one_two_three" =~ /two/
  one_, two, _three = $`, $&, $'

You can escape content for use in a regex

  print "What's your favorite language? "
  lang = $stdin.gets.strip
  if "Perl Java" =~ /\b#{Regexp.escape(lang)}\b/i
    puts "You are weird."
  else
    puts "OK."
  end

There’s a MatchData object for matches

  CONFIG_RE  = /\A([^=\s]+)\s*=\s*(\S+)/
  config     = "url = http://ruby-lang.org"
  key, value = config.match(CONFIG_RE).captures

Many methods can take a regex

  fields = "1|2 | 3".split(/\s*\|\s*/)

  last_word_i = "one two three".rindex(/\b\w+/)

  five = "Count: 5"[/\d+/]
  five = "Count: 5"[/Count:\s*(\d+)/, 1]
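
The MatchData object carries the same information as the special variables. A minimal sketch follows; `config_re` is a lowercase copy of the expression above and the failing input is invented. It also guards against match() returning nil, which would make a bare call to captures fail.

```ruby
m = "one_two_three".match(/two/)
before, hit, after = m.pre_match, m[0], m.post_match   # like $`, $&, $'

config_re = /\A([^=\s]+)\s*=\s*(\S+)/

good = "url = http://ruby-lang.org".match(config_re)
bad  = "not a setting".match(config_re)    # nil, there is no = sign

key, value = good.captures if good
missing    = bad ? bad.captures : []       # avoid calling captures on nil

p before, hit, after, key, value, missing
```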
Quiz
Which of These Does Not Match?

    /wreck(ed)?\b/i

    “wreck”          “Wreck”
    “wrecked”        “wrecks”
    “shipwreck”      “That’s a wreck.”
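
One way to check an answer is to let Ruby try each candidate. This is a sketch; the `candidates` array simply lists the six strings above.

```ruby
candidates = ["wreck", "Wreck", "wrecked", "wrecks", "shipwreck", "That's a wreck."]

# Keep only the candidates the expression fails to match.
misses = candidates.reject { |s| s =~ /wreck(ed)?\b/i }

p misses
```

The \b requires a word boundary right after “wreck” or “wrecked”, so a trailing s blocks the match.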
What Does This Match?

    /\A([1-9]|[1-9]\d|[12]\d\d)\z/

A number between 1 and 299.
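
That claim can be confirmed by brute force. The sketch below tries every integer from 0 to 1000, a range picked arbitrarily for the check.

```ruby
re = /\A([1-9]|[1-9]\d|[12]\d\d)\z/

matched = (0..1000).select { |n| n.to_s =~ re }
padded  = "007" =~ re    # leading zeros are not allowed

p matched.first, matched.last, matched.size, padded
```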
What’s in $1?

    path = "/blogs/3/comments/7/rating.json"
    path =~ /(\w+)(\.\w+)?\Z/

“rating”
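
Running the match shows both captures. The variable names `name` and `extension` are added here for illustration.

```ruby
path = "/blogs/3/comments/7/rating.json"
path =~ /(\w+)(\.\w+)?\Z/

name, extension = $1, $2

p name, extension
```

Earlier segments such as “blogs” cannot match because \Z insists that the match end at the end of the String.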
Advanced Features
Regular Expression Extensions

Ruby’s regex engine adds several common extensions

   These usually look something like (? … )

The simplest is (?: … ) which is grouping without capturing

data = "put the ball   in the sack"
re     = %r{
  (put|set)            #   verb: $1
   \s+                 #   some space (/x safe)
  (?:(?:the|a)\s+)?    #   an article (optional)
  (\w+)                #   noun: $2
   \s+
  (?:in(?:side)?)?     # preposition (optional)
   \s+
  (?:(?:the|a)\s+)?
  (\w+)                # noun: $3
}x
p data =~ re
p [$1, $2, $3]
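To make the effect of (?: … ) concrete, here is a small side-by-side comparison. The two patterns are shortened variants written for this sketch, not the exact expression from the slide.

```ruby
data = "put the ball in the sack"

# Same shape twice: plain groups first, then non-capturing groups.
capturing     = /(put|set)\s+((the|a)\s+)?(\w+)/
non_capturing = /(put|set)\s+(?:(?:the|a)\s+)?(\w+)/

p data.match(capturing).captures       # ["put", "the ", "the", "ball"]
p data.match(non_capturing).captures   # ["put", "ball"]
```

Both patterns match the same text. The non-capturing version just keeps the article out of the numbered groups, so the noun stays in $2.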
Look-Around Assertions

You can use look-ahead assertions to peek ahead without consuming characters:

   (?= … ) and (?! … )

Ruby 1.9 adds a fixed look-behind:

   (?<= … ) and (?<! … )

class Numeric
  def commify
    to_s.reverse.
         gsub(/(\d\d\d)(?=\d)(?!\d*\.)/, '\1,').
         reverse
  end
end
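The commify example covers look-ahead, so here is a minimal sketch of the two look-behind forms. The sample string and variable names are assumptions made for this example.

```ruby
prices = "cost: $42, weight: 17kg, refund: $5"

# Positive look-behind: digits only when a dollar sign sits right before them.
dollars = prices.scan(/(?<=\$)\d+/)

# Negative look-behind: digit runs not preceded by a dollar sign or a digit.
others = prices.scan(/(?<![\$\d])\d+/)

p dollars   # ["42", "5"]
p others    # ["17"]
```

The dollar sign is checked but never becomes part of the match, which is why scan returns bare numbers. The \d inside the negative look-behind stops the pattern from starting in the middle of “42”.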
Oniguruma

Ruby 1.9’s regex engine is faster and more powerful:

   Named groups

   Nested matching

   Improved encodings

   And more…

config = "mode = wrap"
if /\A(?<key>\w+)\s*=\s*(?<value>\w+)/ =~ config
  puts "Key is #{key} and value is #{value}"
end

CHECK = /\A(?<paren>\((\g<paren>|[^()])*?\))\z/
%w[ ()
    (()())
    (a(b(c,d())))
    ()) ].each do |test|
  unless test =~ CHECK
    puts "#{test} isn't balanced"
  end
end
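Named groups can also be read from a MatchData object. The sketch below reuses the config pattern from above; storing the regex in a variable first is a choice made for this example.

```ruby
config    = "mode = wrap"
config_re = /\A(?<key>\w+)\s*=\s*(?<value>\w+)/

match = config.match(config_re)

p match[:key]      # "mode"
p match["value"]   # "wrap"
p match.names      # ["key", "value"]
```

This form is handy because Ruby only creates the key and value local variables when a regex literal sits on the left side of =~. With a regex held in a variable or constant, the names are still available through MatchData.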
A Case Study
The Data

data = <<END_FIELDS.gsub(/\s+/, " ")
Business Name (Text Field),
Allows Pets (Check),
Open To (Dropdown: Men, Women, Children, Any),
Atmosphere (Check List: Calm, Romantic, New Age)
END_FIELDS
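The gsub call on the heredoc squeezes every run of whitespace, newlines included, into a single space. A minimal sketch of that step, using a shortened two-field string in place of the heredoc:

```ruby
# Shortened stand-in for the heredoc contents, with explicit newlines.
raw = "Business Name (Text Field),\nAllows Pets   (Check)\n"

flat = raw.gsub(/\s+/, " ")

p flat   # "Business Name (Text Field), Allows Pets (Check) "
```

The result is one long line, so the later expressions never have to deal with line breaks. The final newline becomes a trailing space.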
A First Pass

data = <<END_FIELDS.gsub(/\s+/, " ")
Business Name (Text Field),
Allows Pets (Check),
Open To (Dropdown: Men, Women, Children, Any),
Atmosphere (Check List: Calm, Romantic, New Age)
END_FIELDS
data.scan(/([^,\s(][^,(]*?)\s*\(([^)]+)\)/) do |name, kind|
  p [name, kind]
end

__END__
["Business Name", "Text Field"]
["Allows Pets", "Check"]
["Open To", "Dropdown: Men, Women, Children, Any"]
["Atmosphere", "Check List: Calm, Romantic, New Age"]
Parsed

data = <<END_FIELDS.gsub(/\s+/, " ")
Business Name (Text Field),
Allows Pets (Check),
Open To (Dropdown: Men, Women, Children, Any),
Atmosphere (Check List: Calm, Romantic, New Age)
END_FIELDS
data.scan(/([^,\s(][^,(]*?)\s*\(([^):]+)(?::\s*([^)]+?)\s*)?\)/) do |name, kind, fields|
  p [name, kind, fields]
end

__END__
["Business Name", "Text Field", nil]
["Allows Pets", "Check", nil]
["Open To", "Dropdown", "Men, Women, Children, Any"]
["Atmosphere", "Check List", "Calm, Romantic, New Age"]
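A possible next step, not shown in the slides, is to turn each match into a small record and split the option list into an array. The hash keys and variable names below are assumptions made for this sketch.

```ruby
# The same field list as above, already flattened to one line.
data = "Business Name (Text Field), Allows Pets (Check), " \
       "Open To (Dropdown: Men, Women, Children, Any), " \
       "Atmosphere (Check List: Calm, Romantic, New Age) "

field_re = /([^,\s(][^,(]*?)\s*\(([^):]+)(?::\s*([^)]+?)\s*)?\)/

# Build one hash per field; a missing option list becomes an empty array.
fields = data.scan(field_re).map do |name, kind, options|
  { name: name, kind: kind, options: options ? options.split(/\s*,\s*/) : [] }
end

fields.each { |field| p field }
```

Splitting on /\s*,\s*/ reuses the idea from the earlier split example, so stray spaces around the commas do not end up in the option names.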

Regular expressions

  • 1. Regular Expressions The Black Magic of Programming
  • 7. The Fear Factor! For unknown reasons, regular expressions are deeply shrouded in mystery. Many programmers outright fear them. I stumped a room full of programmers in Tulsa by shouting out a two-character expression. I have no idea why this is.
  • 11. What is a Regex? Regular expressions are a very small language for describing text. You can use them to dissect and change textual data. I think of them as a DSL for find and replace operations.
  • 17. Why Learn Regular Expressions? Ruby leans heavily on regular expressions: many text operations in Ruby are easiest with the right regex, regular expressions are fast, and regular expressions are encoding aware. You can be the one scaring all the other programmers.
  • 26. Basic Regex Usage: String has methods supporting Find/Find All and Replace/Replace All. Use sub!()/gsub!() to modify a String in place.
      Find:
        if "100" =~ /\A\d+\z/
          puts "This is a number."
        end
      Find All:
        "Find all, words.".scan(/\w+/) do |word|
          puts word.downcase
        end
        year, month, day = "2008-09-04".scan(/\d+/)
      Replace:
        csv = "C, S, V".sub(/,\s+/, ",")
        cap = "one two".sub(/\w+/) { |n| n.capitalize }
      Replace All:
        csv  = "C, S, V".gsub(/,\s+/, ",")
        caps = "one two".gsub(/\w+/) { |n| n.capitalize }
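The difference between sub(), gsub(), and the in-place gsub!() is easy to miss on a slide, so here is a minimal sketch. The helper names `squeeze_commas` and `capitalize_words!` are made up for illustration and are not part of the talk.

```ruby
# Hypothetical helper: returns a new String, leaving the argument untouched.
def squeeze_commas(text)
  text.gsub(/,\s+/, ",")
end

# Hypothetical helper: capitalizes every word in place with gsub!().
# gsub!() returns nil when the pattern never matched, so the result
# is turned into true/false here.
def capitalize_words!(text)
  result = text.gsub!(/\w+/) { |word| word.capitalize }
  !result.nil?
end

line = "C, S, V"
puts line.sub(/,\s+/, ",")   # sub: only the first match is replaced
puts squeeze_commas(line)    # gsub: every match is replaced
puts line                    # the original String is unchanged

title = "one two".dup        # dup so the literal itself is never mutated
capitalize_words!(title)
puts title                   # title itself was modified
```

Because the bang methods return nil when nothing matched, chaining further calls onto sub!() or gsub!() is usually a mistake.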
  • 31. Literal Characters: Most characters in a regex match themselves literally. The only special characters are: []\.^$?*+{}|() You can precede a special character with \ to make it literal. The regex /James Gray/ matches my name.
  • 36. Character Classes: Characters in [ … ] are choices for a single character match. A leading ^ negates the class, so [^ … ] matches what is not listed. You can use ranges like a-z or 0-9. The expression /[bcr]at/ will match “bat,” “cat,” or “rat.”
  • 45. Shortcut Character Classes:
      Shortcut   Actual Character Class
      .          [^\n]
      \s         [ \t\n\r\f\v]
      \S         [^ \t\n\r\f\v]
      \w         [a-zA-Z0-9_]
      \W         [^a-zA-Z0-9_]
      \d         [0-9]
      \D         [^0-9]
  • 57. Anchors: Anchors match between characters. They are used to assert that the content you want must appear in a certain place. Thus /^Totals/ searches for a line starting with “Totals.”
      Anchor   Matches
      \A       Start of the String
      \Z       End of the String or before trailing newline
      \z       End of the String
      ^        Start of a line
      $        End of a line
      \b       Between \w\W or \W\w, and at \A and \z
      \B       Between \w\w or \W\W
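The line anchors and the String anchors are easy to confuse, so here is a small sketch of the difference. The method names `digits_on_some_line?` and `digits_only?` are hypothetical, chosen only for this illustration.

```ruby
# Hypothetical validator using line anchors: ^ and $ match at the
# start and end of any line inside the String.
def digits_on_some_line?(text)
  !!(text =~ /^\d+$/)
end

# Hypothetical validator using String anchors: \A and \z match only
# at the very start and very end of the String.
def digits_only?(text)
  !!(text =~ /\A\d+\z/)
end

input = "42\nrm -rf /"
p digits_on_some_line?(input)   # true: the first line is all digits
p digits_only?(input)           # false: the String holds more than digits
p digits_only?("42")            # true

p("42\n" =~ /\A\d+\Z/)          # 0: \Z tolerates one trailing newline
p("42\n" =~ /\A\d+\z/)          # nil: \z does not
```

When checking a whole value, such as user input, \A and \z are usually the safer pair.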
  • 68. Repetition: You can tack symbols onto an element of a regex to indicate that element can repeat. The expression /ab+c?/ matches an a, followed by one or more b’s, and optionally followed by a c.
      Repeater   Allowed Count
      ?          Zero or one
      +          One or more
      *          Zero or more
      {n}        Exactly n
      {n,}       At least n
      {,m}       No more than m
      {n,m}      Between n and m
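As a quick check of the repeaters above, this sketch runs /ab+c?/ and a counted repeater against a few inputs. The helpers `first_abc` and `short_code?` are invented for the example.

```ruby
# Hypothetical helper: returns the text matched by /ab+c?/, or nil.
def first_abc(text)
  text[/ab+c?/]
end

p first_abc("abbbc")   # "abbbc"
p first_abc("ab")      # "ab"
p first_abc("ac")      # nil, at least one b is required
p first_abc("xabcc")   # "abc", c? takes at most one c

# Hypothetical helper: counted repetition, accepting 3 to 5 digits.
def short_code?(text)
  text.match?(/\A\d{3,5}\z/)
end

p short_code?("1234")     # true
p short_code?("12")       # false
p short_code?("123456")   # false
```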
  • 73. Some Examples:
      if var =~ /\A\s*\z/
        puts "Variable is blank."
      end

      if var !~ /\S/
        puts "Variable is blank."
      end

    From TopCoder.com, SRM 216 “CultureShock”: Bob and Doug have recently moved from Canada to the United States, and they are confused by this strange letter, "ZEE". They need your assistance. Given a String text, replace every occurrence of the word, "ZEE", with the word, "ZED", and return the result. Note that if "ZEE" is just part of a larger word (for example, "ZEES"), it should not be altered.

      solution = text.gsub(/\bZEE\b/, "ZED")
  • 85. Greedy Versus Non-Greedy: By default repetition will always be greedy, consuming as many characters as possible. The match will backtrack, giving up characters, if it helps it succeed. You can negate this, matching minimal characters.
      Greedy   Non-Greedy
      ?        ??
      +        +?
      *        *?
      {n}      N/A
      {n,}     {n,}?
      {,m}     {,m}?
      {n,m}    {n,m}?
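A short sketch of greedy versus non-greedy matching, using a made-up HTML fragment. The helper names `greedy_tag` and `lazy_tag` are assumptions for illustration only.

```ruby
# Hypothetical helper: .+ grabs as much as it can, then backtracks
# only as far as the last > in the text.
def greedy_tag(text)
  text[/<.+>/]
end

# Hypothetical helper: .+? stops at the first > that lets the match succeed.
def lazy_tag(text)
  text[/<.+?>/]
end

html = "<b>bold</b> and <i>italic</i>"
p greedy_tag(html)       # "<b>bold</b> and <i>italic</i>"
p lazy_tag(html)         # "<b>"
p html.scan(/<.+?>/)     # ["<b>", "</b>", "<i>", "</i>"]
```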
  • 90. Alternation: In a regex, | means “or.” You can put a full expression on the left and another full expression on the right. Either can match. The expression /James|words?/ will match “James,” “word,” or “words.”
  • 94. Grouping: Everything in ( … ) is grouped into a single element for the purposes of repetition and alternation. The expression /(ha)+/ matches “ha,” “haha,” “hahaha,” etc. The expression /Greg(ory)?/ matches “Greg” and “Gregory.”
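One consequence of the two slides above is that | splits the whole expression, anchors included, unless a group limits its reach. The following sketch shows that under made-up method names (`loose_match?`, `strict_match?`).

```ruby
# Hypothetical check without grouping: this reads as
# "\AJames" OR "words?\z", so each anchor applies to one side only.
def loose_match?(text)
  text.match?(/\AJames|words?\z/)
end

# Hypothetical check with grouping: both anchors apply to either choice.
def strict_match?(text)
  text.match?(/\A(James|words?)\z/)
end

p loose_match?("James Gray")    # true, "\AJames" matches at the start
p loose_match?("passwords")     # true, "words?\z" matches at the end
p strict_match?("James Gray")   # false
p strict_match?("passwords")    # false
p strict_match?("words")        # true

# Grouping also changes what a repeater applies to.
p "hahaha!"[/(ha)+/]            # "hahaha", the whole group repeats
p "haaa!"[/ha+/]                # "haaa", only the a repeats
```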
  • 101. Captures: ( … ) also capture what they match. After a match, you can access these captures in the variables $1, $2, etc., from left to right. Use \1, \2, etc. in String replacements.
      "$99.95" =~ /\$(\d+(\.\d+)?)/
      # $1 is "99.95" (the outer group); $2 is ".95" (the inner group)
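To make the capture numbering concrete, here is a minimal sketch built on the price regex from the slide. The helpers `price_parts` and `first_then_last` are hypothetical names.

```ruby
# Hypothetical helper: returns [$1, $2] for a price, or nil without a match.
# Groups are numbered by their opening parenthesis, left to right.
def price_parts(text)
  if text =~ /\$(\d+(\.\d+)?)/
    [$1, $2]
  end
end

p price_parts("$99.95")   # ["99.95", ".95"]
p price_parts("$5")       # ["5", nil], the optional group did not take part
p price_parts("free")     # nil

# Hypothetical helper: \1 and \2 refer to the captures inside a replacement.
# Single quotes keep Ruby from interpreting the backslashes first.
def first_then_last(name)
  name.sub(/(\S+),\s*(.+)/, '\2 \1')
end

p first_then_last("Gray, James")   # "James Gray"
```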
  • 107. Modes: Regular expressions have modes. End an expression with /i to make the expression case insensitive. End with /m for “multi-line” mode where . will also match newlines. Use /x to add space and comments. You can combine modes: /mi
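The three modes can be seen side by side in a short sketch. The constant `DATE_RE` and the sample strings are assumptions made for the example.

```ruby
# /i: case-insensitive matching
p("RUBY" =~ /ruby/)     # nil
p("RUBY" =~ /ruby/i)    # 0

# /m: . also matches newlines
two_lines = "first\nsecond"
p two_lines[/first.second/]     # nil
p two_lines[/first.second/m]    # "first\nsecond"

# /x: whitespace and comments inside the pattern are ignored
# (DATE_RE is a hypothetical constant)
DATE_RE = /
  \A
  (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
  \z
/x

p "2008-09-04".match(DATE_RE).captures   # ["2008", "09", "04"]
```

Under /x a literal space in the pattern has to be written as \s, [ ], or an escaped space.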
  • 111. More Examples:
      if ip =~ /\A\d{1,3}(\.\d{1,3}){3}\z/
        puts "IP address is well formed."
      end

      if text =~ /\b(at|for|in)[.?!]/
        puts "You have bad grammar."
      end

      james_gray = "Gray, James".sub(/(\S+),\s*(.+)/, '\2 \1')
  • 120. Other Tricks: There are other special variables for regexen including $`, $&, and $'. You can escape content for use in a regex. There’s a MatchData object for matches. Many methods can take a regex.
      "one_two_three" =~ /two/
      one_, two, _three = $`, $&, $'

      print "What's your favorite language?  "
      lang = $stdin.gets.strip
      if "Perl Java" =~ /\b#{Regexp.escape(lang)}\b/i
        puts "You are weird."
      else
        puts "OK."
      end

      CONFIG_RE  = /\A([^=\s]+)\s*=\s*(\S+)/
      config     = "url = http://ruby-lang.org"
      key, value = config.match(CONFIG_RE).captures

      fields      = "1|2 | 3".split(/\s*\|\s*/)
      last_word_i = "one two three".rindex(/\b\w+/)
      five        = "Count:  5"[/\d+/]
      five        = "Count:  5"[/Count:\s*(\d+)/, 1]
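The special variables have MatchData equivalents that read a little more clearly, and Regexp.escape is worth seeing with input that contains regex syntax. This is a sketch only; `split_around` and `mentions?` are hypothetical helpers.

```ruby
# Hypothetical helper: the MatchData counterparts of $`, $&, and $'.
def split_around(text, regex)
  match = regex.match(text)
  match && [match.pre_match, match[0], match.post_match]
end

p split_around("one_two_three", /two/)   # ["one_", "two", "_three"]
p split_around("one_two_three", /four/)  # nil

# Regexp.escape backslash-escapes anything the regex engine treats as special.
p Regexp.escape("C++")                   # "C\\+\\+"

# Hypothetical helper: a case-insensitive whole-word search for arbitrary input.
def mentions?(text, word)
  text.match?(/\b#{Regexp.escape(word)}\b/i)
end

p mentions?("Perl Java", "java")   # true
p mentions?("Perl Java", ".*")     # false, the dot and star are taken literally
```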
  • 121. Quiz
  • 124. Which of These Does Not Match? /wreck(ed)?\b/i against “wreck,” “Wreck,” “wrecked,” “wrecks,” “shipwreck,” and “That’s a wreck.” Only “wrecks” fails: there is no word boundary between the k and the s.
  • 127. What Does This Match? /\A([1-9]|[1-9]\d|[12]\d\d)\z/ A number between 1 and 299.
  • 130. What’s in $1?
      path = "/blogs/3/comments/7/rating.json"
      path =~ /(\w+)(\.\w+)?\Z/
    “rating”
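The quiz answers can be confirmed by running the three expressions. This sketch assumes nothing beyond the regexes on the slides; the constant and method names (`WRECK_RE`, `RANGE_RE`, `non_matching`, `matching_numbers`) are invented here.

```ruby
WRECK_RE = /wreck(ed)?\b/i
RANGE_RE = /\A([1-9]|[1-9]\d|[12]\d\d)\z/

# Hypothetical helper: the candidates the regex fails to match.
def non_matching(regex, candidates)
  candidates.reject { |text| text =~ regex }
end

# Hypothetical helper: the integers in a range whose decimal form matches.
def matching_numbers(regex, range)
  range.select { |number| number.to_s =~ regex }
end

words = ["wreck", "Wreck", "wrecked", "wrecks", "shipwreck", "That's a wreck."]
p non_matching(WRECK_RE, words)   # ["wrecks"]

numbers = matching_numbers(RANGE_RE, 0..400)
p [numbers.first, numbers.last, numbers.size]   # [1, 299, 299]

"/blogs/3/comments/7/rating.json" =~ /(\w+)(\.\w+)?\Z/
p $1   # "rating"
p $2   # ".json"
```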
  • 136. Regular Expression Extensions: Ruby’s regex engine adds several common extensions. These usually look something like (? … ). The simplest is (?: … ), which is grouping without capturing.
      data = "put the ball in the sack"
      re   = %r{
        (put|set)           # verb:  $1
        \s+                 # some space (/x safe)
        (?:(?:the|a)\s+)?   # an article (optional)
        (\w+)               # noun:  $2
        \s+
        (?:in(?:side)?)?    # preposition (optional)
        \s+
        (?:(?:the|a)\s+)?
        (\w+)               # noun:  $3
      }x
      p data =~ re
      p [$1, $2, $3]
  • 142. Look-Around Assertions: You can use look-ahead assertions to peek ahead without consuming characters: (?= … ) and (?! … ). Ruby 1.9 adds a fixed look-behind: (?<= … ) and (?<! … ).
      class Numeric
        def commify
          to_s.reverse.
               gsub(/(\d\d\d)(?=\d)(?!\d*\.)/, '\1,').
               reverse
        end
      end
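Here is a sketch of how the commify regex behaves, written as a plain method rather than reopening Numeric (that change is mine, not the slide's), followed by a small look-behind example. `dollar_amounts` is a hypothetical helper.

```ruby
# Same regex as the slide, wrapped in a standalone method.
# The digits are reversed so groups of three can be found from the right:
#   (?=\d)     only add a comma when another digit follows
#   (?!\d*\.)  skip digits that still have a "." ahead of them in the
#              reversed text, which are the fractional digits
def commify(number)
  number.to_s.reverse.
         gsub(/(\d\d\d)(?=\d)(?!\d*\.)/, '\1,').
         reverse
end

p commify(1234567)     # "1,234,567"
p commify(1234.5678)   # "1,234.5678"
p commify(999)         # "999"

# Look-behind: collect digits only when a dollar sign comes right before them.
def dollar_amounts(text)
  text.scan(/(?<=\$)\d+/)
end

p dollar_amounts("$5 or 10 or $20")   # ["5", "20"]
```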
  • 150. Oniguruma: Ruby 1.9’s regex engine is faster and more powerful: named groups, nested matching, improved encodings, and more…
      config = "mode = wrap"
      if /\A(?<key>\w+)\s*=\s*(?<value>\w+)/ =~ config
        puts "Key is #{key} and value is #{value}"
      end

      CHECK = /\A(?<paren>\((\g<paren>|[^()])*?\))\z/
      %w[ () (()()) (a(b(c,d()))) ()) ].each do |test|
        unless test =~ CHECK
          puts "#{test} isn't balanced"
        end
      end
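Named groups are also reachable through MatchData, which helps when the regex lives in a constant. Ruby only creates the local variables when a regex literal sits on the left of =~. A minimal sketch, with `SETTING_RE`, `BALANCED_RE`, `parse_setting`, and `balanced?` as made-up names:

```ruby
SETTING_RE  = /\A(?<key>\w+)\s*=\s*(?<value>\w+)/
BALANCED_RE = /\A(?<paren>\((\g<paren>|[^()])*?\))\z/   # same pattern as CHECK

# Hypothetical helper: reads the named groups from the MatchData object.
def parse_setting(line)
  match = SETTING_RE.match(line)
  match && { key: match[:key], value: match[:value] }
end

# Hypothetical helper: \g<paren> re-enters the named group, so the
# pattern can follow parentheses nested to any depth.
def balanced?(text)
  text.match?(BALANCED_RE)
end

p parse_setting("mode = wrap")   # a Hash holding "mode" and "wrap"
p parse_setting("???")           # nil

p balanced?("(a(b(c,d())))")     # true
p balanced?("())")               # false
```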
  • 153. The Data:
      data = <<END_FIELDS.gsub(/\s+/, " ")
      Business Name (Text Field), Allows Pets (Check),
      Open To (Dropdown: Men, Women, Children, Any),
      Atmosphere (Check List: Calm, Romantic, New Age)
      END_FIELDS
  • 157. A First Pass:
      data = <<END_FIELDS.gsub(/\s+/, " ")
      Business Name (Text Field), Allows Pets (Check),
      Open To (Dropdown: Men, Women, Children, Any),
      Atmosphere (Check List: Calm, Romantic, New Age)
      END_FIELDS

      data.scan(/([^,\s(][^,(]*?)\s*\(([^)]+)\)/) do |name, kind|
        p [name, kind]
      end

      __END__
      ["Business Name", "Text Field"]
      ["Allows Pets", "Check"]
      ["Open To", "Dropdown: Men, Women, Children, Any"]
      ["Atmosphere", "Check List: Calm, Romantic, New Age"]
  • 161. Parsed:
      data = <<END_FIELDS.gsub(/\s+/, " ")
      Business Name (Text Field), Allows Pets (Check),
      Open To (Dropdown: Men, Women, Children, Any),
      Atmosphere (Check List: Calm, Romantic, New Age)
      END_FIELDS

      data.scan(/([^,\s(][^,(]*?)\s*\(([^):]+)(?::\s*([^)]+?)\s*)?\)/) do |name, kind, fields|
        p [name, kind, fields]
      end

      __END__
      ["Business Name", "Text Field", nil]
      ["Allows Pets", "Check", nil]
      ["Open To", "Dropdown", "Men, Women, Children, Any"]
      ["Atmosphere", "Check List", "Calm, Romantic, New Age"]
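One possible next step, not shown in the slides, is to turn the scan results into structured records and split the option list on commas. The sketch below assumes that goal. `FIELD_RE` is the regex from the Parsed slide, while `parse_fields` and the Hash layout are my own invention.

```ruby
# The regex from the Parsed slide, stored in a hypothetical constant.
FIELD_RE = /([^,\s(][^,(]*?)\s*\(([^):]+)(?::\s*([^)]+?)\s*)?\)/

# Hypothetical helper: one Hash per field, with the options as an Array.
def parse_fields(text)
  text.scan(FIELD_RE).map do |name, kind, options|
    {
      name:    name,
      kind:    kind,
      # the third capture is nil when the field has no ": ..." part
      options: options ? options.split(/\s*,\s*/) : []
    }
  end
end

data = "Business Name (Text Field), Allows Pets (Check), " \
       "Open To (Dropdown: Men, Women, Children, Any)"

parse_fields(data).each do |field|
  puts "#{field[:name]} [#{field[:kind]}] #{field[:options].join(' / ')}"
end
```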