BEYOND THE BASICS
                           Regular Expressions in Ruby

                                 @nellshamrell




Thursday, February 7, 13
^4[0-9]{12}(?:[0-9]{3})?$



                           Source: regular-expressions.info
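
A minimal sketch of trying the pattern above in Ruby. The constant name VISA and the sample numbers are made up for illustration; they are not real card numbers and do not come from the talk.

```ruby
# The Visa pattern from the slide: a leading 4, twelve more digits,
# then an optional non-capturing group of three more digits.
VISA = /^4[0-9]{12}(?:[0-9]{3})?$/

# Made-up sample numbers, built by repetition so the lengths are obvious.
samples = [
  "4" + "1" * 15,   # 16 digits, starts with 4
  "4" + "2" * 12,   # 13 digits, starts with 4
  "5" + "1" * 15,   # 16 digits, wrong first digit
  "41111"           # too short
]

samples.each do |number|
  verdict = (VISA =~ number) ? "matches" : "does not match"
  puts "#{number} #{verdict}"
end
```

In Ruby, ^ and $ anchor at line boundaries, so \A and \z are the stricter choice when the whole string has to match.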

Regular Expressions are Patterns




Test

                       Extract

                       Change


In Ruby, regular expressions are objects

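
One way to see that a regular expression is an ordinary object is to build one two ways and call methods on it. This is only a sketch; the variable names are illustrative.

```ruby
# A literal and Regexp.new both produce a Regexp object.
literal = /force/i
built   = Regexp.new("force", Regexp::IGNORECASE)

puts literal.class                      # Regexp
puts literal.source                     # force
puts literal.source == built.source     # true
puts literal.options == built.options   # true

# Because it is an object, a regexp can be stored in a collection
# and passed to other methods like any other value.
patterns = [/force/, /dark side/]
matching = patterns.select { |re| re =~ "Use the force" }
p matching                              # [/force/]
```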
You program from a regular expression to a result

Source: The Well-Grounded Rubyist

Oniguruma




Shorthand for Hexadecimals

\h and \H

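
A short sketch of the hexadecimal shorthand, assuming a Ruby version whose regex engine supports \h (1.9 or later). The sample strings are made up.

```ruby
# \h matches one hexadecimal digit (0-9, a-f, A-F); \H matches anything else.
color = "#1a2B3c"
puts((color =~ /\A#\h{6}\z/) ? "hex color" : "not a hex color")   # hex color

p "cafe babe".scan(/\h+/)   # ["cafe", "babe"]
p "cafe babe".scan(/\H/)    # [" "]
```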
Onigmo




=~

/force/ =~ "Use the force"

"Use the force" =~ /force/
=> 8

/dark side/ !~ "Use the force"
=> true

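
The operators above can be tried out as follows. This is a small sketch; the variable name text is illustrative.

```ruby
text = "Use the force"

p(text =~ /force/)       # 8, the index where the match starts
p(text =~ /dark side/)   # nil, which is falsy
p(text !~ /dark side/)   # true
p(/force/ =~ text)       # 8, operand order does not change the result

# =~ also sets the special variable $~ to the MatchData of the last match.
text =~ /force/
puts $~[0]               # force

# Because the result is an index or nil, =~ reads naturally in a condition.
puts "found" if text =~ /force/
```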
MatchData




.match

string = "The force will be with you always."

m = /force/.match(string)
=> #<MatchData "force">

/force/.match(string, 5)
=> nil

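
A sketch of how the optional second argument to match behaves, and of guarding against a nil result. It reuses the string from the slides; everything else is illustrative.

```ruby
string = "The force will be with you always."

m = /force/.match(string)
puts m.class       # MatchData
puts m.begin(0)    # 4, the index where "force" starts

# The optional second argument is the index at which the search begins.
p(/force/.match(string, 4))   # #<MatchData "force">
p(/force/.match(string, 5))   # nil, the search starts past the "f"

# match returns nil on failure, so guard before calling MatchData methods.
if (m = /dark side/.match(string))
  puts m[0]
else
  puts "no match"
end
```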
What can you do with MatchData?




m.to_s
=> "force"

m.pre_match
=> "The "

m.post_match
=> " will be with you always."

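
The three methods above split the original string into the part before the match, the match itself, and the part after it. A minimal sketch using the string from the slides:

```ruby
string = "The force will be with you always."
m = /force/.match(string)

p m.to_s         # "force"
p m.pre_match    # "The "
p m.post_match   # " will be with you always."

# The three pieces reassemble the original string.
puts((m.pre_match + m.to_s + m.post_match) == string)   # true
```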
Capture Groups




/(.*)force(.*)/

m = /(.*)force(.*)/.match(string)

m.captures
=> ["The ", " will be with you always."]

You access capture groups with []




m[1]
=> "The "

m[2]
=> " will be with you always."

m[0]
=> "The force will be with you always."

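
Index 0 is the whole match and the capture groups start at 1. A small sketch with the string from the slides; the names before and after are illustrative.

```ruby
string = "The force will be with you always."
m = /(.*)force(.*)/.match(string)

p m.captures   # ["The ", " will be with you always."]
p m[0]         # "The force will be with you always."
p m[1]         # "The "
p m[2]         # " will be with you always."
p m[3]         # nil, there is no third group

# captures returns a plain Array, so it can be destructured.
before, after = m.captures
puts before.strip   # The
puts after.strip    # will be with you always.
```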
MatchData objects are not arrays

m.each do |match|
  puts match.upcase
end
=> NoMethodError

m.to_a.each do |match|
  puts match.upcase
end
prints:
THE FORCE WILL BE WITH YOU ALWAYS.
THE
 WILL BE WITH YOU ALWAYS.

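
A sketch of the two usual ways to get an Array out of a MatchData object: to_a includes the whole match at index 0, while captures holds only the groups.

```ruby
string = "The force will be with you always."
m = /(.*)force(.*)/.match(string)

# MatchData does not define each, so it cannot be iterated directly.
puts m.respond_to?(:each)   # false

# to_a gives the whole match followed by each capture group.
m.to_a.each { |text| puts text.upcase }

# captures skips the whole match and returns only the groups.
p m.captures.map(&:strip)   # ["The", "will be with you always."]
```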
Lookarounds Define Context

string = "Who's the more foolish? The fool or the fool who follows him?"

/fool/

string.scan(/fool/)
=> ["fool", "fool", "fool"]

Positive Lookahead




?=

string.scan(/fool(?=ish)/)
=> ["fool"]

string.gsub(/fool(?=ish)/, "self")
=> "Who's the more selfish? The fool or the fool who follows him?"

Zero-Width Positive Lookahead Assertion

Zero-width means it does not consume characters

Positive means a match for the lookahead must be present

Lookahead means it looks ahead of your main match

Assertion means the lookahead only determines whether a match exists

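
A sketch of what "zero-width" means in practice: the lookahead text is checked but stays outside the match, so a replacement leaves it in place. It uses the string from the slides.

```ruby
string = "Who's the more foolish? The fool or the fool who follows him?"

m = /fool(?=ish)/.match(string)
puts m[0]                 # fool (the "ish" is checked but not matched)
puts m.post_match[0, 3]   # ish  (still unconsumed, right after the match)

# Without the lookahead, "ish" is part of the match and gets replaced too.
puts string.sub(/foolish/, "self")
puts string.sub(/fool(?=ish)/, "self")
```

The first sub prints "Who's the more self? ..." and the second prints "Who's the more selfish? ...".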
Negative Lookahead




Negative means a match for the lookahead must not be present

?!

string.scan(/fool(?!ish)/)
=> ["fool", "fool"]

string.gsub(/fool(?!ish)/, "self")
=> "Who's the more foolish? The self or the self who follows him?"

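
To confirm which occurrences a negative lookahead selects, one option is to record where each match starts. A sketch using the string from the slides; the positions array is illustrative.

```ruby
string = "Who's the more foolish? The fool or the fool who follows him?"

# Inside a scan block, Regexp.last_match is the MatchData for the current match.
positions = []
string.scan(/fool(?!ish)/) { positions << Regexp.last_match.begin(0) }

p positions   # [28, 40]

# Each position is a "fool" that is not followed by "ish".
positions.each { |i| p string[i, 7] }
```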
Positive Lookbehind




string = "For my ally is the force, and a powerful ally it is"

/ally/

?<=

/(?<=powerful )ally/

string.gsub(/(?<=powerful )ally/, "friend")
=> "For my ally is the force, and a powerful friend it is"

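
A lookbehind is zero-width as well: the text it checks stays in pre_match rather than in the match. A minimal sketch with the string from the slides:

```ruby
string = "For my ally is the force, and a powerful ally it is"

m = /(?<=powerful )ally/.match(string)
puts m[0]        # ally
p m.pre_match    # "For my ally is the force, and a powerful "

# Only the "ally" preceded by "powerful " is replaced.
puts string.gsub(/(?<=powerful )ally/, "friend")
```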
Negative Lookbehind




?<!

/(?<!powerful )ally/

string.gsub(/(?<!powerful )ally/, "friend")
=> "For my friend is the force, and a powerful ally it is"

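
Comparing the two lookbehinds side by side shows that they pick out different occurrences of the same word. A sketch using the string from the slides:

```ruby
string = "For my ally is the force, and a powerful ally it is"

# =~ returns the index of the first match for each pattern.
p(string =~ /(?<!powerful )ally/)   # 7, the first "ally"
p(string =~ /(?<=powerful )ally/)   # 41, the second "ally"

puts string.gsub(/(?<!powerful )ally/, "friend")
```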
Regular expressions have distinct behaviors

Greedy

Lazy

Possessive

Quantifiers




+

/.+/

Quantifiers are greedy by default

Greedy quantifiers match as much as possible

Greedy quantifiers use maximum effort for maximum return

string = "This is no time to talk about time we don't have the time"

/.+time/

/.+time/.match(string)
=> #<MatchData "This is no time to talk about time we don't have the time">

Greedy quantifiers try to match the whole string, then backtrack

Oniguruma makes backtracking quicker

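
A sketch that makes the backtracking visible. The capture groups and the extra " left" suffix are additions for illustration and are not part of the talk.

```ruby
string = "This is no time to talk about time we don't have the time"

m = /.+time/.match(string)
puts m[0] == string   # true, the greedy match runs to the last "time"

# Capturing what .+ kept shows it gave back only the final "time".
m2 = /(.+)time(.*)/.match(string)
p m2[1]   # "This is no time to talk about time we don't have the "
p m2[2]   # ""

# With trailing text, .+ first takes everything and then backs off
# until "time" can match, which happens at the last occurrence.
m3 = /.+time/.match(string + " left")
puts m3[0] == string   # true
p m3.post_match        # " left"
```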
Lazy Quantifiers




Lazy quantifiers match as little as possible

Lazy quantifiers use minimum effort for minimum return

/.+?time/

/.+?time/.match(string)
=> #<MatchData "This is no time">

Lazy quantifiers often use fewer resources

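
Because a lazy quantifier stops at the first place the rest of the pattern can match, scanning with it splits the string at every "time". A sketch using the string from the slides:

```ruby
string = "This is no time to talk about time we don't have the time"

m = /.+?time/.match(string)
puts m[0]   # This is no time

# scan applies the lazy pattern repeatedly, one short match at a time.
p string.scan(/.+?time/)
# ["This is no time", " to talk about time", " we don't have the time"]
```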
Possessive Quantifiers




Possessive quantifiers are all or nothing

Possessive quantifiers try to match the entire string with no backtracking

Possessive quantifiers use minimum effort for maximum return

/.++time/

/.++time/.match(string)
=> nil

Possessive quantifiers fail faster

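
A side-by-side sketch of the three behaviors on the same string. The "42 items" examples are made up to show a case where a possessive quantifier still succeeds.

```ruby
string = "This is no time to talk about time we don't have the time"

p(/.+time/.match(string)[0] == string)   # true, greedy backtracks to the last "time"
p(/.+?time/.match(string)[0])            # "This is no time"
p(/.++time/.match(string))               # nil, .++ keeps everything it took

# A possessive quantifier matches fine when it never needs to give anything back.
p(/\d++ items/.match("42 items"))    # #<MatchData "42 items">

# Here \d++ takes "42" and will not release the "2" that the pattern needs next.
p(/\d++2 items/.match("42 items"))   # nil
p(/\d+2 items/.match("42 items"))    # #<MatchData "42 items">
```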
Write regular expressions in small chunks

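
One possible way to work in small chunks is to give each piece a name, check it alone, and then interpolate the pieces into the full pattern. This sketch rebuilds the Visa pattern from the start of the talk; the chunk names are hypothetical.

```ruby
# Each chunk is a small Regexp that can be tested by itself.
first_digit = /4/
twelve      = /[0-9]{12}/
optional3   = /(?:[0-9]{3})?/

p("4" =~ /\A#{first_digit}\z/)         # 0
p("123456789012" =~ /\A#{twelve}\z/)   # 0
p("" =~ /\A#{optional3}\z/)            # 0, the group may be absent

# Interpolating Regexp objects combines the chunks into one pattern.
visa = /^#{first_digit}#{twelve}#{optional3}$/

p(visa =~ ("4" + "1" * 15))   # 0
p(visa =~ "4111")             # nil
```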
Rubular




Regular expressions come in drafts




Move beyond your fear of regular expressions




Nell Shamrell
             Software Development Engineer
                           Blue Box Group
                           @nellshamrell

