Beyond the Basics: Regular Expressions in Ruby

Many of us approach regular expressions with fear and trepidation, using them only when absolutely necessary. We can get by when we need to use them, but we hesitate to dive any deeper into their cryptic world. Ruby has so much more to offer us through the Regexp class and the Oniguruma regular expression library. This talk will showcase advanced regular expression techniques, including grouping, lookaheads, and lookbehinds. Are your regular expressions greedy, lazy, or possessive? Learn how to change their behavior for the better. You will walk away with tools and techniques to harness the power of Ruby regular expressions and bring beauty and elegance into your code.

Published in: Technology

  1. BEYOND THE BASICS: Regular Expressions in Ruby. @nellshamrell. Thursday, February 7, 13
  2. ^4[0-9]{12}(?:[0-9]{3})?$ (Source: regular-expressions.info)
  3. Regular Expressions are Patterns
  4. Test, Extract, Change
  5. Test, Extract, Change
  6. Test, Extract, Change
  7. In Ruby, regular expressions are objects
  8. You program from a regular expression to a result (Source: The Well-Grounded Rubyist)
  9. Oniguruma
  10. Shorthand for Hexadecimals: \h and \H
  11. Onigmo
  13. =~
  14. /force/ =~ "Use the force"
  15. "Use the force" =~ /force/
  16. "Use the force" =~ /force/ => 8
  17. /dark side/ !~ "Use the force"
  18. /dark side/ !~ "Use the force" => true
  19. MatchData
  20. .match
  21. string = "The force will be with you always."
  22. string = "The force will be with you always."; m = /force/.match(string)
  23. string = "The force will be with you always."; m = /force/.match(string) => #<MatchData "force">
  24. string = "The force will be with you always."; m = /force/.match(string, 5)
  25. string = "The force will be with you always."; m = /force/.match(string, 5) => nil
  26. What can you do with MatchData?
  27. m.to_s
  28. m.to_s => "force"
  29. m.to_s => "force"; m.pre_match
  30. m.to_s => "force"; m.pre_match => "The "
  31. m.to_s => "force"; m.pre_match => "The "; m.post_match
  32. m.to_s => "force"; m.pre_match => "The "; m.post_match => " will be with you always."
  33. Capture Groups
  34. /(.*)force(.*)/
  35. m = /(.*)force(.*)/.match(string)
  36. m = /(.*)force(.*)/.match(string); m.captures
  37. m = /(.*)force(.*)/.match(string); m.captures => ["The ", " will be with you always."]
  38. You access capture groups with []
  39. m[1]
  40. m[1] => "The "
  41. m[1] => "The "; m[2]
  42. m[1] => "The "; m[2] => " will be with you always."
  43. m[0]
  44. m[0] => "The force will be with you always."
  45. Match Objects are not arrays
  46. m.each do |match| puts match.upcase end
  47. m.each do |match| puts match.upcase end => NoMethodError
  48. m.to_a.each do |match| puts match.upcase end
  49. m.to_a.each do |match| puts match.upcase end => "THE FORCE WILL BE WITH YOU ALWAYS." "THE " " WILL BE WITH YOU ALWAYS."
  51. Lookarounds Define Context
  52. string = "Who’s the more foolish? The fool or the fool who follows him?"
  53. string = "Who’s the more foolish? The fool or the fool who follows him?"; /fool/
  54. string.scan(/fool/)
  55. string.scan(/fool/) => ["fool", "fool", "fool"]
  56. Positive Lookahead
  57. ?=
  58. string.scan(/fool(?=ish)/)
  59. string.scan(/fool(?=ish)/) => ["fool"]
  60. string.gsub(/fool(?=ish)/, "self")
  61. string.gsub(/fool(?=ish)/, "self") => "Who’s the more selfish? The fool or the fool who follows him?"
  62. Zero Width Positive Lookahead Assertion
  63. Zero width means it does not consume characters
  64. Positive means a match for the lookahead should be present
  65. Lookahead means it is looking ahead of your main match
  66. Assertion means the lookahead only determines whether a match exists
  67. string = "Who’s the more foolish? The fool or the fool who follows him?"
  68. Negative Lookahead
  69. Negative means a match for the lookahead should not be present
  70. ?!
  71. string.scan(/fool(?!ish)/)
  72. string.scan(/fool(?!ish)/) => ["fool", "fool"]
  73. string.gsub(/fool(?!ish)/, "self")
  74. string.gsub(/fool(?!ish)/, "self") => "Who’s the more foolish? The self or the self who follows him?"
  75. Positive Lookbehind
  76. string = "For my ally is the force, and a powerful ally it is"
  77. string = "For my ally is the force, and a powerful ally it is"; /ally/
  78. ?<=
  79. /(?<=powerful )ally/
  80. string.gsub(/(?<=powerful )ally/, "friend")
  81. string.gsub(/(?<=powerful )ally/, "friend") => "For my ally is the force, and a powerful friend it is"
  82. Negative Lookbehind
  83. ?<!
  84. /(?<!powerful )ally/
  85. string.gsub(/(?<!powerful )ally/, "friend")
  86. string.gsub(/(?<!powerful )ally/, "friend") => "For my friend is the force, and a powerful ally it is"
  88. Regular Expressions have distinct behaviors
  89. Greedy, Lazy, Possessive
  90. Greedy, Lazy, Possessive
  91. Greedy, Lazy, Possessive
  92. Quantifiers
  93. +
  94. + /.+/
  95. Quantifiers are greedy by default
  96. Greedy Quantifiers match as much as possible
  97. Greedy Quantifiers use maximum effort for maximum return
  98. string = "This is no time to talk about time we don’t have the time"
  99. string = "This is no time to talk about time we don’t have the time"; /.+time/
  100. /.+time/.match(string)
  101. /.+time/.match(string) => "This is no time to talk about time we don’t have the time"
  102. Greedy regular expressions try to match the whole string, then backtrack
  103. Oniguruma makes backtracking quicker
  104. Lazy Quantifiers
  105. Lazy Quantifiers match as little as possible
  106. Lazy Quantifiers use minimum effort for minimum return
  107. /.+?time/
  108. /.+?time/.match(string)
  109. /.+?time/.match(string) => "This is no time"
  110. Lazy regular expressions use fewer resources
  111. Possessive Quantifiers
  112. Possessive Quantifiers are all or nothing
  113. Possessive Quantifiers try to match the entire string with no backtracking
  114. Possessive Quantifiers use minimum effort for maximum return
  115. /.++time/
  116. /.++time/.match(string)
  117. /.++time/.match(string) => nil
  118. Possessive Quantifiers fail faster
  120. Write regular expressions in small chunks
  121. Rubular
  122. Regular expressions come in drafts
  123. Move beyond your fear of regular expressions
  124. Nell Shamrell, Software Development Engineer, Blue Box Group, @nellshamrell
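
The "Test" and "Extract" slides can be tried as one small script. This is a minimal sketch rather than code from the talk: the variable names (index, no_dark, plain, skipped, grouped, shouted) are illustrative assumptions, and the sample sentence is the one used on the slides.

```ruby
string = "The force will be with you always."

# Test: =~ returns the index of the first match, or nil when nothing matches.
index   = string =~ /force/          # "force" starts at index 4 in this string
no_dark = /dark side/ !~ string      # !~ is true when the pattern is absent

# Extract: Regexp#match returns a MatchData object, or nil on failure.
plain   = /force/.match(string)
skipped = /force/.match(string, 5)   # start searching at index 5, past the "f"

# Capture groups are numbered from 1; index 0 holds the whole match.
grouped = /(.*)force(.*)/.match(string)

# MatchData has no #each, so convert it with to_a before iterating.
shouted = grouped.to_a.map(&:upcase)

p index
p plain.pre_match
p plain.post_match
p skipped
p grouped.captures
p shouted
```

Keeping the plain match and the grouped match in separate variables makes it easier to see that pre_match and post_match describe the text around the match, while captures describes the pieces inside it.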

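
The lookaround and quantifier slides can be compared side by side in a short script. This is a sketch under stated assumptions: the variable names are illustrative, the sample sentences come from the slides with straight apostrophes substituted, and String#[] with a pattern is used here as a shortcut for "matched text or nil" in place of the slides' .match calls.

```ruby
fools = "Who's the more foolish? The fool or the fool who follows him?"

# Lookahead: (?=...) requires the text to follow, (?!...) forbids it.
# Neither consumes characters, so only "fool" itself is matched or replaced.
before_ish     = fools.scan(/fool(?=ish)/)
not_before_ish = fools.gsub(/fool(?!ish)/, "self")

ally = "For my ally is the force, and a powerful ally it is"

# Lookbehind: (?<=...) requires the text to precede, (?<!...) forbids it.
after_powerful     = ally.gsub(/(?<=powerful )ally/, "friend")
not_after_powerful = ally.gsub(/(?<!powerful )ally/, "friend")

time = "This is no time to talk about time we don't have the time"

# String#[] with a pattern returns the matched text, or nil when nothing matches.
greedy     = time[/.+time/]    # .+ takes as much as it can, then backtracks
lazy       = time[/.+?time/]   # .+? stops at the first "time" it can reach
possessive = time[/.++time/]   # .++ never gives characters back, so "time" cannot match

p before_ish
p not_before_ish
p after_powerful
p not_after_powerful
p greedy
p lazy
p possessive
```

Running the three quantifier forms against the same string shows the difference directly: greedy returns the longest match, lazy the shortest, and possessive returns nil because it refuses to backtrack.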