Beneath the Surface - Rubyconf 2013

This is the final version of this talk, given at RubyConf 2013

Many of us approach regular expressions with a certain fear and trepidation, using them only when absolutely necessary. We can get by when we need to use them, but we hesitate to dive any deeper into their cryptic world. Ruby has so much more to offer us. This talk showcases the incredible power of Ruby and the Onigmo regex library Ruby runs on. It takes you on a journey beneath the surface, exploring the beauty, elegance, and power of regular expressions. You will discover flexible, dynamic, and eloquent ways to harness this beauty and power in your own code.

Published in: Technology

  1. 1. Beneath the Surface Regular Expressions in Ruby @nellshamrell Photo By Mr. Christopher Thomas Creative Commons Attribution-ShareAlike 2.0 Generic License
  2. 2. ^4[0-9]{12}(?:[0-9]{3})?$ Source: regular-expressions.info
  3. 3. We fear what we do not understand
  4. 4. Regular Expressions + Ruby Photo By Shayan Creative Commons Attribution-ShareAlike 2.0 Generic License
  5. 5. Regex Matching in Ruby Ruby Methods Onigmo
  6. 6. Onigmo
  7. 7. Oniguruma Fork Onigmo
  8. 8. Onigmo Reads Regex
  9. 9. Onigmo Reads Regex Parses Into Abstract Syntax Tree
  10. 10. Onigmo Reads Regex Parses Into Abstract Syntax Tree Compiles Into Series of Instructions
  11. 11. Finite State Machines Photo By Felipe Skroski Creative Commons Attribution Generic 2.0
  12. 12. A Finite State Machine Shows How Something Works
  13. 13. Annie the Dog
  14. 14. In the House Out of House Annie the Dog
  15. 15. Door In the House Out of House Annie the Dog
  16. 16. Door In the House Door Out of House Annie the Dog
  17. 17. Finite State Machine
  18. 18. Finite State Machine
  19. 19. Finite State Machine
  20. 20. Multiple States
  21. 21. /force/
  22. 22. re = /force/; string = "Use the force"; re.match(string)
  23. 23. “Use the force” /force/ [state diagram]: Path Doesn’t Match
  24. 24. “Use the force” /force/ [state diagram]: Still Doesn’t Match
  25. 25. “Use the force” /force/ [state diagram]: (Fast Forward) Path Matches!
  26. 26. “Use the force” /force/ [state diagram]
  27. 27. “Use the force” /force/ [state diagram]
  28. 28. “Use the force” /force/ [state diagram]
  29. 29. “Use the force” /force/ [state diagram]
  30. 30. “Use the force” /force/ [state diagram]: We Have A Match!
  31. 31. re = /force/; string = "Use the force"; re.match(string) # => #<MatchData "force">
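
The walkthrough on slides 21 to 31 can be reproduced in irb. This is a minimal sketch; the extra MatchData calls (`begin` and `pre_match`) are not on the slides and are added here only to show where in the string the match was found.

```ruby
re = /force/
string = "Use the force"

match = re.match(string)

# The engine tries each starting position in turn until the whole path matches.
matched_text = match[0]         # the text that matched
start_index  = match.begin(0)   # index in the string where the match begins
skipped      = match.pre_match  # everything before the match

puts matched_text     # force
puts start_index      # 8
puts skipped.inspect  # "Use the "
```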
  32. 32. Alternation Photo By Shayan Creative Commons Attribution Generic 2.0
  33. 33. Pipe /Y(olk|oda)/
  34. 34. re = /Y(olk|oda)/; string = "Yoda"; re.match(string)
  35. 35. “Yoda” /Y(olk|oda)/ [state diagram]
  36. 36. “Yoda” /Y(olk|oda)/ [state diagram]: Which To Choose?
  37. 37. “Yoda” /Y(olk|oda)/ [state diagram]: Saves To Backtrack Stack
  38. 38. “Yoda” /Y(olk|oda)/ [state diagram]: Uh Oh, No Match
  39. 39. “Yoda” /Y(olk|oda)/ [state diagram]: Backtracks To Here
  40. 40. “Yoda” /Y(olk|oda)/ [state diagram]
  41. 41. “Yoda” /Y(olk|oda)/ [state diagram]
  42. 42. “Yoda” /Y(olk|oda)/ [state diagram]: We Have A Match!
  43. 43. re = /Y(olk|oda)/; string = "Yoda"; re.match(string) # => #<MatchData "Yoda">
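
The alternation slides can be checked the same way. In this sketch the inputs "Yolk" and "Yeti" are illustrative additions that do not appear in the talk; the capture group index shows which branch of the alternation ended up matching.

```ruby
re = /Y(olk|oda)/

yoda = re.match("Yoda")  # first branch fails, engine backtracks to the second
yolk = re.match("Yolk")  # first branch succeeds, no backtracking needed
yeti = re.match("Yeti")  # neither branch matches

puts yoda[0]       # Yoda
puts yoda[1]       # oda
puts yolk[1]       # olk
puts yeti.inspect  # nil
```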
  44. 44. Quantifiers Photo By Fancy Horse Creative Commons Attribution Generic 2.0
  45. 45. Plus Quantifier /No+/
  46. 46. re = /No+/; string = "Noooo"; re.match(string)
  47. 47. “Noooo” /No+/ [state diagram]
  48. 48. “Noooo” /No+/ [state diagram]
  49. 49. “Noooo” /No+/ [state diagram]: Return Match? Or Keep Looping?
  50. 50. “Noooo” /No+/ [state diagram]: Greedy Quantifier Keeps Looping
  51. 51. Greedy quantifiers match as much as possible
  52. 52. Greedy quantifiers use maximum effort for maximum return
  53. 53. “Noooo” /No+/ [state diagram]
  54. 54. “Noooo” /No+/ [state diagram]
  55. 55. “Noooo” /No+/ [state diagram]: We Have A Match!
  56. 56. re = /No+/; string = "Noooo"; re.match(string) # => #<MatchData "Noooo">
  57. 57. Lazy Quantifiers
  58. 58. Lazy quantifiers match as little as possible
  59. 59. Lazy quantifiers use minimum effort for minimum return
  60. 60. Makes Quantifier Lazy /No+?/
  61. 61. re = /No+?/; string = "Noooo"; re.match(string)
  62. 62. “Noooo” /No+?/ [state diagram]
  63. 63. “Noooo” /No+?/ [state diagram]
  64. 64. “Noooo” /No+?/ [state diagram]: Return Match? Or Keep Looping?
  65. 65. “Noooo” /No+?/ [state diagram]: We Have A Match!
  66. 66. re = /No+?/; string = "Noooo"; re.match(string) # => #<MatchData "No">
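
Greedy and lazy quantifiers are easiest to compare side by side on the same input. A small sketch using the two patterns from the slides; the variable names are illustrative.

```ruby
string = "Noooo"

greedy = /No+/.match(string)[0]   # keeps looping over every "o"
lazy   = /No+?/.match(string)[0]  # returns after the first "o"

puts greedy  # Noooo
puts lazy    # No
puts greedy.length - lazy.length  # 3
```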
  67. 67. Greedy quantifiers are greedy but reasonable
  68. 68. Star Quantifier /.*moon/
  69. 69. re = /.*moon/; string = "That's no moon"; re.match(string)
  70. 70. “That’s no moon” /.*moon/ [state diagram]
  71. 71. “That’s no moon” /.*moon/ [state diagram]
  72. 72. “That’s no moon” /.*moon/ [state diagram]: Loops
  73. 73. “That’s no moon” /.*moon/ [state diagram]: (Fast Forward) Which To Match?
  74. 74. “That’s no moon” /.*moon/ [state diagram]: Keeps Looping
  75. 75. “That’s no moon” /.*moon/ [state diagram]: Keeps Looping
  76. 76. “That’s no moon” /.*moon/ [state diagram]: Keeps Looping
  77. 77. “That’s no moon” /.*moon/ [state diagram]: No More Characters?
  78. 78. “That’s no moon” /.*moon/ [state diagram]: Backtrack or Fail?
  79. 79. “That’s no moon” /.*moon/ [state diagram]: Backtracks
  80. 80. “That’s no moon” /.*moon/ [state diagram]: Backtracks
  81. 81. “That’s no moon” /.*moon/ [state diagram]: Backtracks
  82. 82. “That’s no moon” /.*moon/ [state diagram]: Backtracks. Huzzah!
  83. 83. “That’s no moon” /.*moon/ [state diagram]
  84. 84. “That’s no moon” /.*moon/ [state diagram]
  85. 85. “That’s no moon” /.*moon/ [state diagram]
  86. 86. “That’s no moon” /.*moon/ [state diagram]: We Have A Match!
  87. 87. re = /.*moon/; string = "That's no moon"; re.match(string) # => #<MatchData "That's no moon">
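
One way to see how much the star quantifier keeps after backtracking is to wrap it in a capture group. This is a sketch, not something shown in the talk: the capture groups, the lazy variant, and the second sample string are assumptions added for illustration.

```ruby
string = "That's no moon"

# The star first consumes the whole string, then gives characters back
# one at a time until "moon" can match.
kept_by_star = /(.*)moon/.match(string)[1]

# With two candidate endings, the greedy star settles on the last one,
# while a lazy star stops at the first.
two_moons   = "no moon, full moon"
greedy_kept = /(.*)moon/.match(two_moons)[1]
lazy_kept   = /(.*?)moon/.match(two_moons)[1]

puts kept_by_star.inspect  # "That's no "
puts greedy_kept.inspect   # "no moon, full "
puts lazy_kept.inspect     # "no "
```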
  88. 88. Backtracking = Slow
  89. 89. /No+w+/
  90. 90. re = /No+w+/; string = "Noooo"; re.match(string)
  91. 91. “Noooo” /No+w+/ [state diagram]
  92. 92. “Noooo” /No+w+/ [state diagram]
  93. 93. “Noooo” /No+w+/ [state diagram]: Loops
  94. 94. “Noooo” /No+w+/ [state diagram]: Loops
  95. 95. “Noooo” /No+w+/ [state diagram]: Loops
  96. 96. “Noooo” /No+w+/ [state diagram]: Uh Oh
  97. 97. “Noooo” /No+w+/ [state diagram]: Uh Oh. Backtrack or Fail?
  98. 98. “Noooo” /No+w+/ [state diagram]: Backtracks
  99. 99. “Noooo” /No+w+/ [state diagram]: Backtracks
  100. 100. “Noooo” /No+w+/ [state diagram]: Backtracks
  101. 101. “Noooo” /No+w+/ [state diagram]: Match FAILS
  102. 102. Possessive Quantifiers
  103. 103. Possessive quantifiers do not backtrack
  104. 104. Makes Quantifier Possessive /No++w+/
  105. 105. “Noooo” /No++w+/ [state diagram]
  106. 106. “Noooo” /No++w+/ [state diagram]
  107. 107. “Noooo” /No++w+/ [state diagram]: Loops
  108. 108. “Noooo” /No++w+/ [state diagram]: Loops
  109. 109. “Noooo” /No++w+/ [state diagram]: Loops
  110. 110. “Noooo” /No++w+/ [state diagram]
  111. 111. “Noooo” /No++w+/ [state diagram]: Loops. Uh Oh. Backtrack or Fail?
  112. 112. “Noooo” /No++w+/ [state diagram]: Match FAILS
  113. 113. Possessive quantifiers fail faster by controlling backtracking
  114. 114. Use possessive quantifiers with caution
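
The caution on slide 114 is easier to appreciate with a case where a possessive quantifier changes the result rather than just the speed. In this sketch the `/.*+moon/` pattern and the "Noooow" input are illustrative additions, not taken from the slides.

```ruby
string = "Noooo"

# Both patterns fail on this input; the possessive one simply gives up
# without handing any "o" back to be retried.
greedy_result     = /No+w+/.match(string)
possessive_result = /No++w+/.match(string)

# When a "w" does follow, the possessive pattern still matches.
possessive_hit = /No++w+/.match("Noooow")

# Caution: a possessive star never gives back what it consumed,
# so a pattern that relied on backtracking stops matching.
moon            = "That's no moon"
greedy_moon     = /.*moon/.match(moon)
possessive_moon = /.*+moon/.match(moon)

puts greedy_result.inspect      # nil
puts possessive_result.inspect  # nil
puts possessive_hit[0]          # Noooow
puts greedy_moon[0]             # That's no moon
puts possessive_moon.inspect    # nil
```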
  115. 115. Tying It All Together Photo By Keith Ramos Creative Commons Attribution 2.0 Generic
  116. 116. snake_case to CamelCase
  117. 117. snake_case to CamelCase Find first letter of string and capitalize it
  118. 118. snake_case to CamelCase Find first letter of string and capitalize it Find any character that follows an underscore and capitalize it
  119. 119. snake_case to CamelCase Find first letter of string and capitalize it Find any character that follows an underscore and capitalize it Remove underscores
  120. 120. snake_case to CamelCase Find first letter of string and capitalize it
  121. 121. case_converter_spec.rb: before(:each) do @case_converter = CaseConverter.new end; it "capitalizes the first letter" do result = @case_converter.upcase_chars("method"); result.should == "Method" end
  122. 122. case_converter_spec.rb: before(:each) do @case_converter = CaseConverter.new end; it "capitalizes the first letter" do result = @case_converter.upcase_chars("method"); result.should == "Method" end
  123. 123. case_converter_spec.rb: before(:each) do @case_converter = CaseConverter.new end; it "capitalizes the first letter" do result = @case_converter.upcase_chars("method"); result.should == "Method" end
  124. 124. Anchors Match To Beginning Of String /\A/
  125. 125. Matches Any Word Character /\A\w/
  126. 126. case_converter.rb: def upcase_chars(string) re = /\A\w/; string.gsub(re) { |char| char.upcase } end
  127. 127. case_converter.rb: def upcase_chars(string) re = /\A\w/; string.gsub(re) { |char| char.upcase } end
  128. 128. case_converter.rb: def upcase_chars(string) re = /\A\w/; string.gsub(re) { |char| char.upcase } end # Spec Passes!
  129. 129. case_converter_spec.rb: it "capitalizes the first letter" do result = @case_converter.upcase_chars("_method"); result.should == "_Method" end
  130. 130. case_converter_spec.rb: it "capitalizes the first letter" do result = @case_converter.upcase_chars("_method"); result.should == "_Method" end
  131. 131. case_converter_spec.rb: it "capitalizes the first letter" do result = @case_converter.upcase_chars("_method"); result.should == "_Method" end # Spec Fails!
  132. 132. Spec Failure: Expected: "_Method", Got: "_method"
  133. 133. Problem: Matches Letters AND Underscores /\A\w/
  134. 134. Matches Only Lowercase Letters /\A[a-z]/
  135. 135. Matches an underscore /\A_[a-z]/
  136. 136. Makes underscore optional /\A_?[a-z]/
  137. 137. case_converter.rb: def upcase_chars(string) re = /\A_?[a-z]/; string.gsub(re) { |char| char.upcase } end
  138. 138. case_converter.rb: def upcase_chars(string) re = /\A_?[a-z]/; string.gsub(re) { |char| char.upcase } end # Spec Passes!
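
The failure on slide 132 comes from `\w` treating an underscore as a word character, so upcasing the first character changes nothing. A short sketch comparing the two patterns from the slides; the constant and variable names are illustrative.

```ruby
WORD_CHAR_AT_START = /\A\w/        # \w matches letters, digits, and "_"
LETTER_AT_START    = /\A_?[a-z]/   # optional underscore, then a lowercase letter

# The first pattern matches the leading "_", and upcasing "_" is a no-op.
with_word_char = "_method".gsub(WORD_CHAR_AT_START) { |char| char.upcase }

# The second pattern matches "_m", so the letter is upcased as well.
with_letter = "_method".gsub(LETTER_AT_START) { |char| char.upcase }
plain       = "method".gsub(LETTER_AT_START) { |char| char.upcase }

puts with_word_char  # _method
puts with_letter     # _Method
puts plain           # Method
```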
  139. 139. snake_case to CamelCase Find any character that follows an underscore and capitalize it
  140. 140. case_converter_spec.rb: it "capitalizes letters after an underscore" do result = @case_converter.upcase_chars("some_method"); result.should == "Some_Method" end
  141. 141. case_converter_spec.rb: it "capitalizes letters after an underscore" do result = @case_converter.upcase_chars("some_method"); result.should == "Some_Method" end
  142. 142. /\A_?[a-z]/
  143. 143. Pipe For Alternation /\A_?[a-z]|[a-z]/
  144. 144. Look Behind /\A_?[a-z]|(?<=_)[a-z]/
  145. 145. case_converter.rb: def upcase_chars(string) re = /\A_?[a-z]|(?<=_)[a-z]/; string.gsub(re) { |char| char.upcase } end
  146. 146. case_converter.rb: def upcase_chars(string) re = /\A_?[a-z]|(?<=_)[a-z]/; string.gsub(re) { |char| char.upcase } end # Spec Passes!
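
Slides 143 and 144 differ only by the look-behind, and running both patterns shows why it is needed. A minimal sketch with illustrative variable names: without the look-behind the second alternative matches every lowercase letter, while `(?<=_)` restricts it to letters that directly follow an underscore, without consuming the underscore itself.

```ruby
without_lookbehind = /\A_?[a-z]|[a-z]/
with_lookbehind    = /\A_?[a-z]|(?<=_)[a-z]/

too_many   = "some_method".gsub(without_lookbehind) { |char| char.upcase }
just_right = "some_method".gsub(with_lookbehind) { |char| char.upcase }

puts too_many    # SOME_METHOD
puts just_right  # Some_Method
```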
  147. 147. snake_case to CamelCase Remove underscores
  148. 148. case_converter_spec.rb: it "removes underscores" do result = @case_converter.rmv_underscores("some_method"); result.should == "somemethod" end
  149. 149. case_converter_spec.rb: it "removes underscores" do result = @case_converter.rmv_underscores("some_method"); result.should == "somemethod" end
  150. 150. case_converter_spec.rb: it "removes underscores" do result = @case_converter.rmv_underscores("some_method"); result.should == "somemethod" end
  151. 151. Matches An Underscore /_/
  152. 152. case_converter.rb: def rmv_underscores(string) re = /_/; string.gsub(re, "") end
  153. 153. case_converter.rb: def rmv_underscores(string) re = /_/; string.gsub(re, "") end
  154. 154. case_converter.rb: def rmv_underscores(string) re = /_/; string.gsub(re, "") end # Spec Passes!
  155. 155. snake_case to CamelCase Combine results of two methods
  156. 156. case_converter_spec.rb: it "converts snake_case to CamelCase" do result = @case_converter.snake_to_camel("some_method"); result.should == "SomeMethod" end
  157. 157. case_converter_spec.rb: it "converts snake_case to CamelCase" do result = @case_converter.snake_to_camel("some_method"); result.should == "SomeMethod" end
  158. 158. case_converter_spec.rb: it "converts snake_case to CamelCase" do result = @case_converter.snake_to_camel("some_method"); result.should == "SomeMethod" end
  159. 159. case_converter.rb: def snake_to_camel(string) upcase_chars(string) end
  160. 160. case_converter.rb: def snake_to_camel(string) rmv_underscores(upcase_chars(string)) end
  161. 161. case_converter.rb: def snake_to_camel(string) rmv_underscores(upcase_chars(string)) end # Spec Passes!
  162. 162. Code is available here: https://github.com/nellshamrell/snake_to_camel_case
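
Putting the three methods from the slides together gives something like the class below. This is a sketch assembled from the slide fragments: the class name comes from the spec, but the surrounding class definition and the extra sample inputs are assumptions, and the code in the author's repository may differ.

```ruby
# Assembled from the slide fragments; not copied from the author's repository.
class CaseConverter
  # Upcases the first letter of the string (allowing a leading underscore)
  # and any lowercase letter that directly follows an underscore.
  def upcase_chars(string)
    re = /\A_?[a-z]|(?<=_)[a-z]/
    string.gsub(re) { |char| char.upcase }
  end

  # Removes every underscore.
  def rmv_underscores(string)
    string.gsub(/_/, "")
  end

  # Upcase first, then strip underscores; the look-behind in upcase_chars
  # needs the underscores to still be present.
  def snake_to_camel(string)
    rmv_underscores(upcase_chars(string))
  end
end

converter = CaseConverter.new
puts converter.snake_to_camel("some_method")        # SomeMethod
puts converter.snake_to_camel("another_long_name")  # AnotherLongName
puts converter.snake_to_camel("_private_method")    # PrivateMethod
```

The order of the two calls matters: removing underscores first would leave the look-behind with nothing to anchor on, and only the first letter would be capitalized.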
  163. 163. Conclusion Photo By Steve Jurvetson Creative Commons Attribution Generic 2.0
  164. 164. Develop regular expressions in small pieces
  165. 165. If you write code, you can write regular expressions
  166. 166. Move beyond the fear
  167. 167. Nell Shamrell Software Development Engineer Blue Box @nellshamrell Resources: https://gist.github.com/nellshamrell/6031738 Photo By Leonardo Pallotta Creative Commons Attribution Generic 2.0
