And now you have two problems. Ruby regular expressions for fun and profit by Luca Mearelli

A wise hacker said: Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
Regular expressions are a powerful tool and a first-class citizen in Ruby, so it is tempting to overuse them. But knowing them and using them properly is a fundamental asset for every developer.
We'll see hands-on examples of proper regexp usage in Ruby code; we'll also look at the bad and the ugly cases and learn how to approach writing, testing and debugging regular expressions.

Presentation Transcript

    • And now you have two problems: Ruby regular expressions for fun and profit. Luca Mearelli, @lmea, Codemotion Rome 2013
    • Regular expressions: patterns to describe the contents of a text
      - cat, catch, indicate, ...
      - 2013-03-22, YYYY-MM-DD, ...
      - $ 12,500.80
    • Regexps: good for...
      - Pattern matching
      - Search and replace
    • Regexp in Ruby
      - Regexp object: Regexp.new("cat")
      - literal notation #1: %r{cat}
      - literal notation #2: /cat/
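A minimal sketch showing that the three notations build the same pattern; the strings "indicate" and "/usr/local/bin" are only illustrative inputs. Regexp.new is the natural choice when the pattern arrives as a string at runtime, while %r{} saves escaping when the pattern itself contains slashes.

```ruby
# Three ways to build the same pattern
from_object = Regexp.new("cat")
percent_r   = %r{cat}
slashes     = /cat/

# All three have the same source and find "cat" at the same index
p [from_object, percent_r, slashes].map(&:source).uniq           # ["cat"]
p [from_object, percent_r, slashes].map { |re| re =~ "indicate" } # [4, 4, 4]

# %r{} avoids escaping the slashes inside a path-like pattern
p "/usr/local/bin" =~ %r{/usr/local}                              # 0
```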
    • Regexp syntax
      - literals: /cat/ matches any ‘cat’ substring
      - the dot: /./ matches any character except a newline
      - character classes: /[aeiou]/ /[a-z]/ /[01]/
      - negated character classes: /[^abc]/
    • Regexp syntax: modifiers
      - case insensitive: /./i
      - only interpolate #{} blocks once: /./o
      - multiline mode, . will match newline: /./m
      - extended mode, whitespace is ignored: /./x
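A short sketch of three of these modifiers; the sample strings are illustrative. Note that Ruby's /m corresponds to the /s ("dotall") flag of Perl-style engines: in Ruby, ^ and $ always match at line boundaries whatever modifiers are set.

```ruby
# /i: case-insensitive matching
p "CAT" =~ /cat/i        # 0

# /m: in Ruby this only makes "." match newlines as well
p "a\nb" =~ /a.b/        # nil
p "a\nb" =~ /a.b/m       # 0

# /x: whitespace and # comments inside the pattern are ignored
DATE = /
  (\d{4}) -   # year
  (\d{2}) -   # month
  (\d{2})     # day
/x
p DATE.match("2013-03-22").captures   # ["2013", "03", "22"]
```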
    • Regexp syntax: shorthand classes
      - /\d/ digit, /\D/ non digit
      - /\s/ whitespace, /\S/ non whitespace
      - /\w/ word character, /\W/ non word character
      - /\h/ hexdigit, /\H/ non hexdigit
    • Regexp syntax: anchors
      - /^/ beginning of line, /$/ end of line
      - /\b/ word boundary, /\B/ non word boundary
      - /\A/ beginning of string, /\z/ end of string
      - /\Z/ end of string; if the string ends with a newline, it matches just before the newline
    • Regexp syntax
      - alternation: /cat|dog/ matches both ‘cat’ and ‘dog’ in ‘cats and dogs’
      - 0-or-more: /ab*/ matches ‘a’, ‘ab’, ‘abb’, ...
      - 1-or-more: /ab+/ matches ‘ab’, ‘abb’, ...
      - given number: /ab{2}/ matches ‘abb’ but not ‘ab’, nor the whole ‘abbb’ string
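    • Regexp syntax
      - greedy matches: /.+cat/ matches ‘the cat is cat’ in ‘the cat is catching a mouse’, running up to the last ‘cat’
      - lazy matches: /.+?\scat/ matches only ‘the cat’ in ‘the cat is catching a mouse’, stopping at the first ‘ cat’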
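The two matches can be checked directly with String#[], which returns the matched substring. A quick sketch using the slide's sentence:

```ruby
text = "the cat is catching a mouse"

# Greedy: .+ grabs as much as it can, then backs off to the last "cat"
p text[/.+cat/]      # "the cat is cat"

# Lazy: .+? grabs as little as it can, stopping at the first " cat"
p text[/.+?\scat/]   # "the cat"
```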
    • Regexp syntax
      - grouping: /(\d{3}\.){3}\d{3}/ matches IP-like strings
      - capturing: /a (cat|dog)/ the match is captured in $1 to be used later
      - non capturing: /a (?:cat|dog)/ no content captured
      - atomic grouping: /(?>a+)/ doesn't backtrack
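A sketch of the last three grouping forms; the sample strings are made up for illustration. The atomic-group lines show the practical difference: a+ normally gives back one "a" so that the following literal "a" can match, while (?>a+) refuses to.

```ruby
# Capturing group: the alternative that matched is stored in $1
if "a dog barks" =~ /a (cat|dog)/
  puts $1                  # prints "dog"
end

# Non-capturing group: same match, but nothing is stored
"a dog barks" =~ /a (?:cat|dog)/
p $1                       # nil

# Atomic group: once a+ has consumed the a's it never gives them back
p("aaab" =~ /a+ab/)        # 0   (a+ backtracks to leave one "a")
p("aaab" =~ /(?>a+)ab/)    # nil (no backtracking into the group)
```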
    • String substitution
      "My cat eats catfood".sub(/cat/, "dog")
      # => "My dog eats catfood"
      "My cat eats catfood".gsub(/cat/, "dog")
      # => "My dog eats dogfood"
      "My cat eats catfood".gsub(/\bcat(\w+)/, "dog\\1")
      # => "My cat eats dogfood"
      "My cat eats catfood".gsub(/\bcat(\w+)/) { |m| $1.reverse }
      # => "My cat eats doof"
    • String parsing
      "Codemotion Rome: Mar 20 to Mar 23".scan(/\w{3} \d{1,2}/)
      # => ["Mar 20", "Mar 23"]
      "Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/)
      # => [["Mar", "20"], ["Mar", "23"]]
      "Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/) { |a, b| puts b + "/" + a }
      # 20/Mar
      # 23/Mar
      # => "Codemotion Rome: Mar 20 to Mar 23"
    • Regexp methods
      if "what a wonderful world" =~ /(world)/
        puts "hello #{$1.upcase}"
      end
      # hello WORLD
      if /(world)/.match("The world")
        puts "hello #{$1.upcase}"
      end
      # hello WORLD
      match_data = /(world)/.match("The world")
      puts "hello #{match_data[1].upcase}"
      # hello WORLD
    • Rails app examples
      # in routing
      match "path/:id", :constraints => { :id => /[A-Z]\d{5}/ }
      # in validations
      validates :phone, :format => /\A\d{2,4}\s*\d+\z/
      validates :phone, :format => { :with => /\A\d{2,4}\s*\d+\z/ }
      validates :phone, :format => { :without => /\A02\s*\d+\z/ }
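Outside Rails, the same validation patterns can be tried in plain Ruby. This is a sketch only: acceptable_phone? is a hypothetical helper that combines the :with and :without patterns from the slide, and the phone numbers are invented test values.

```ruby
# Patterns copied from the validation slide; \A and \z anchor the whole string
PHONE     = /\A\d{2,4}\s*\d+\z/
FORBIDDEN = /\A02\s*\d+\z/

# Hypothetical helper: accepts what :with allows and :without does not reject
def acceptable_phone?(value)
  PHONE.match?(value) && !FORBIDDEN.match?(value)
end

["06 1234567", "0612345", "02 1234567", "06 123\nevil", "abc"].each do |v|
  puts "#{v.inspect}: #{acceptable_phone?(v)}"
end
```

Because the patterns use \A and \z, a value with an extra line after a valid number ("06 123\nevil") is rejected rather than slipping through.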
    • Rails examples
      # in ActiveModel::Validations::NumericalityValidator
      def parse_raw_value_as_an_integer(raw_value)
        raw_value.to_i if raw_value.to_s =~ /\A[+-]?\d+\Z/
      end
      # in ActionDispatch::RemoteIp::IpSpoofAttackError
      # IP addresses that are "trusted proxies" that can be stripped from
      # the comma-delimited list in the X-Forwarded-For header. See also:
      # http://en.wikipedia.org/wiki/Private_network#Private_IPv4_address_spaces
      TRUSTED_PROXIES = %r{
        ^127\.0\.0\.1$                | # localhost
        ^(10                          | # private IP 10.x.x.x
          172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP in the range 172.16.0.0 .. 172.31.255.255
          192\.168                      # private IP 192.168.x.x
         )\.
      }x
      WILDCARD_PATH = %r{\*([^/\)]+)\)?$}
    • Regexps are dangerous
      "If I was going to place a bet on something about Rails security, it'd be that there are more regex vulnerabilities in the tree. I am uncomfortable with how much Rails leans on regex for policy decisions."
      Thomas H. Ptacek (Founder @ Matasano, Feb 2013)
    • Tip #1: Beware of nested quantifiers
      /(x+x+)+y/ =~ "xxxxxxxxxy"
      /(xx+)+y/ =~ "xxxxxxxxxx"
      /(?>x+x+)+y/ =~ "xxxxxxxxx"
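A small sketch of why the nesting is unnecessary here: the three slide patterns accept the same strings as a flat pattern, two or more x's followed by a y. FLAT is a hypothetical rewrite, not from the slide, and the inputs are kept short so even the nested versions finish instantly. On a long run of x's with no trailing y, the nested forms can take exponential time on a backtracking engine, because the x's can be split between the inner quantifiers in exponentially many ways.

```ruby
NESTED_1 = /(x+x+)+y/    # from the slide
NESTED_2 = /(xx+)+y/     # from the slide
ATOMIC   = /(?>x+x+)+y/  # from the slide: the group cannot be re-split
FLAT     = /x{2,}y/      # hypothetical flat rewrite: two or more x's, then y

inputs = ["xxxxxxxxxy", "xxxxxxxxxx", "xy", "xxy", "axxy"]
inputs.each do |s|
  # Match index (or nil) for each pattern on the same input
  results = [NESTED_1, NESTED_2, ATOMIC, FLAT].map { |re| re =~ s }
  puts "#{s.inspect}: #{results.inspect}"
end
```

Ruby 3.2 and later also mitigate many such cases (match memoization, plus Regexp.timeout as a safety net), but rewriting the pattern remains the more portable fix.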
    • Tip #2: Don’t make everything optional
      /[-+]?[0-9]*\.?[0-9]*/ =~ "."
      /[-+]?([0-9]*\.?[0-9]+|[0-9]+)/
      /[-+]?[0-9]*\.?[0-9]+/
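A sketch comparing the first and last patterns from the slide. The \A and \z anchors are an addition here (assumed, so each pattern must match the whole input rather than any substring), and the test strings are illustrative.

```ruby
# Slide's first pattern, anchored: every part is optional
NAIVE  = /\A[-+]?[0-9]*\.?[0-9]*\z/
# Slide's last pattern, anchored: at least one digit is required
STRICT = /\A[-+]?[0-9]*\.?[0-9]+\z/

["3.14", ".5", "-2", ".", "", "+", "5."].each do |s|
  puts "#{s.inspect}: naive=#{NAIVE.match?(s)} strict=#{STRICT.match?(s)}"
end
```

The naive version accepts ".", "" and "+" as numbers. The stricter one rejects them, but also rejects "5.", so whether that is acceptable depends on the input format you expect.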
    • Tip #3: Evaluate tradeoffs
      [The slide shows an RFC 822 email-address regex filling the whole slide; the extracted text is too garbled to reproduce.]
      versus
      /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/
    • Tip #4: Capture repeated groups and don’t repeat a captured group
      /!(abc|123)+!/ =~ "!abc123!"    # $1 == "123"
      /!((abc|123)+)!/ =~ "!abc123!"  # $1 == "abc123"
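MatchData makes the difference visible. A sketch using the slide's own input; the final scan is one possible way to recover every repetition, not something shown on the slide.

```ruby
# Repeating a capturing group keeps only the last repetition
m1 = /!(abc|123)+!/.match("!abc123!")
p m1[1]                    # "123"

# Wrapping the repetition in its own group captures the whole run
m2 = /!((abc|123)+)!/.match("!abc123!")
p m2[1]                    # "abc123"
p m2[2]                    # "123" (the inner group still holds the last one)

# To get every repetition, scan the captured run afterwards
p m2[1].scan(/abc|123/)    # ["abc", "123"]
```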
    • Tip #5: Use interpolation with care
      str = "cat"
      /#{str}/ =~ "My cat eats catfood"
      /#{Regexp.quote(str)}/ =~ "My cat eats catfood"
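Why the escaping matters, in a small sketch: the hypothetical input "c.t" contains a metacharacter, so interpolating it unescaped turns the dot into a wildcard. Regexp.quote, used on the slide, is an alias of Regexp.escape.

```ruby
text = "My cat eats catfood"
str  = "c.t"   # hypothetical input that contains a metacharacter

# Interpolated as-is, "." is a wildcard, so "c.t" matches "cat"
p text =~ /#{str}/                   # 3

# Regexp.escape (alias Regexp.quote) makes every character literal
p Regexp.escape(str)                 # "c\\.t"
p text =~ /#{Regexp.escape(str)}/    # nil
```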
    • Tip #6: Don’t use ^ and $ to match the string’s beginning and end
      validates :url, :format => /^https?/
      "http://example.com" =~ /^https?/
      "javascript:alert('hello!');%0Ahttp://example.com"
      "javascript:alert('hello!');\nhttp://example.com" =~ /^https?/
      "javascript:alert('hello!');\nhttp://example.com" =~ /\Ahttps?/
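A plain-Ruby sketch of the same check, using the slide's payload with the %0A decoded to a real newline. The last two lines are an added illustration of the matching problem at the end of the string, with $ versus \z.

```ruby
payload = "javascript:alert('hello!');\nhttp://example.com"

# ^ matches at the start of ANY line, so the check passes after the newline
p payload =~ /^https?/       # 28
# \A matches only at the start of the string, so the payload is rejected
p payload =~ /\Ahttps?/      # nil
p "http://example.com" =~ /\Ahttps?/   # 0

# Same problem at the end: $ matches before any newline, \z only at the end
p "http://example.com\n<script>" =~ /\.com$/   # 14
p "http://example.com\n<script>" =~ /\.com\z/  # nil
```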
    • From 060bb7250b963609a0d8a5d0559e36b99d2402c6 Mon Sep 17 00:00:00 2001
      From: joernchen of Phenoelit <joernchen@phenoelit.de>
      Date: Sat, 9 Feb 2013 15:46:44 -0800
      Subject: [PATCH] Fix issue with attr_protected where malformed input could circumvent protection
      Fixes: CVE-2013-0276
      ---
       activemodel/lib/active_model/attribute_methods.rb | 2 +-
       activemodel/lib/active_model/mass_assignment_security/permission_set.rb | 2 +-
       2 files changed, 2 insertions(+), 2 deletions(-)
      diff --git a/activemodel/lib/active_model/attribute_methods.rb b/activemodel/lib/active_model/attribute_methods.rb
      index f033a94..96f2c82 100644
      --- a/activemodel/lib/active_model/attribute_methods.rb
      +++ b/activemodel/lib/active_model/attribute_methods.rb
      @@ -365,7 +365,7 @@ module ActiveModel
               end
               @prefix, @suffix = options[:prefix] || '', options[:suffix] || ''
      -        @regex = /^(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})$/
      +        @regex = /\A(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})\z/
               @method_missing_target = "#{@prefix}attribute#{@suffix}"
               @method_name = "#{prefix}%s#{suffix}"
             end
      diff --git a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
      index a1fcdf1..10faa29 100644
      --- a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
      +++ b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
      @@ -19,7 +19,7 @@ module ActiveModel
           protected
             def remove_multiparameter_id(key)
      -        key.to_s.gsub(/\(.+/, '')
      +        key.to_s.gsub(/\(.+/m, '')
             end
         end
      --
      1.8.1.1
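A sketch reproducing the two behaviours this patch changes. The inputs are hypothetical strings chosen only because they contain a newline; they are not the actual exploit payload.

```ruby
# Second hunk: hypothetical key with a newline right after the "("
key = "admin(\n1i)"
p key.gsub(/\(.+/, "")     # old: "." stops at "\n", nothing is stripped
p key.gsub(/\(.+/m, "")    # new: /m lets "." cross the newline, "admin"

# First hunk: hypothetical suffix and method name with an extra line
suffix = Regexp.escape("_changed?")
name   = "title_changed?\nanything"
p name =~ /^()(.+?)(#{suffix})$/     # old: 0, $ matches before the "\n"
p name =~ /\A()(.+?)(#{suffix})\z/   # new: nil, the whole string must match
```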
    • From 99123ad12f71ce3e7fe70656810e53133665527c Mon Sep 17 00:00:00 2001
      From: Aaron Patterson <aaron.patterson@gmail.com>
      Date: Fri, 15 Mar 2013 15:04:00 -0700
      Subject: [PATCH] fix protocol checking in sanitization [CVE-2013-1857]
      Conflicts:
        actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      ---
       .../action_controller/vendor/html-scanner/html/sanitizer.rb | 4 ++--
       actionpack/test/template/html-scanner/sanitizer_test.rb | 10 ++++++++++
       2 files changed, 12 insertions(+), 2 deletions(-)
      diff --git a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      index 02eea58..994e115 100644
      --- a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      +++ b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      @@ -66,7 +66,7 @@ module HTML
           # A regular expression of the valid characters used to separate protocols like
           # the : in http://foo.com
      -    self.protocol_separator = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
      +    self.protocol_separator = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i
           # Specifies a Set of HTML attributes that can have URIs.
           self.uri_attributes = Set.new(%w(href src cite action longdesc xlink:href lowsrc))
      @@ -171,7 +171,7 @@ module HTML
           def contains_bad_protocols?(attr_name, value)
             uri_attributes.include?(attr_name) &&
      -      (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/ && !allowed_protocols.include?(value.split(protocol_separator).first.downcase))
      +      (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i && !allowed_protocols.include?(value.split(protocol_separator).first.downcase.strip))
           end
         end
       end
      diff --git a/actionpack/test/template/html-scanner/sanitizer_test.rb b/actionpack/test/template/html-scanner/sanitizer_test.rb
      index 4e2ad4e..dee60c9 100644
      --- a/actionpack/test/template/html-scanner/sanitizer_test.rb
      +++ b/actionpack/test/template/html-scanner/sanitizer_test.rb
      @@ -176,6 +176,7 @@ class SanitizerTest < ActionController::TestCase
             %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
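A sketch of what the new alternative buys, using the detection and separator patterns copied from the diff; the attribute values are hypothetical. In contains_bad_protocols? a value that does not match the detection pattern is never flagged, so a protocol whose ":" is written as a hex entity would not be caught by the old check.

```ruby
# Detection patterns copied from the two sides of the CVE-2013-1857 diff
OLD_CHECK     = /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/
NEW_CHECK     = /(^[^\/:]*):|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i
NEW_SEPARATOR = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i

# Hypothetical attribute value: the ":" is written as the entity &#x3a;
value = "javascript&#x3a;alert(1)"

p value =~ OLD_CHECK    # nil: the old check does not notice a protocol
p value =~ NEW_CHECK    # 10: the hex entity is now recognized
p value.split(NEW_SEPARATOR).first.downcase.strip   # "javascript"

# /i also catches the upper-case spelling of the entity
p "javascript&#X3A;alert(1)" =~ NEW_CHECK           # 10
```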
    • Tools
      - Print a cheatsheet!
      - Info: http://www.regular-expressions.info
      - Debug: http://rubular.com, http://rubyxp.com
      - Visualize: http://www.regexper.com/
    • Thank you!