And Now You Have Two Problems

A wise hacker said: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Regular expressions are a powerful tool in our hands and a first-class citizen in Ruby, so it is tempting to overuse them. But knowing them and using them properly is a fundamental asset for every developer. We'll see hands-on examples of proper regexp usage in Ruby code, look at the bad and the ugly cases, and learn how to approach writing and debugging them.

Transcript

  • 1. And now you have two problems
      Ruby regular expressions for fun and profit
      Luca Mearelli @lmea
      Codemotion Rome - 2013
  • 2. Regular expressions
      • cat, catch, indicate, ...
      • 2013-03-22, YYYY-MM-DD, ...
      • $ 12,500.80
      Patterns to describe the contents of a text
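A quick sketch of the three kinds of text listed on this slide, each described by a pattern. The date and amount patterns below are illustrative assumptions, not taken from the talk.

```ruby
# A literal pattern finds every "cat" substring, even inside other words
cat_count = "cat catch indicate".scan(/cat/).size
p cat_count # => 3

# A YYYY-MM-DD date: four digits, dash, two digits, dash, two digits
date_like = /\A\d{4}-\d{2}-\d{2}\z/
p date_like.match?("2013-03-22") # => true
p date_like.match?("22/03/2013") # => false

# A dollar amount with thousands separators and two decimals
amount = /\A\$ \d{1,3}(?:,\d{3})*\.\d{2}\z/
p amount.match?("$ 12,500.80") # => true
```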
  • 3. Regexps: good for...
      • Pattern matching
      • Search and replace
  • 4. Regexp in Ruby
      • Regexp object: Regexp.new("cat")
      • literal notation #1: %r{cat}
      • literal notation #2: /cat/
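The three notations on this slide build the same pattern. A minimal check, in plain Ruby (2.4+ for `match?`), that they share the same source and flags and behave identically, plus the usual reason to reach for `%r{}`: patterns that contain slashes.

```ruby
# Three notations for the same pattern
patterns = [Regexp.new("cat"), %r{cat}, /cat/]

# Each one has the same source text and none of the i/x/m flags set
flags = Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE
patterns.each do |re|
  puts "#{re.inspect} source=#{re.source.inspect} flags=#{re.options & flags}"
end

# ...and they all find "cat" at the same index
indexes = patterns.map { |re| re =~ "My cat eats catfood" }
p indexes # => [3, 3, 3]

# %r{} is handy when the pattern itself contains slashes
percent_path = %r{/users/\d+}
slash_path   = /\/users\/\d+/
p percent_path.match?("/users/42") # => true
p slash_path.match?("/users/42")   # => true
```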
  • 5. Regexp syntax
      • literals: /cat/ matches any ‘cat’ substring
      • the dot: /./ matches any character except a newline
      • character classes: /[aeiou]/ /[a-z]/ /[01]/
      • negated character classes: /[^abc]/
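A few one-liners showing the basics from this slide: the dot, a character class, a range, and a negated class. The sample strings are made up for illustration.

```ruby
# The dot matches any single character except a newline
p "cat" =~ /c.t/  # => 0
p "c\nt" =~ /c.t/ # => nil

# A character class matches exactly one character from the set
vowels = "regular expressions".scan(/[aeiou]/).join
p vowels # => "euaeeio"
lower = "Cat".scan(/[a-z]/)
p lower  # => ["a", "t"]

# A negated class matches any character NOT in the set
bits = "binary 1011".gsub(/[^01]/, "")
p bits   # => "1011"
```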
  • 6. Regexp syntax: modifiers
      • case insensitive: /./i
      • only interpolate #{} blocks once: /./o
      • multiline mode (. will match newline): /./m
      • extended mode (whitespace is ignored): /./x
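The four modifiers in action. Note that Ruby's `/m` is what other engines call "dotall"; in Ruby `^` and `$` are always line anchors regardless of `/m`. The method name `cached_matcher` is a hypothetical helper, used only to show that `/o` freezes the first interpolation.

```ruby
# i: case-insensitive
p "CAT" =~ /cat/   # => nil
p "CAT" =~ /cat/i  # => 0

# m: in Ruby this makes the dot match newlines too
p "c\nt" =~ /c.t/m # => 0

# x: whitespace and # comments inside the pattern are ignored
date = /
  (\d{4}) -  # year
  (\d{2}) -  # month
  (\d{2})    # day
/x
parts = date.match("2013-03-22").captures
p parts # => ["2013", "03", "22"]

# o: #{} is interpolated only the first time this literal is evaluated
# (cached_matcher is a hypothetical helper for this demo)
def cached_matcher(word)
  /#{word}/o
end
first  = cached_matcher("cat")
second = cached_matcher("dog")
p second.source # => "cat", the first interpolation is reused
```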
  • 7. Regexp syntax: shorthand classes
      • /\d/ digit, /\D/ non-digit
      • /\s/ whitespace, /\S/ non-whitespace
      • /\w/ word character, /\W/ non-word character
      • /\h/ hex digit, /\H/ non-hex digit
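The shorthand classes applied to some of the sample texts from slide 2, plus a hex color as an assumed example for `\h`.

```ruby
digits = "2013-03-22".scan(/\d+/)
p digits      # => ["2013", "03", "22"]

only_digits = "$ 12,500.80".gsub(/\D/, "")
p only_digits # => "1250080"

words = "hello  big\tworld".split(/\s+/)
p words       # => ["hello", "big", "world"]

# \w is letters, digits and the underscore
tokens = "snake_case-word".scan(/\w+/)
p tokens      # => ["snake_case", "word"]

# \h is Ruby's shorthand for a hexadecimal digit, [0-9a-fA-F]
color = "color: #1a2B3c"[/#(\h{6})/, 1]
p color       # => "1a2B3c"
```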
  • 8. Regexp syntax: anchors
      • /^/ beginning of line, /$/ end of line
      • /\b/ word boundary, /\B/ non-word boundary
      • /\A/ beginning of string, /\z/ end of string
      • /\Z/ end of string; if the string ends with a newline, it matches just before that newline
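The line/string distinction is the one that matters later in the talk, so here it is on a two-line string (sample text assumed):

```ruby
text = "first line\nsecond line\n"

# ^ matches at the start of every line, \A only at the start of the string
p text.scan(/^\w+/)  # => ["first", "second"]
p text.scan(/\A\w+/) # => ["first"]

# \Z also matches before a final newline, \z only at the very end
p text =~ /line\Z/   # => 18
p text =~ /line\z/   # => nil

# \b requires a word boundary on that side
p "cat catalog".scan(/\bcat\b/) # => ["cat"]
```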
  • 9. Regexp syntax
      • alternation: /cat|dog/ matches both ‘cat’ and ‘dog’ in ‘cats and dogs’
      • 0-or-more: /ab*/ matches ‘a’, ‘ab’, ‘abb’, ...
      • 1-or-more: /ab+/ matches ‘ab’, ‘abb’, ...
      • given number: /ab{2}/ matches ‘abb’ but not ‘ab’ or the whole ‘abbb’ string
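The alternation and quantifier examples from the slide, checked with `scan` and `String#[]`:

```ruby
p "cats and dogs".scan(/cat|dog/) # => ["cat", "dog"]
p "a ab abb".scan(/ab*/)          # => ["a", "ab", "abb"]
p "a ab abb".scan(/ab+/)          # => ["ab", "abb"]
p "abbb"[/ab{2}/]                 # => "abb", a substring of "abbb"
p "ab"[/ab{2}/]                   # => nil
p "abbb" =~ /\Aab{2}\z/           # => nil, the whole string has three b's
```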
  • 10. Regexp syntax
      • greedy matches: /.+cat/ matches ‘the cat is cat’ in ‘the cat is catching a mouse’
      • lazy matches: /.+?\scat/ matches only ‘the cat’ in ‘the cat is catching a mouse’
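Running the two patterns from the slide shows exactly which part of the sentence each one consumes:

```ruby
text = "the cat is catching a mouse"

greedy = text[/.+cat/]    # .+ grabs as much as it can, then backs off to the last "cat"
lazy   = text[/.+?\scat/] # .+? grabs as little as it can, stopping at the first " cat"

p greedy # => "the cat is cat"
p lazy   # => "the cat"
```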
  • 11. Regexp syntax
      • grouping: /(\d{3}\.){3}\d{3}/ matches IP-like strings
      • capturing: /a (cat|dog)/ the match is captured in $1 to be used later
      • non-capturing: /a (?:cat|dog)/ no content is captured
      • atomic grouping: /(?>a+)/ doesn’t backtrack
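Each kind of group from the slide on a small input (the sample strings are assumptions). The IP-like pattern also shows its own limit: it only accepts three-digit parts.

```ruby
# Grouping + quantifier: three "ddd." groups followed by "ddd"
ip_like = /(\d{3}\.){3}\d{3}/
p "ip: 192.168.001.010"[ip_like] # => "192.168.001.010"
p "ip: 10.0.0.1"[ip_like]        # => nil, every part needs exactly 3 digits

# Capturing group: the alternative that matched is stored in $1
animal = $1 if "I have a dog" =~ /a (cat|dog)/
p animal                         # => "dog"

# Non-capturing group: groups the alternation but stores nothing
m = "I have a dog".match(/a (?:cat|dog)/)
p m[0]                           # => "a dog"
p m.captures                     # => []

# Atomic group: once a+ has matched, it will not give characters back
p "aaab" =~ /a+ab/               # => 0, a+ backtracks to leave one "a"
p "aaab" =~ /(?>a+)ab/           # => nil, no backtracking into the group
```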
  • 12. String substitution
        "My cat eats catfood".sub(/cat/, "dog")
        # => "My dog eats catfood"
        "My cat eats catfood".gsub(/cat/, "dog")
        # => "My dog eats dogfood"
        "My cat eats catfood".gsub(/\bcat(\w+)/, 'dog\1')
        # => "My cat eats dogfood"
        "My cat eats catfood".gsub(/\bcat(\w+)/) { |m| $1.reverse }
        # => "My cat eats doof"
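The backreference in a replacement string is easy to get wrong because of Ruby's string escaping. A short sketch of the three quoting variants, plus a named group referenced with `\k<name>` (the group name `rest` is an arbitrary choice):

```ruby
s = "My cat eats catfood"

# In single quotes, \1 reaches gsub intact and refers to group 1
p s.gsub(/\bcat(\w+)/, 'dog\1')  # => "My cat eats dogfood"

# In double quotes the backslash itself must be escaped
p s.gsub(/\bcat(\w+)/, "dog\\1") # => "My cat eats dogfood"

# Pitfall: "\1" in double quotes is the control character U+0001, not a backreference
broken = s.gsub(/\bcat(\w+)/, "dog\1")
p broken == "My cat eats dog\u0001" # => true

# Named groups can be referenced with \k<name>
named = s.gsub(/\bcat(?<rest>\w+)/, 'dog\k<rest>')
p named # => "My cat eats dogfood"
```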
  • 13. String parsing
        "Codemotion Rome: Mar 20 to Mar 23".scan(/\w{3} \d{1,2}/)
        # => ["Mar 20", "Mar 23"]
        "Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/)
        # => [["Mar", "20"], ["Mar", "23"]]
        "Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/) { |a, b| puts b + "/" + a }
        # 20/Mar
        # 23/Mar
        # => "Codemotion Rome: Mar 20 to Mar 23"
  • 14. Regexp methods
        if "what a wonderful world" =~ /(world)/
          puts "hello #{$1.upcase}"
        end
        # hello WORLD
        if /(world)/.match("The world")
          puts "hello #{$1.upcase}"
        end
        # hello WORLD
        match_data = /(world)/.match("The world")
        puts "hello #{match_data[1].upcase}"
        # hello WORLD
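Beyond `$1`, the `MatchData` object returned by `match` carries more detail. A sketch with a named group (`place` is an arbitrary name); note the Ruby-specific rule that a regexp literal on the left of `=~` turns named groups into local variables, and that `match?` (Ruby 2.4+) does not set `$~`.

```ruby
text = "The world is wide"

# =~ returns the index of the first match (or nil)
p text =~ /world/ # => 4

# match returns a MatchData object with more detail
m = /(?<place>world)/.match(text)
p m[:place]       # => "world"
p m.pre_match     # => "The "
p m.post_match    # => " is wide"
p m.begin(0)      # => 4

# With a regexp literal on the left, =~ also assigns named groups to locals
if /(?<place>world)/ =~ text
  puts "hello #{place.upcase}" # hello WORLD
end

# match? only answers yes/no and does not set $~ or $1
p text.match?(/planet/) # => false
```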
  • 15. Rails app examples
        # in routing
        match 'path/:id', :constraints => { :id => /[A-Z]\d{5}/ }
        # in validations
        validates :phone, :format => /\A\d{2,4}\s*\d+\z/
        validates :phone, :format => { :with => /\A\d{2,4}\s*\d+\z/ }
        validates :phone, :format => { :without => /\A02\s*\d+\z/ }
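The same phone patterns can be exercised without Rails. This is a sketch only: `valid_phone?`, `PHONE_FORMAT` and `FORBIDDEN_PHONE` are hypothetical names, and combining `:with` and `:without` in one helper is an assumption for the demo, not how Rails wires them.

```ruby
# Plain-Ruby stand-ins for the slide's validations (hypothetical helper, no Rails)
PHONE_FORMAT    = /\A\d{2,4}\s*\d+\z/ # like :with
FORBIDDEN_PHONE = /\A02\s*\d+\z/      # like :without

def valid_phone?(value)
  value.match?(PHONE_FORMAT) && !value.match?(FORBIDDEN_PHONE)
end

p valid_phone?("055 1234567")        # => true
p valid_phone?("02 1234567")         # => false, rejected by the :without pattern
p valid_phone?("055 123\n<script>")  # => false, \z refuses anything after the digits
p valid_phone?("phone: 055 1234567") # => false, \A refuses a leading prefix
```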
  • 16. Rails examples
        # in ActiveModel::Validations::NumericalityValidator
        def parse_raw_value_as_an_integer(raw_value)
          raw_value.to_i if raw_value.to_s =~ /\A[+-]?\d+\Z/
        end
        # in ActionDispatch::RemoteIp
        # IP addresses that are "trusted proxies" that can be stripped from
        # the comma-delimited list in the X-Forwarded-For header. See also:
        # http://en.wikipedia.org/wiki/Private_network#Private_IPv4_address_spaces
        TRUSTED_PROXIES = %r{
          ^127\.0\.0\.1$                | # localhost
          ^(10                          | # private IP 10.x.x.x
            172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP in the range 172.16.0.0 .. 172.31.255.255
            192\.168                      # private IP 192.168.x.x
           )\.
        }x
        WILDCARD_PATH = %r{\*([^/\)]+)\)?$}
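Copying the two regexps from the slide into a script makes their behavior easy to probe. The sample IPs and inputs are assumptions; the output shows the `/x` layout at work and how `\Z` tolerates a trailing newline.

```ruby
# Copied from the slide: an extended-mode regexp with inline comments
TRUSTED_PROXIES = %r{
  ^127\.0\.0\.1$                | # localhost
  ^(10                          | # private IP 10.x.x.x
    172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP 172.16.0.0 .. 172.31.255.255
    192\.168                      # private IP 192.168.x.x
   )\.
}x

ips = %w[127.0.0.1 10.1.2.3 172.20.0.1 192.168.1.1 172.32.0.1 8.8.8.8 100.0.0.1]
results = ips.map { |ip| [ip, TRUSTED_PROXIES.match?(ip)] }.to_h
results.each { |ip, trusted| puts "#{ip}: #{trusted}" }

# Copied from the slide: \Z accepts one trailing newline, \z would not
def parse_raw_value_as_an_integer(raw_value)
  raw_value.to_i if raw_value.to_s =~ /\A[+-]?\d+\Z/
end
p parse_raw_value_as_an_integer("-42")  # => -42
p parse_raw_value_as_an_integer("4.2")  # => nil
p parse_raw_value_as_an_integer("42\n") # => 42
```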
  • 17. Regexps are dangerous
      "If I was going to place a bet on something about Rails security, it'd be that there are more regex vulnerabilities in the tree. I am uncomfortable with how much Rails leans on regex for policy decisions."
      Thomas H. Ptacek (Founder @ Matasano, Feb 2013)
  • 18. Tip #1
      Beware of nested quantifiers
        /(x+x+)+y/ =~ 'xxxxxxxxxy'
        /(xx+)+y/ =~ 'xxxxxxxxxx'
        /(?>x+x+)+y/ =~ 'xxxxxxxxx'
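Both nested patterns on the slide really mean "two or more x, then y", so the flat `/xx+y/` accepts exactly the same strings without giving the engine exponentially many ways to split the x's when the match fails. The sketch checks that the three variants agree on short inputs; the inputs are kept short on purpose, since on long failing inputs the nested form can blow up in a backtracking engine (Ruby 3.2+ adds memoization and a `Regexp.timeout` setting as mitigations).

```ruby
# (x+x+)+ and (xx+)+ both mean "two or more x", so say that directly
nested = /(x+x+)+y/
flat   = /xx+y/
atomic = /(?>x+x+)+y/ # the slide's fix: no backtracking into the group

samples = ["xy", "xxy", "xxxxxxxxxy", "xxxxxxxxxx"]
samples.each do |s|
  puts "#{s.inspect}: nested=#{s.match?(nested)} flat=#{s.match?(flat)} atomic=#{s.match?(atomic)}"
end
```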
  • 19. Tip #2
      Don’t make everything optional
        /[-+]?[0-9]*\.?[0-9]*/ =~ '.'
        /[-+]?([0-9]*\.?[0-9]+|[0-9]+)/
        /[-+]?[0-9]*\.?[0-9]+/
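Why "everything optional" is a problem: the first pattern on the slide also matches the empty string, so it succeeds on any input. The sketch contrasts it with the last pattern, anchored with `\A...\z` (the anchoring is an addition for validation use). The sample inputs are assumptions; note the tradeoff that "5." is now rejected.

```ruby
loose  = /[-+]?[0-9]*\.?[0-9]*/
strict = /\A[-+]?[0-9]*\.?[0-9]+\z/

# Everything in loose is optional, so it "matches" almost anything
p "." =~ loose          # => 0, the lone dot is accepted
p loose.match("abc")[0] # => "", an empty match at index 0

# Requiring at least one trailing digit and anchoring the whole string fixes it
checks = %w[3.14 -.5 +42 . 5. abc].map { |s| [s, s.match?(strict)] }.to_h
p checks # 3.14, -.5 and +42 pass; ".", "5." and "abc" fail
```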
  • 20. Tip #3
      Evaluate tradeoffs
        /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/i
        [the slide then shows the full RFC 822 e-mail address regexp, several thousand characters long]
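The one-line e-mail pattern is short and readable, and the price is that it rejects some valid addresses. A sketch of that tradeoff, assuming the pattern is used case-insensitively; `SIMPLE_EMAIL` is a hypothetical constant name and the addresses are made up.

```ruby
# The one-liner from the slide, used case-insensitively
SIMPLE_EMAIL = /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/i

p SIMPLE_EMAIL.match?("someone@example.com")    # => true
p SIMPLE_EMAIL.match?("not an address")         # => false
p SIMPLE_EMAIL.match?("someone@example.museum") # => false, TLD longer than 4 letters
p SIMPLE_EMAIL.match?('"odd name"@example.com') # => false, quoted local parts are not handled
```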
  • 21. Tip #4
      Capture repeated groups and don’t repeat a captured group
        /!(abc|123)+!/ =~ '!abc123!'
        # $1 == '123'
        /!((abc|123)+)!/ =~ '!abc123!'
        # $1 == 'abc123'
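The same two patterns through `MatchData`: a repeated group keeps only its last repetition, while a group around the repetition keeps everything, and the individual pieces can be recovered afterwards with `scan`.

```ruby
last_only = /!(abc|123)+!/.match("!abc123!")
p last_only[1] # => "123", only the last repetition survives

whole = /!((abc|123)+)!/.match("!abc123!")
p whole[1]                 # => "abc123"
p whole[1].scan(/abc|123/) # => ["abc", "123"]
```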
  • 22. Tip #5
      Use interpolation with care
        str = "cat"
        /#{str}/ =~ "My cat eats catfood"
        /#{Regexp.quote(str)}/ =~ "My cat eats catfood"
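With `str = "cat"` both lines on the slide behave the same; the difference shows up as soon as the interpolated string contains a metacharacter. A sketch with an assumed input `"c.t"`, plus `Regexp.union`, which escapes plain strings for you.

```ruby
text = "My cot eats catfood"
str  = "c.t" # imagine this came from user input

# Raw interpolation: the dot is a metacharacter, so "cot" matches
p text =~ /#{str}/               # => 3

# Regexp.quote (alias Regexp.escape) makes every character literal
p text =~ /#{Regexp.quote(str)}/ # => nil
p Regexp.escape("1+1=2?")        # => "1\\+1=2\\?"

# Regexp.union escapes plain strings when building an alternation
p Regexp.union("a.b", "c|d").source # => "a\\.b|c\\|d"
```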
  • 23. Tip #6
      Don’t use ^ and $ to match the string’s beginning and end
        validates :url, :format => /^https?/
        "http://example.com" =~ /^https?/
        "javascript:alert('hello!');%0Ahttp://example.com"
        "javascript:alert('hello!');\nhttp://example.com" =~ /^https?/
        "javascript:alert('hello!');\nhttp://example.com" =~ /\Ahttps?/
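The slide's payload, run end to end: decoding the `%0A` with the standard library's `URI.decode_www_form_component` yields a real newline, after which `^` happily matches at the start of the second line.

```ruby
require "uri"

# The %0A in the slide's URL decodes to a real newline
payload = URI.decode_www_form_component("javascript:alert('hello!');%0Ahttp://example.com")
p payload.include?("\n") # => true

# ^ matches after that newline, so a /^https?/ format check passes
p payload =~ /^https?/   # => 28

# \A only matches at the very start of the string, so the payload is rejected
p payload =~ /\Ahttps?/              # => nil
p "http://example.com" =~ /\Ahttps?/ # => 0
```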
  • 24. From 060bb7250b963609a0d8a5d0559e36b99d2402c6 Mon Sep 17 00:00:00 2001
        From: joernchen of Phenoelit <joernchen@phenoelit.de>
        Date: Sat, 9 Feb 2013 15:46:44 -0800
        Subject: [PATCH] Fix issue with attr_protected where malformed input could circumvent protection
        Fixes: CVE-2013-0276
        ---
         activemodel/lib/active_model/attribute_methods.rb | 2 +-
         activemodel/lib/active_model/mass_assignment_security/permission_set.rb | 2 +-
         2 files changed, 2 insertions(+), 2 deletions(-)
        diff --git a/activemodel/lib/active_model/attribute_methods.rb b/activemodel/lib/active_model/attribute_methods.rb
        index f033a94..96f2c82 100644
        --- a/activemodel/lib/active_model/attribute_methods.rb
        +++ b/activemodel/lib/active_model/attribute_methods.rb
        @@ -365,7 +365,7 @@ module ActiveModel
               end
               @prefix, @suffix = options[:prefix] || '', options[:suffix] || ''
        -      @regex = /^(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})$/
        +      @regex = /\A(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})\z/
               @method_missing_target = "#{@prefix}attribute#{@suffix}"
               @method_name = "#{prefix}%s#{suffix}"
             end
        diff --git a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
        index a1fcdf1..10faa29 100644
        --- a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
        +++ b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
        @@ -19,7 +19,7 @@ module ActiveModel
             protected
             def remove_multiparameter_id(key)
        -      key.to_s.gsub(/\(.+/, '')
        +      key.to_s.gsub(/\(.+/m, '')
             end
           end
         end
        --
        1.8.1.1
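A minimal reproduction of the two regexp behaviors this patch changes, outside Rails. It illustrates the mechanism only and is not the actual exploit: the key `"admin(\n1i)"`, the method name and the `"_changed?"` suffix are hypothetical inputs chosen for the demo.

```ruby
# 1) remove_multiparameter_id: without /m, . stops at a newline
key = "admin(\n1i)" # hypothetical malformed key
p key.gsub(/\(.+/, "")  # => "admin(\n1i)", nothing was stripped
p key.gsub(/\(.+/m, "") # => "admin"

# 2) attribute method matching: ^ and $ accept a match on any line
prefix, suffix = "", "_changed?" # hypothetical prefix/suffix pair
old_re = /^(#{Regexp.escape(prefix)})(.+?)(#{Regexp.escape(suffix)})$/
new_re = /\A(#{Regexp.escape(prefix)})(.+?)(#{Regexp.escape(suffix)})\z/
name = "evil\nname_changed?"
p old_re.match(name)[2] # => "name", the first line is silently ignored
p new_re.match(name)    # => nil
```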
  • 25. From 99123ad12f71ce3e7fe70656810e53133665527c Mon Sep 17 00:00:00 2001
        From: Aaron Patterson <aaron.patterson@gmail.com>
        Date: Fri, 15 Mar 2013 15:04:00 -0700
        Subject: [PATCH] fix protocol checking in sanitization [CVE-2013-1857]
        Conflicts:
          actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
        ---
         .../action_controller/vendor/html-scanner/html/sanitizer.rb | 4 ++--
         actionpack/test/template/html-scanner/sanitizer_test.rb | 10 ++++++++++
         2 files changed, 12 insertions(+), 2 deletions(-)
        diff --git a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
        index 02eea58..994e115 100644
        --- a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
        +++ b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
        @@ -66,7 +66,7 @@ module HTML
             # A regular expression of the valid characters used to separate protocols like
             # the : in http://foo.com
        -    self.protocol_separator = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
        +    self.protocol_separator = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i
             # Specifies a Set of HTML attributes that can have URIs.
             self.uri_attributes = Set.new(%w(href src cite action longdesc xlink:href lowsrc))
        @@ -171,7 +171,7 @@ module HTML
             def contains_bad_protocols?(attr_name, value)
               uri_attributes.include?(attr_name) &&
        -      (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/ && !allowed_protocols.include?(value.split(protocol_separator).first.downcase))
        +      (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i && !allowed_protocols.include?(value.split(protocol_separator).first.downcase.strip))
             end
           end
         end
        diff --git a/actionpack/test/template/html-scanner/sanitizer_test.rb b/actionpack/test/template/html-scanner/sanitizer_test.rb
        index 4e2ad4e..dee60c9 100644
        --- a/actionpack/test/template/html-scanner/sanitizer_test.rb
        +++ b/actionpack/test/template/html-scanner/sanitizer_test.rb
        @@ -176,6 +176,7 @@ class SanitizerTest < ActionController::TestCase
             %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
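The effect of the added `(&#x0*3a)` alternative and the `/i` flag can be seen with a colon written as a hex character reference. This is a sketch: `bad_protocol?` and `ALLOWED_PROTOCOLS` are hypothetical stand-ins shaped like `contains_bad_protocols?` on the slide (without the attribute-name check), and the input string is an assumed example.

```ruby
ALLOWED_PROTOCOLS = %w[http https mailto] # hypothetical allow-list for this sketch

# Patterns copied from the patch, before and after
OLD_CHECK     = /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/
OLD_SEPARATOR = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
NEW_CHECK     = /(^[^\/:]*):|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i
NEW_SEPARATOR = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i

# Hypothetical helper with the same shape as contains_bad_protocols?
def bad_protocol?(value, check, separator)
  !!(value =~ check) && !ALLOWED_PROTOCOLS.include?(value.split(separator).first.downcase.strip)
end

encoded = "javascript&#x3a;alert('XSS')" # colon written as a hex character reference
p bad_protocol?(encoded, OLD_CHECK, OLD_SEPARATOR)               # => false, slips through
p bad_protocol?(encoded, NEW_CHECK, NEW_SEPARATOR)               # => true
p bad_protocol?("http://example.com", NEW_CHECK, NEW_SEPARATOR)  # => false
```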
  • 26. Tools
      Print a cheatsheet!
      Info: http://www.regular-expressions.info
      Debug: http://rubular.com, http://rubyxp.com
      Visualize: http://www.regexper.com/
  • 27. Thank you!