And Now You Have Two Problems

A wise hacker said: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Regular expressions are a powerful tool and a first-class citizen in Ruby, so it is tempting to overuse them. Yet knowing them and using them properly is a fundamental skill for every developer. We'll see hands-on examples of proper regexp usage in Ruby code, look at the bad and the ugly cases, and learn how to approach writing and debugging them.

Transcript

  • 1. And now you have two problems: Ruby regular expressions for fun and profit. Luca Mearelli (@lmea), Codemotion Rome, 2013
  • 2. Regular expressions: patterns to describe the contents of a text, e.g. cat, catch, indicate, ...; 2013-03-22, YYYY-MM-DD, ...; $ 12,500.80
  • 3. Regexps are good for pattern matching and for search and replace.
  • 4. Regexp in Ruby: a Regexp object, Regexp.new("cat"); literal notation #1, %r{cat}; literal notation #2, /cat/
  • 5. Regexp syntax: literals: /cat/ matches any ‘cat’ substring; the dot: /./ matches any character except a newline; character classes: /[aeiou]/, /[a-z]/, /[01]/; negated character classes: /[^abc]/
  • 6. Regexp syntax, modifiers: case insensitive: /./i; interpolate #{} blocks only once: /./o; multiline mode, where . also matches a newline: /./m; extended mode, where whitespace is ignored: /./x
  • 7. Regexp syntax, shorthand classes: /\d/ digit, /\D/ non-digit; /\s/ whitespace, /\S/ non-whitespace; /\w/ word character, /\W/ non-word character; /\h/ hex digit, /\H/ non-hex digit
  • 8. Regexp syntax, anchors: /^/ beginning of line, /$/ end of line; /\b/ word boundary, /\B/ non-word boundary; /\A/ beginning of string, /\z/ end of string; /\Z/ end of string, except that if the string ends with a newline it matches just before that newline
  • 9. Regexp syntax: alternation: /cat|dog/ matches ‘cat’ or ‘dog’, as in ‘cats and dogs’; 0-or-more: /ab*/ matches ‘a’, ‘ab’, ‘abb’, ...; 1-or-more: /ab+/ matches ‘ab’, ‘abb’, ...; given number: /ab{2}/ matches ‘abb’ but not ‘ab’, nor the whole ‘abbb’ string
  • 10. Regexp syntax: greedy matches: /.+cat/ matches ‘the cat is cat’ in ‘the cat is catching a mouse’ (as much as possible); lazy matches: /.+?\scat/ matches only ‘the cat’ in the same string (as little as possible)
  • 11. Regexp syntax: grouping: /(\d{3}\.){3}\d{3}/ matches IP-like strings; capturing: /a (cat|dog)/, the match is captured in $1 to be used later; non-capturing: /a (?:cat|dog)/, no content is captured; atomic grouping: /(?>a+)/ doesn’t backtrack
  • 12. String substitution
      "My cat eats catfood".sub(/cat/, "dog")
      # => My dog eats catfood
      "My cat eats catfood".gsub(/cat/, "dog")
      # => My dog eats dogfood
      "My cat eats catfood".gsub(/\bcat(\w+)/, "dog\\1")
      # => My cat eats dogfood
      "My cat eats catfood".gsub(/\bcat(\w+)/) { |m| $1.reverse }
      # => My cat eats doof
  • 13. String parsing
      "Codemotion Rome: Mar 20 to Mar 23".scan(/\w{3} \d{1,2}/)
      # => ["Mar 20", "Mar 23"]
      "Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/)
      # => [["Mar", "20"], ["Mar", "23"]]
      "Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/) { |a, b| puts b + "/" + a }
      # 20/Mar
      # 23/Mar
      # => "Codemotion Rome: Mar 20 to Mar 23"
  • 14. Regexp methods
      if "what a wonderful world" =~ /(world)/
        puts "hello #{$1.upcase}"
      end
      # hello WORLD
      if /(world)/.match("The world")
        puts "hello #{$1.upcase}"
      end
      # hello WORLD
      match_data = /(world)/.match("The world")
      puts "hello #{match_data[1].upcase}"
      # hello WORLD
  • 15. Rails app examples
      # in routing
      match 'path/:id', :constraints => { :id => /[A-Z]\d{5}/ }
      # in validations
      validates :phone, :format => /\A\d{2,4}\s*\d+\z/
      validates :phone, :format => { :with => /\A\d{2,4}\s*\d+\z/ }
      validates :phone, :format => { :without => /\A02\s*\d+\z/ }
  • 16. Rails examples
      # in ActiveModel::Validations::NumericalityValidator
      def parse_raw_value_as_an_integer(raw_value)
        raw_value.to_i if raw_value.to_s =~ /\A[+-]?\d+\Z/
      end
      # in ActionDispatch::RemoteIp::IpSpoofAttackError
      # IP addresses that are "trusted proxies" that can be stripped from
      # the comma-delimited list in the X-Forwarded-For header. See also:
      # http://en.wikipedia.org/wiki/Private_network#Private_IPv4_address_spaces
      TRUSTED_PROXIES = %r{
        ^127\.0\.0\.1$                | # localhost
        ^(10                          | # private IP 10.x.x.x
          172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP in the range 172.16.0.0 .. 172.31.255.255
          192\.168                      # private IP 192.168.x.x
         )\.
      }x
      WILDCARD_PATH = %r{\*([^/\)]+)\)?$}
  • 17. Regexps are dangerous: "If I was going to place a bet on something about Rails security, it'd be that there are more regex vulnerabilities in the tree. I am uncomfortable with how much Rails leans on regex for policy decisions." Thomas H. Ptacek (Founder @ Matasano, Feb 2013)
  • 18. Tip #1: Beware of nested quantifiers
      /(x+x+)+y/ =~ 'xxxxxxxxxy'
      /(xx+)+y/ =~ 'xxxxxxxxxx'
      /(?>x+x+)+y/ =~ 'xxxxxxxxx'
  • 19. Tip #2: Don’t make everything optional
      /[-+]?[0-9]*\.?[0-9]*/ =~ '.'
      /[-+]?([0-9]*\.?[0-9]+|[0-9]+)/
      /[-+]?[0-9]*\.?[0-9]+/
  • 20. Tip #3: Evaluate tradeoffs
      /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/
      versus a fully RFC 822-compliant address regex, which runs to several thousand characters:
      /(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+ ... ;\s*)/
  • 21. Tip #4: Capture repeated groups and don’t repeat a captured group
      /!(abc|123)+!/ =~ '!abc123!'
      # $1 == '123'
      /!((abc|123)+)!/ =~ '!abc123!'
      # $1 == 'abc123'
  • 22. Tip #5: Use interpolation with care
      str = "cat"
      /#{str}/ =~ "My cat eats catfood"
      /#{Regexp.quote(str)}/ =~ "My cat eats catfood"
  • 23. Tip #6: Don’t use ^ and $ to match the string’s beginning and end
      validates :url, :format => /^https?/
      "http://example.com" =~ /^https?/
      # => 0
      "javascript:alert('hello!');%0Ahttp://example.com"
      "javascript:alert('hello!');\nhttp://example.com" =~ /^https?/
      # => 28, the second line matches
      "javascript:alert('hello!');\nhttp://example.com" =~ /\Ahttps?/
      # => nil
  • 24. From 060bb7250b963609a0d8a5d0559e36b99d2402c6 Mon Sep 17 00:00:00 2001
      From: joernchen of Phenoelit <joernchen@phenoelit.de>
      Date: Sat, 9 Feb 2013 15:46:44 -0800
      Subject: [PATCH] Fix issue with attr_protected where malformed input could circumvent protection
      Fixes: CVE-2013-0276
      ---
       activemodel/lib/active_model/attribute_methods.rb | 2 +-
       activemodel/lib/active_model/mass_assignment_security/permission_set.rb | 2 +-
       2 files changed, 2 insertions(+), 2 deletions(-)
      diff --git a/activemodel/lib/active_model/attribute_methods.rb b/activemodel/lib/active_model/attribute_methods.rb
      index f033a94..96f2c82 100644
      --- a/activemodel/lib/active_model/attribute_methods.rb
      +++ b/activemodel/lib/active_model/attribute_methods.rb
      @@ -365,7 +365,7 @@ module ActiveModel
         end
         @prefix, @suffix = options[:prefix] || '', options[:suffix] || ''
      -  @regex = /^(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})$/
      +  @regex = /\A(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})\z/
         @method_missing_target = "#{@prefix}attribute#{@suffix}"
         @method_name = "#{prefix}%s#{suffix}"
         end
      diff --git a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
      index a1fcdf1..10faa29 100644
      --- a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
      +++ b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
      @@ -19,7 +19,7 @@ module ActiveModel
         protected
         def remove_multiparameter_id(key)
      -  key.to_s.gsub(/\(.+/, '')
      +  key.to_s.gsub(/\(.+/m, '')
         end
         end
      --
      1.8.1.1
  • 25. From 99123ad12f71ce3e7fe70656810e53133665527c Mon Sep 17 00:00:00 2001
      From: Aaron Patterson <aaron.patterson@gmail.com>
      Date: Fri, 15 Mar 2013 15:04:00 -0700
      Subject: [PATCH] fix protocol checking in sanitization [CVE-2013-1857]
      Conflicts:
       actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      ---
       .../action_controller/vendor/html-scanner/html/sanitizer.rb | 4 ++--
       actionpack/test/template/html-scanner/sanitizer_test.rb | 10 ++++++++++
       2 files changed, 12 insertions(+), 2 deletions(-)
      diff --git a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      index 02eea58..994e115 100644
      --- a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      +++ b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
      @@ -66,7 +66,7 @@ module HTML
         # A regular expression of the valid characters used to separate protocols like
         # the : in http://foo.com
      -  self.protocol_separator = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
      +  self.protocol_separator = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i
         # Specifies a Set of HTML attributes that can have URIs.
         self.uri_attributes = Set.new(%w(href src cite action longdesc xlink:href lowsrc))
      @@ -171,7 +171,7 @@ module HTML
         def contains_bad_protocols?(attr_name, value)
         uri_attributes.include?(attr_name) &&
      -  (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/ && !allowed_protocols.include?(value.split(protocol_separator).first.downcase))
      +  (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i && !allowed_protocols.include?(value.split(protocol_separator).first.downcase.strip))
         end
         end
         end
      diff --git a/actionpack/test/template/html-scanner/sanitizer_test.rb b/actionpack/test/template/html-scanner/sanitizer_test.rb
      index 4e2ad4e..dee60c9 100644
      --- a/actionpack/test/template/html-scanner/sanitizer_test.rb
      +++ b/actionpack/test/template/html-scanner/sanitizer_test.rb
      @@ -176,6 +176,7 @@ class SanitizerTest < ActionController::TestCase
         %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
  • 26. Tools: print a cheatsheet! Info: http://www.regular-expressions.info; Debug: http://rubular.com, http://rubyxp.com; Visualize: http://www.regexper.com/
  • 27. Thank you!
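Slide 4 lists three ways to build a regexp. The following small Ruby sketch is not from the talk. It checks that the three forms yield the same pattern source and the same match. It also shows why `%r{}` is handy when the pattern itself contains slashes.

```ruby
# Three ways to build the same pattern, as listed on slide 4.
via_new     = Regexp.new("cat")
via_percent = %r{cat}
via_slashes = /cat/

[via_new, via_percent, via_slashes].each do |re|
  # Each object has the same source and finds "cat" at the same index.
  p [re.source, re =~ "my cat"]   # prints ["cat", 3] three times
end

# %r{} saves escaping when the pattern itself contains slashes.
p(%r{/usr/bin} =~ "/usr/bin/ruby")   # => 0
p(/\/usr\/bin/ =~ "/usr/bin/ruby")   # => 0, same match with escaped slashes
```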
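The character classes from slide 5 and the shorthand classes from slide 7 can be tried with `String#scan`, which returns every non-overlapping match. This sketch uses illustrative sample strings of our own, not the talk's. The last line shows that the dot skips a newline unless `/m` is given.

```ruby
# Character classes (slide 5) and shorthand classes (slide 7), applied with String#scan.
text = "Room 42B, code 0x1F"

p text.scan(/[aeiou]/)     # explicit class, lowercase vowels => ["o", "o", "o", "e"]
p "abcxyz".scan(/[^abc]/)  # negated class => ["x", "y", "z"]
p text.scan(/\d+/)         # \d: runs of digits => ["42", "0", "1"]
p text.scan(/\w+/)         # \w: runs of [a-zA-Z0-9_] => ["Room", "42B", "code", "0x1F"]
p "cafe 0x1F".scan(/\h+/)  # \h: runs of hex digits => ["cafe", "0", "1F"]
p "a\nb".scan(/./)         # the dot does not match "\n" => ["a", "b"]
```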
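Slide 6 lists four modifiers. Below is a sketch exercising each one. `DATE_RE` and `word_matcher` are hypothetical names used only for illustration. The `/o` case shows the main gotcha: the interpolation is evaluated once per literal, so later calls silently reuse the first pattern.

```ruby
# /i: case-insensitive matching
p("CAT" =~ /cat/i)    # => 0

# /m: in Ruby this makes the dot match "\n" as well
p("a\nb" =~ /a.b/)    # => nil
p("a\nb" =~ /a.b/m)   # => 0

# /x: whitespace and #-comments inside the pattern are ignored
DATE_RE = /
  (\d{4})  # year
  -
  (\d{2})  # month
  -
  (\d{2})  # day
/x
p DATE_RE.match("2013-03-22").captures   # => ["2013", "03", "22"]

# /o: interpolation runs only the first time this literal is evaluated,
# so later calls reuse the pattern built from the first argument.
def word_matcher(word)   # hypothetical helper
  /#{word}/o
end
p word_matcher("cat").source   # => "cat"
p word_matcher("dog").source   # => "cat", not "dog"
```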
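The difference between line anchors and string anchors on slide 8 is easiest to see on multi-line input. This sketch uses made-up sample strings. Note that `\Z` is the anchor the `NumericalityValidator` excerpt on slide 16 uses, so that pattern tolerates one trailing newline.

```ruby
log = "first line\nhttp://example.com\n"

p(log =~ /^http/)        # => 11: ^ also matches right after a newline
p(log =~ /\Ahttp/)       # => nil: \A only matches at the start of the string

p("abc\n" =~ /c$/)       # => 2: $ matches before the newline
p("abc\n" =~ /c\Z/)      # => 2: \Z tolerates one trailing newline
p("abc\n" =~ /c\z/)      # => nil: \z is the absolute end of the string

p "cat catalog".scan(/\bcat\b/)  # => ["cat"]: no boundary after "cat" in "catalog"
p("concatenate" =~ /\Bcat/)      # => 3: "cat" in the middle of a word
```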
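The repetition examples on slide 9 and the greedy/lazy pair on slide 10 can be checked directly. The sketch below uses `String#[]` with a regexp, which returns the matched text or nil. The lazy pattern is written as `/.+?\scat/`, matching the reconstruction on slide 10.

```ruby
p "cats and dogs".scan(/cat|dog/)   # => ["cat", "dog"]
p "a ab abb".scan(/ab*/)            # => ["a", "ab", "abb"]
p "a ab abb".scan(/ab+/)            # => ["ab", "abb"]
p "abbb"[/ab{2}/]                   # => "abb", only a prefix of "abbb"
p("abbb" =~ /\Aab{2}\z/)            # => nil: the whole string has three b's

sentence = "the cat is catching a mouse"
p sentence[/.+cat/]      # greedy => "the cat is cat"
p sentence[/.+?\scat/]   # lazy   => "the cat"
```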
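Slide 11 covers grouping, capturing, non-capturing and atomic groups. In this sketch the sample strings are ours. The atomic case shows concretely what "doesn't backtrack" means: `(?>a+)` keeps every `a` it took, so a following `a` can never match.

```ruby
ip_like = /(\d{3}\.){3}\d{3}/
p(ip_like =~ "host 192.168.100.200")   # => 5
p(ip_like =~ "host 10.0.0.1")          # => nil: every part needs exactly three digits

# Capturing group: the alternative that matched is available as $1
if "I have a dog" =~ /a (cat|dog)/
  p $1                                 # => "dog"
end

# Non-capturing group: same match, nothing captured
p "I have a dog".match(/a (?:cat|dog)/).captures   # => []

# Atomic group: once a+ has matched, it never gives characters back
p("aaa" =~ /a+a/)       # => 0: a+ backtracks one "a" so the final a can match
p("aaa" =~ /(?>a+)a/)   # => nil: the atomic group keeps all three
```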
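To see the extended-mode `TRUSTED_PROXIES` pattern from slide 16 at work, the sketch below copies it as excerpted on the slide. It runs it against a few sample addresses of our own. The last line shows that `^` and `$` are line anchors here too, so a later line of the input can satisfy the pattern.

```ruby
# Copied from the slide-16 excerpt; /x ignores the spaces and the # comments.
TRUSTED_PROXIES = %r{
  ^127\.0\.0\.1$                | # localhost
  ^(10                          | # private IP 10.x.x.x
    172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP in the range 172.16.0.0 .. 172.31.255.255
    192\.168                      # private IP 192.168.x.x
   )\.
}x

%w[127.0.0.1 10.1.2.3 172.20.0.1 172.32.0.1 192.168.1.1 8.8.8.8].each do |ip|
  p [ip, TRUSTED_PROXIES.match?(ip)]   # true for all but 172.32.0.1 and 8.8.8.8
end

# ^ and $ are line anchors, so the second line alone satisfies the pattern.
p("8.8.8.8\n10.0.0.1" =~ TRUSTED_PROXIES)   # => 8
```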
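For slide 18, one way to defuse the nested quantifier in `/(x+x+)+y/` is to notice that it only means "two or more x's, then y" in an unanchored search. The flat rewrite `FLAT` below is our own assumption about the intended meaning, not a pattern from the talk. The sketch checks that it agrees with the nested and atomic forms on a few inputs while leaving the engine only one way to split the x's.

```ruby
NESTED = /(x+x+)+y/     # nested quantifiers, as on slide 18
ATOMIC = /(?>x+x+)+y/   # atomic variant, as on slide 18
FLAT   = /xx+y/         # hypothetical rewrite: two or more x's, then y

["xxy", "xxxxxxxxxy", "xy", "xxxxxxxxxx"].each do |s|
  # All three columns agree; only the amount of backtracking differs.
  p [s, NESTED.match?(s), ATOMIC.match?(s), FLAT.match?(s)]
end
```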
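Slide 19 warns against making every part optional. This sketch runs the first and last number patterns from that slide against sample inputs of our own. The all-optional version happily "matches" a lone dot and even matches the empty string inside text with no number at all.

```ruby
LOOSE  = /[-+]?[0-9]*\.?[0-9]*/   # everything optional
STRICT = /[-+]?[0-9]*\.?[0-9]+/   # at least one digit required

["3.14", "-42", ".5", ".", "abc"].each do |s|
  p [s, s[LOOSE], s[STRICT]]
end
# "." gives [".", ".", nil] and "abc" gives ["abc", "", nil]
```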
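As a concrete tradeoff for slide 20, the short e-mail pattern is written with uppercase classes only. We assume it is meant to be used case-insensitively; without `/i` it rejects ordinary lowercase addresses. Even with `/i`, the `{2,4}` limit rejects longer top-level domains. The sample addresses below are illustrative.

```ruby
SIMPLE_EMAIL   = /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/
SIMPLE_EMAIL_I = /\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/i   # assumed intended use

p(SIMPLE_EMAIL =~ "luca@example.com")     # => nil: classes only list uppercase letters
p(SIMPLE_EMAIL_I =~ "luca@example.com")   # => 0
p(SIMPLE_EMAIL_I =~ "x@example.museum")   # => nil: top-level domain longer than 4 letters
```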
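The behaviour described on slide 21 can be verified with the slide's own strings. A quantified group only remembers its last iteration. Wrapping the repetition in an outer group captures the whole run, while the inner group still holds the last piece.

```ruby
if "!abc123!" =~ /!(abc|123)+!/
  p $1   # => "123": a repeated group keeps only its last iteration
end

if "!abc123!" =~ /!((abc|123)+)!/
  p $1   # => "abc123": the outer group wraps the whole repetition
  p $2   # => "123": the inner group still holds the last iteration
end
```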
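Slide 22's `"cat"` contains no metacharacters, so quoting makes no visible difference there. This sketch swaps in a string with a dot to show what can go wrong: interpolated raw, the dot matches any character. `Regexp.escape` is an alias of `Regexp.quote`.

```ruby
str    = "a.c"
target = "abc and a.c"

p target[/#{str}/]                 # => "abc": the interpolated dot matches any character
p target[/#{Regexp.quote(str)}/]   # => "a.c": Regexp.quote escapes the dot
p Regexp.escape("1+1=2?")          # => "1\\+1=2\\?"
```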
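Slide 23's attack string arrives URL-encoded, so `%0A` becomes a real newline once decoded. This sketch decodes it with the standard library's `URI.decode_www_form_component`. It then compares the two anchors. `http_url?` is a hypothetical helper standing in for a format check, not Rails' validator.

```ruby
require "uri"

# Decoding turns %0A into "\n", giving a two-line value.
param = "javascript:alert('hello!');%0Ahttp://example.com"
value = URI.decode_www_form_component(param)

p(value =~ /^https?/)    # => 28: ^ matches at the start of the second line
p(value =~ /\Ahttps?/)   # => nil: \A only matches at the start of the string

# Hypothetical helper: does the whole value start with http:// or https:// ?
def http_url?(s)
  s.match?(%r{\Ahttps?://})
end
p http_url?("http://example.com")   # => true
p http_url?(value)                  # => false
```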