RegEx 101

Todd Benson
Overview

• What is RegEx
• RegEx Basics
• Uses for RegEx
• Useful RegExpressions
What is RegEx?

• “In computing, a regular expression (abbreviated regex or regexp) is a
sequence of characters that forms a search pattern, mainly for use in pattern
matching with strings, or string matching, i.e. "find and replace"-like
operations.” - Wikipedia
• “Some people, when confronted with a problem, think ‘I know, I'll use regular
expressions.’ Now they have two problems.” - Jamie Zawinski
Why RegEx?

• Tools use it: Nessus, Burp, W3AF
• Most programming languages support it
• Excellent tool to have in the toolbox
RegEx Basics: Literal Matches

Literal Matches
‘bat’ matches ‘bat’

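12 special characters - \ ^ $ . | ? * + ( ) [ ]
These must be escaped with a backslash to match literally: ‘\\’ ‘\$’

• .
Matches any single character (except a newline by default)
‘.at’ matches ‘bat’, ‘cat’, and ‘hat’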
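
The literal, dot, and escaping rules above can be tried out with a minimal Python sketch. Python's `re` module is used here only as a convenient test bench; the sample strings are made up for illustration.

```python
import re

# A literal pattern matches only itself, anywhere in the text.
literal_hit = re.search(r"bat", "wombat")

# An unescaped dot matches any single character (except a newline by default).
dot_matches = re.findall(r".at", "bat cat hat")

# Escaping the dollar sign makes it a literal character instead of an anchor.
price = re.search(r"\$5", "cost: $5")

# re.escape() adds the backslashes for arbitrary text.
escaped = re.escape("a.b")

print(literal_hit.group(), dot_matches, price.group(), escaped)
```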
RegEx Basics: Character Classes

Character Classes
• -- [ ]
Matches any one of the listed characters
‘[bc]at’ will match ‘bat’ or ‘cat’

• -- [^ ]
Matches any one character that is not listed
‘[^A-Z]’ will match any character that is not a capital
letter
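
A short Python sketch of the two character-class forms, using made-up sample strings:

```python
import re

words = ["bat", "cat", "hat"]

# [bc] accepts exactly one character, either 'b' or 'c'.
bc_words = [w for w in words if re.fullmatch(r"[bc]at", w)]

# [^A-Z] accepts any single character that is not an uppercase ASCII letter.
non_capitals = re.findall(r"[^A-Z]", "AbCd1")

print(bc_words, non_capitals)
```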
RegEx Basics: Shorthand Character Classes

Shorthand Character Classes
• \d
Same as [0-9]

• \D
Same as [^0-9]

• \w
Same as [0-9A-Za-z_]

• \W
Same as [^0-9A-Za-z_]

• \s
tab, line feed, form feed, carriage return, and space

• \S
Anything other than tab, line feed, etc.
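
The shorthand classes can be checked with a small Python sketch. One assumption to note: in Python 3, `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is passed, so the flag is used here to get the exact equivalences listed above. The sample text is invented.

```python
import re

text = "id_42\tok"

# \d with re.ASCII behaves like [0-9].
digits = re.findall(r"\d", text, re.ASCII)

# \w with re.ASCII behaves like [0-9A-Za-z_].
word_runs = re.findall(r"\w+", text, re.ASCII)

# \s matches whitespace such as the tab in the sample.
whitespace = re.findall(r"\s", text)

# \D is the complement of \d, so removing every \D leaves only digits.
digits_only = re.sub(r"\D", "", text, flags=re.ASCII)

print(digits, word_runs, whitespace, digits_only)
```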
RegEx Basics: Anchors

Anchors
• ^
Beginning of line
‘rpm -qa|grep ^ao’ would list all packages that start with
‘ao’

• $
End of line
‘[0-9][0-9][0-9]$’ would find all instances when a line
ended with 3 consecutive digits

• \b
Word boundary
‘\bW.n\w*\b’ looks for words that begin with ‘W’ followed by
any character followed by ‘n’ followed by zero or more
word characters
‘Win’ ‘Windows’ ‘Won’ ‘Wonton’ ‘Winter’
‘Wonderland’ ‘Wonder’ all match
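
The three anchors can be sketched in Python as follows. The package names and sample lines are hypothetical, and the word-boundary pattern is written as `\bW.n\w*\b`, which is an assumption about the intended expression: it lets the rest of the word follow the ‘n’.

```python
import re

packages = ["aoapp-1.0", "bash-5.1", "libao-1.2"]  # hypothetical package names

# ^ anchors the match to the beginning of the string (or line).
starts_with_ao = [p for p in packages if re.search(r"^ao", p)]

# $ anchors the match to the end of the string (or line).
samples = ["build 123", "123 build", "v12"]
ends_with_3_digits = [s for s in samples if re.search(r"[0-9][0-9][0-9]$", s)]

# \b marks the edge between a word character and a non-word character.
sentence = "Win Windows Won Wonton Winter Wonderland Wonder Water"
w_words = re.findall(r"\bW.n\w*\b", sentence)

print(starts_with_ao, ends_with_3_digits, w_words)
```

‘Water’ is left out because its third letter is not ‘n’.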
RegEx Basics: Non-Printable

Non-printable
• -- \n
New Line

• -- \r
Carriage Return
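
One common use of the non-printable escapes is splitting text that mixes Windows (`\r\n`) and Unix (`\n`) line endings. A minimal Python sketch with an invented sample string:

```python
import re

raw = "first\r\nsecond\nthird"

# \r? makes the carriage return optional, so both line-ending styles split cleanly.
lines = re.split(r"\r?\n", raw)

print(lines)
```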
RegEx Basics: Groups

Groups
• --( )
Defines the scope and precedence of operators
‘Write(ln)?’ matches ‘Write’ and
‘Writeln’

• -- |
OR
‘Gr(a|e)y’ matches ‘Gray’ and ‘Grey’
‘(ITSO|OITS)’ matches ‘ITSO’ or ‘OITS’
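
Grouping and alternation can be sketched in Python like this. The non-capturing form `(?: )` is used in the `findall` calls because Python's `re.findall` returns only the group contents when a pattern has a capturing group; that choice is specific to this sketch, not to the slide.

```python
import re

# An optional group: 'ln' may or may not follow 'Write'.
write_calls = re.findall(r"Write(?:ln)?", "Write Writeln")

# Alternation inside a group: either 'a' or 'e' between 'Gr' and 'y'.
spellings = re.findall(r"Gr(?:a|e)y", "Gray Grey Groy")

# A capturing group remembers which alternative matched.
captured = re.search(r"Gr(a|e)y", "Grey").group(1)

print(write_calls, spellings, captured)
```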
RegEx Basics: Quantification

Quantification
Shows how often a token or group is allowed to
occur
• ?
Zero or one
‘a?’ will match ‘’ and ‘a’

• *
Zero or more
‘a*’ will match ‘’ and ‘a’ and ‘aaaaaaaaa’
RegEx Basics: Quantification (Cont.)

Quantification
Shows how often a token or group is allowed to
occur
• +
One or more
‘a+’ will match ‘a’ and ‘aaaaaaaaaaaa’

• {min,max}
Minimum and Maximum
‘a{3,7}’ will match between 3 and 7 consecutive ‘a’ characters
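
The four quantifiers can be compared side by side with a small Python sketch. `fully_matches` is a hypothetical helper introduced only for this illustration; it uses `re.fullmatch` so that the whole string has to fit the pattern.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

results = {
    "a? on ''": fully_matches(r"a?", ""),            # zero occurrences allowed
    "a? on 'aa'": fully_matches(r"a?", "aa"),        # more than one is too many
    "a* on 'aaaaaaaaa'": fully_matches(r"a*", "aaaaaaaaa"),
    "a+ on ''": fully_matches(r"a+", ""),            # at least one is required
    "a{3,7} on 'aa'": fully_matches(r"a{3,7}", "aa"),
    "a{3,7} on 'aaaaa'": fully_matches(r"a{3,7}", "aaaaa"),
    "a{3,7} on 'aaaaaaaa'": fully_matches(r"a{3,7}", "a" * 8),
}

for label, outcome in results.items():
    print(label, "->", outcome)
```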
Uses: Searches

• Errors
(error|exception|illegal|invalid|fail|stack|access|directory|file|not found|unknown|uid=|varchar|SQL|quotation mark|syntax|password)
• Redirects
(document|window).
Uses: Searches (Cont.)

• DOM XSS (sinks)
((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)|((replace|assign|navigate|getResponseHeader|open(Dialog)?|showModalDialog|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()
• DOM XSS (sources)
(location\s*[\[.])|([.\[]\s*["']?\s*(arguments|dialogArguments|innerHTML|write(ln)?|open(Dialog)?|showModalDialog|cookie|URL|documentURI|baseURI|referrer|name|opener|parent|top|content|self|frames)\W)|(localStorage|sessionStorage|Database)
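
As a rough sketch, the first DOM XSS expression can be used in Python to flag suspicious lines of JavaScript for manual review. The pattern below assumes the backslashes in `\s`, `\]`, `\+` and `\(` as written here; the JavaScript sample is invented, and a hit is only a lead to investigate, not proof of a vulnerability.

```python
import re

# Assignment to, or call of, properties and functions that can execute or load content.
SINK_PATTERN = re.compile(
    r"""((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)"""
    r"""|((replace|assign|navigate|getResponseHeader|open(Dialog)?|showModalDialog"""
    r"""|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()"""
)

js_source = "\n".join([
    "var total = 1;",
    "window.location = user;",
    "eval(payload);",
    "console.log(total);",
])

# Keep every line where the pattern finds a match anywhere in the line.
flagged = [line for line in js_source.splitlines() if SINK_PATTERN.search(line)]
print(flagged)
```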
Uses: Searching Logs

• grep -v '156\.132\.142\.1[1-9][^0-9]' /var/log/apache2/other_vhosts_access.log | grep -v '156\.132\.103\.'
• grep -oE '\s[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s' /var/log/apache2/other_vhosts_access.log | sort -t . -k 3,3n -k 4,4n | uniq
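
The same log triage can be sketched in Python: pull every dotted-quad address out of each line, skip the networks to ignore, and count the rest. `count_client_ips` and the sample log lines are hypothetical; only the `156.132.103.` prefix comes from the commands above.

```python
import re
from collections import Counter

# Four groups of 1-3 digits separated by literal dots (no range check on each octet).
IP_PATTERN = re.compile(r"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b")

def count_client_ips(log_lines, excluded_prefixes=()):
    """Count addresses seen in the log, ignoring those with an excluded prefix."""
    counts = Counter()
    for line in log_lines:
        for ip in IP_PATTERN.findall(line):
            if not ip.startswith(tuple(excluded_prefixes)):
                counts[ip] += 1
    return counts

sample_log = [
    'example.org:80 192.0.2.10 - - "GET / HTTP/1.1" 200',
    'example.org:80 156.132.103.7 - - "GET / HTTP/1.1" 200',
    'example.org:80 192.0.2.10 - - "GET /a HTTP/1.1" 404',
]
counts = count_client_ips(sample_log, excluded_prefixes=("156.132.103.",))
print(counts.most_common())
```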
Uses: VI Search and Replace

• SS#
:%s/\d\{3}-\d\{2}-\d\{4}/123-45-6789/g
• email
:%s/[0-9A-Za-z._%+-]\+@[0-9A-Za-z._%+-]\+\.[A-Za-z]\{2,4}/john.doe@ao.uscourts.gov/g
(In Vim's default syntax the quantifiers are written \{n,m} and \+)
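
The same search-and-replace idea can be sketched in Python with `re.sub`. `mask_sensitive` is a hypothetical helper, the sample text is invented, and the placeholder values are the ones used in the editor commands above.

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
EMAIL_PATTERN = re.compile(r"[0-9A-Za-z._%+-]+@[0-9A-Za-z._%+-]+\.[A-Za-z]{2,4}")

def mask_sensitive(text):
    """Replace every SSN-shaped and email-shaped string with a fixed placeholder."""
    text = SSN_PATTERN.sub("123-45-6789", text)
    return EMAIL_PATTERN.sub("john.doe@ao.uscourts.gov", text)

masked = mask_sensitive("SSN 987-65-4321, mail jane@example.com now")
print(masked)
```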
Uses: Command Line

openssl ciphers | sed 's/:/\n/g' | sort
Uses: Output Mangling

while read line; do host "$line"; done < ips.txt | sed 's/ has address / \/ /g' > foo.txt
Uses: Programming

• Sanitizing input
$name = preg_replace('/<\s*?\/?script\s*?>/i', "&lt;script&gt;", $name);
Useful RegExes
• SS#
\d{3}-\d{2}-\d{4}
• Phone#
(\(?\d{3}\)?[ .-])?\d{3}[ .-]\d{4}
• IP Addresses
\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

• email (match case-insensitively)
[0-9A-Z._%+-]+@[0-9A-Z._%+-]+\.[A-Z]{2,4}
• Find Base64
(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?
• Credit Card# - HTML Tags - Dates
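
A few of these expressions can be exercised with a small Python sketch. The patterns are written with their backslashes, and the phone separator class is written as `[ .-]` so that the hyphen is literal; `is_valid` and the sample values are hypothetical and exist only for this illustration.

```python
import base64
import re

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "phone": re.compile(r"(\(?\d{3}\)?[ .-])?\d{3}[ .-]\d{4}"),
    "ip": re.compile(r"\b(" + OCTET + r"\.){3}" + OCTET + r"\b"),
    "base64": re.compile(
        r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
    ),
}

def is_valid(kind, text):
    """Return True when the whole text fits the named pattern."""
    return PATTERNS[kind].fullmatch(text) is not None

# Encode a known value so the Base64 sample is guaranteed to be well formed.
encoded = base64.b64encode(b"hello").decode("ascii")

print(is_valid("ssn", "123-45-6789"))
print(is_valid("phone", "(555) 123-4567"))
print(is_valid("ip", "192.168.1.1"), is_valid("ip", "256.1.1.1"))
print(is_valid("base64", encoded), is_valid("base64", encoded[:-1]))
```

Using `fullmatch` turns each search expression into a validator; for scanning larger text, `search` or `findall` with the same patterns would be the closer analogue to the slide's use.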
Questions?
Go forth and RegEx…
References

• Web Application Hacker's Handbook
• http://regex.info/blog/2006-09-15/247#comment-3085
• http://en.wikipedia.org/wiki/Regular_expression
• https://isc.sans.edu/regex.html
• http://www.regular-expressions.info/examples.html
• http://blog.spiderlabs.com/2013/02/easy-dom-basedxss-detection-via-regexes.html
• www.xkcd.com
