Abstract:
Writing Regular Expressions (Regex) is a versatile skill set to have across the IT landscape. Regex has a number of information security related uses and applications. We are going to provide an overview and show examples of writing Regex for pattern matching and file content analysis using sample threat feed data in this presentation. Along with a healthy dose of motherly advice, we cover Regex syntax, character classes, capture groups, and sub-capture groups. Whether Regex is something completely new or worth brushing up on, this talk is geared toward you.
Bio:
Matt Scheurer is a Systems Security Engineer working in the Financial Services industry. Matt holds CompTIA Security+, MCP, MCPS, MCTS, MCSA, and MCITP certifications. He maintains active memberships in a number of professional organizations including the Association for Computing Machinery (ACM), Cincinnati Networking Professionals Association (CiNPA), and Information Systems Security Association (ISSA). Matt is a regular attendee at monthly Information Security meetings for 2600, the CiNPA affiliated Security Special Interest Group (CiNPA Security SIG), Ohio Information Security Forum (OISF), and Cincinnati SMBA.
1. Regular Expressions (Regex) Overview
September 24, 2017
Matt Scheurer
@c3rkah
Slides:
https://www.slideshare.net/cerkah
((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))
2. About Me
Matt Scheurer
Systems Security Engineer
Working in the Financial Services Industry
Meeting Organizer for the CiNPA Security SIG
DerbyCon 5.0 “Unity” Speaker
Certifications: CompTIA Security+, MCP, MCPS, MCTS, MCSA,
and MCITP
3. What Regular Expressions are Not!
● The term “Regular Expressions”, often
simply called “Regex” for short, should not be
confused with “Old Sayings”
– Adages, Allegories, Aphorisms, Axioms, Clichés,
Epigrams, Idioms, Hyperboles, Maxims, Platitudes,
Proverbs, Truisms, etc.
4. When it comes to “Old Sayings”...
You would be hard
pressed to find anyone
better at recalling and
retelling old sayings
than my own
mother...
5. What is Regex?
Regex is a common syntax used to match
patterns when parsing text data or output. Regex
capture groups are used to extract strings of
specific data into reference points for retrieval or
processing.
6. Why learn Regex?
● Regex is a great skill set to have in the back pocket of
nearly any interdisciplinary role across the Information
Technology landscape
● Uses include:
– Application and Software Development
– Database queries
– Linux Administration and power user commands such as
grep, awk, sed, find, etc.
– Searching through any type of text data or system logs
7. Regex uses in InfoSec
● Content filtering
● Input validation
● NGFW / UTM Layer 7 definitions
● Parsing large volumes of data or system logs to pick out specific
data points of interest
● SIEM systems
– Building or refining entire searches, or performing advanced parsing to
narrow down extraneous information
– Finding specific log events or log event items and sub-data
● Understand the underpinnings of many security products and
utilities
9. Different flavors of Regex
● While all versions of Regex share common
conventions there are proprietary differences
across the various Regex engines
● Popular Regex Engines include:
– Perl, PCRE, PHP, .NET, Java, JavaScript,
XRegExp, VBScript, Python, Ruby, Delphi, R, Tcl,
POSIX, and others
10. Regex Resources
● Online Learning Site - https://regexone.com/
● Regex Test Site - http://regexr.com/
● Tutorial Site - http://www.rexegg.com/
● Countless Additional Resources -
https://www.google.com/search?q=regex
● Further Reading -
https://en.wikipedia.org/wiki/Regular_expression
12. Regex Basics – Simple Matching
● Simply type in exactly what you are trying to
match
● Text string pattern matching is case-sensitive!
– NOTE: certain non-alpha-numeric characters may
require an escape prefix to match
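A minimal Python sketch of simple matching, using the standard `re` module. The sample string is made up for illustration. It suggests how a literal pattern behaves, that matching is case-sensitive unless a flag says otherwise, and how an escape prefix lets special characters match literally.

```python
import re

text = "Login failed for user admin (attempt 3)"

# A literal pattern matches exactly what is typed.
found = re.search("admin", text) is not None

# Matching is case-sensitive unless a flag such as re.IGNORECASE is passed.
wrong_case = re.search("ADMIN", text) is not None
ignore_case = re.search("ADMIN", text, re.IGNORECASE) is not None

# Parentheses are special in Regex, so they need an escape prefix to match literally.
escaped = re.search(r"\(attempt 3\)", text) is not None

print(found, wrong_case, ignore_case, escaped)
```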
13. Regex Basics – Text Matching
● In addition to typing in an exact text string for
an exact match, “\w” will match a single
word character
– Matches any word character (alphanumeric &
underscore)
– Only matches low-ascii characters (no accented or
non-roman characters)
14. Regex Basics – Number Matching
● In addition to typing in an exact numeric string
for an exact match, “\d” will match a single digit.
– Matches any digit character (0-9)
15. Regex Basics – Matching a Space
● In addition to typing in an exact string with a
space included for an exact match, “\s” will
match a space in text
– Matches any whitespace character (spaces, tabs,
line breaks)
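The shorthand classes `\w`, `\d`, and `\s` can be tried out with a short Python sketch like the one below. The sample strings are invented. One engine-specific caveat worth hedging: in Python 3 string patterns, `\w` also matches accented letters unless the `re.ASCII` flag is set, so the "low-ascii only" behavior depends on the Regex flavor in use.

```python
import re

sample = "Port 443 open"

# \w matches one word character, \d one digit, \s one whitespace character.
first_word_char = re.search(r"\w", sample).group()
first_digit = re.search(r"\d", sample).group()
first_space = re.search(r"\s", sample).group()

# Python 3 treats \w as Unicode-aware by default; re.ASCII restricts it
# to [a-zA-Z0-9_], which mirrors the ASCII-only behavior of some engines.
unicode_hits = re.findall(r"\w", "\u00e9_1")
ascii_hits = re.findall(r"\w", "\u00e9_1", re.ASCII)

print(first_word_char, first_digit, repr(first_space), unicode_hits, ascii_hits)
```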
16. Regex Basics – Matching Opposites
● We just looked at a few character classes
– All character classes are case-sensitive
– Specifying those character classes in upper-case changes
the pattern match to match the opposite
● “\W”, “\D”, and “\S” respectively translate to
– Not a word character
– Not a digit
– Not whitespace
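As a rough illustration of the upper-case (negated) classes, the Python sketch below runs `\W`, `\D`, and `\S` over small made-up strings and collects what each one matches.

```python
import re

# \W: anything that is NOT a word character.
non_word = re.findall(r"\W", "id=42 ok")

# \D: anything that is NOT a digit.
non_digit = re.findall(r"\D", "a1b2")

# \S: anything that is NOT whitespace.
non_space = re.findall(r"\S", "a b")

print(non_word, non_digit, non_space)
```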
17. Regex Basics – Quantifiers
● “.” matches any single character
● “+” suffix matches one or more repetitions
● “*” suffix matches zero or more repetitions
● “?” suffix means the character is optional
● “|” is an ‘or’ separator between characters
● “^” is a ‘not’ specifier to exclude a character
– Enclosed in square brackets prefixing the pattern
– [^<pattern>]
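Each symbol on this slide can be exercised with a one-line Python check. The sketch below uses invented sample strings. Note that in Python, “.” does not match a newline unless `re.DOTALL` is set.

```python
import re

# "." matches any single character (except a newline by default).
any_char = re.findall(r"c.t", "cat cot c-t ct")

# "+" means one or more of the preceding item; "*" means zero or more.
one_or_more = re.findall(r"lo+l", "ll lol loool")
zero_or_more = re.findall(r"lo*l", "ll lol loool")

# "?" makes the preceding item optional.
optional = re.findall(r"colou?r", "color colour")

# "|" is an 'or' between alternatives.
either = re.findall(r"cat|dog", "cat dog bird")

# "^" at the start of a bracketed set negates the set.
not_digit = re.findall(r"[^0-9]", "a1b2")

print(any_char, one_or_more, zero_or_more, optional, either, not_digit)
```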
18. Regex Basics – Escaped Characters
● What if I want to match reserved characters such as
“., +, *, ?, |, ^, etc.” in my pattern against the data?
– Prefix reserved characters with a “\”
● What if I want to match a “\” in my pattern
against the data?
– Prefix it with another “\”, i.e. “\\”
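A small Python sketch of escaping, with made-up sample text. It contrasts an escaped “+” with an unescaped one, matches a literal backslash, and shows `re.escape`, a standard-library helper that adds the escape prefixes automatically.

```python
import re

text = r"1+1=2 in C:\temp"

# Escaped: "\+" matches a literal plus sign.
literal_plus = re.findall(r"1\+1", text)

# Unescaped: "+" is a quantifier, so "1+1" means one or more "1" followed by "1".
quantifier_plus = re.findall(r"1+1", "111 1+1")

# A literal backslash is matched by escaping it with another backslash.
backslashes = re.findall(r"\\", text)

# re.escape() builds an escaped pattern from arbitrary literal text.
escaped_pattern = re.escape("a.b")

print(literal_plus, quantifier_plus, backslashes, escaped_pattern)
```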
19. Regex Basics – Ranges
● In addition to quantifiers (wild cards), ranges may be
specified with pattern matching
– Characters are enclosed inside of square brackets
“[” “]” and separated by a hyphen “-”
● Examples:
– [a-z], [A-Z], and [0-9]
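The example ranges can be checked with a short Python sketch. The hexadecimal pattern at the end is an added illustration, not from the slides, of combining several ranges in one set.

```python
import re

lower = re.findall(r"[a-z]", "aB3")
upper = re.findall(r"[A-Z]", "aB3")
digits = re.findall(r"[0-9]", "aB3")

# Several ranges can share one set of brackets, e.g. hexadecimal characters.
hex_runs = re.findall(r"[a-fA-F0-9]+", "deadBEEF xyz 42")

print(lower, upper, digits, hex_runs)
```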
20. Regex Basics – Repetitions
● In addition to a range quantifier, repetitions may be
specified with pattern matching
– The number of character occurrences is specified
inside of curly brackets/braces “{” “}”, with a comma
“,” separating the bounds for a range of occurrences
● A{4} matches exactly “AAAA”
● A{1,4} matches “A”, “AA”, “AAA”, or “AAAA”
● A{4,} matches four or more consecutive “A’s”
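The three repetition forms above can be verified with a Python sketch along these lines. `re.fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# A{4}: exactly four.
exactly_four = bool(re.fullmatch(r"A{4}", "AAAA"))
three_is_too_few = bool(re.fullmatch(r"A{4}", "AAA"))

# A{1,4}: between one and four.
candidates = ("A", "AA", "AAA", "AAAA", "AAAAA")
one_to_four = [s for s in candidates if re.fullmatch(r"A{1,4}", s)]

# A{4,}: four or more.
four_or_more = bool(re.fullmatch(r"A{4,}", "AAAAAA"))

print(exactly_four, three_is_too_few, one_to_four, four_or_more)
```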
21. Regex Basics – Line Matching
● The beginning of a line and/or end of a line may be
specified in Regex pattern matching
– “^”, matches the beginning (starts with) of a line
– “$”, matches the end of a line
– “^<pattern>$”, matches when the line begins with
and ends with the specified pattern
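A hedged Python sketch of line anchors, run against an invented three-line log. In Python, “^” and “$” anchor to the whole string by default, so the `re.MULTILINE` flag is passed here to make them work per line.

```python
import re

log = "ERROR disk full\nINFO ok\nlast ERROR"

# Lines that start with ERROR (only the first line qualifies).
starts = re.findall(r"^ERROR", log, re.MULTILINE)

# Lines that end with ERROR (only the last line qualifies).
ends = re.findall(r"ERROR$", log, re.MULTILINE)

# A line that begins and ends with the given pattern, i.e. matches it entirely.
whole_line = re.findall(r"^INFO ok$", log, re.MULTILINE)

print(len(starts), len(ends), whole_line)
```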
22. Regex Capture Groups
● The true power of Regex is fully realized with
defined capture groups
● These essentially define array-like variables that
hold the pattern-matched data
– This is how we return the precise data we want,
while ignoring the content we do not care about
● Capture groups are defined by patterns
enclosed inside of parentheses “(” “)”
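As an illustration of capture groups, the Python sketch below pulls two fields out of a made-up firewall-style log line. The field names `src` and `dst` are hypothetical. Group 0 is the whole match; groups 1 and up are the parenthesized pieces.

```python
import re

line = "src=10.0.0.5 dst=192.168.1.20 action=deny"

# Two capture groups: one for each address.
m = re.search(r"src=(\S+) dst=(\S+)", line)

whole_match = m.group(0)   # everything the pattern matched
source_ip = m.group(1)     # first capture group
dest_ip = m.group(2)       # second capture group

print(whole_match)
print(source_ip, dest_ip, m.groups())
```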
23. Regex Sub-Capture Groups
● Regex sub-capture groups can be defined by
using nested parentheses “(” “)”
– Example:
● “(Pattern (match))”
– First Capture Group = Pattern match
– Second Capture Group = match
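The nested example can be reproduced in Python as sketched below. The surrounding sentence is invented. Groups are numbered by the position of their opening parenthesis, so the outer group comes first.

```python
import re

m = re.search(r"(Pattern (match))", "A Pattern match example")

first_group = m.group(1)    # outer parentheses
second_group = m.group(2)   # inner (nested) parentheses

print(first_group, "|", second_group)
```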
26. Regex Example 1
● Threat Feed: malware-domains
– Latest Blackhole-DNS File list
– "BOOT" format
– http://malware-domains.com/files/BOOT.zip
● Objective: Capture a list of FQDN’s
28. Example 1 – Expression
PRIMARY\s(\S+)
Capture Group
amazon.co.uk.security-check.ga
autosegurancabrasil.com
christianmensfellowshipsoftball.org
dadossolicitado-antendimento.sad879.mobi
hitnrun.com.my
houmani-lb.com
maruthorvattomsrianjaneyatemple.org
paypalsecure-2016.sucurecode524154241.arita.ac.tz
tei.portal.crockerandwestridge.com
tonyyeo.com
update-apple.com.betawihosting.net
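A minimal Python sketch of applying this expression. The two sample lines are an assumed approximation of the feed's line layout (the keyword `PRIMARY`, a domain, then a zone file name); the trailing `blockeddomain.hosts` token is a placeholder and the real file may differ.

```python
import re

# Assumed sample of the feed's line layout; not copied from the real file.
sample = (
    "PRIMARY hitnrun.com.my blockeddomain.hosts\n"
    "PRIMARY tonyyeo.com blockeddomain.hosts\n"
)

# \s consumes the single space after PRIMARY; (\S+) captures the run of
# non-whitespace characters that follows, which is the FQDN.
fqdns = re.findall(r"PRIMARY\s(\S+)", sample)

print(fqdns)
```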
29. Regex Example 2
● Threat Feed: malware-domains
– Complete Zone File (bind)
– Spyware Domains
– http://malware-domains.com/files/spywaredomains.zones.zip
● Objective: Capture a list of FQDN’s
34. Example 3 – Expression
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
Capture Group
185.165.29.49
185.91.116.237
76.74.167.171
193.227.248.241
149.210.167.172
216.114.192.21
89.255.9.102
86.109.162.144
85.25.203.171
209.90.88.139
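The `\d{1,3}` form is compact but loose: it also accepts out-of-range values such as 999. The Python sketch below, on an invented sample string, compares it with a stricter pattern built from the 0-255 octet alternation shown on the title slide. The `\b` word boundaries and the non-capturing `(?:...)` groups are additions for this sketch, not part of the slides.

```python
import re

text = "seen 185.165.29.49 and 999.1.1.1 today"

# Loose form: any one to three digits per octet.
loose = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
loose_hits = re.findall(loose, text)

# Strict form: each octet limited to 0-255.
octet = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
strict = rf"\b{octet}\.{octet}\.{octet}\.{octet}\b"
strict_hits = re.findall(strict, text)

print(loose_hits)
print(strict_hits)
```

For a trusted feed the loose form is often good enough; the strict form matters more for input validation.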
35. Regex Example 4
● Threat Feed: SpamCop
– Spam in progress
– Source of Mail
– wget https://www.spamcop.net/w3m?action=inprogress
● Objective: Capture a list of IP addresses
37. Example 4 – Expression
>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<
Capture Group
182.139.29.84
201.37.197.39
182.151.104.105
119.5.175.57
119.5.175.57
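The leading “>” and trailing “<” anchor the match to text that sits directly between HTML tags. The Python sketch below assumes, purely for illustration, that the addresses appear in table cells; the actual page markup may differ. It also removes repeated addresses while keeping their order.

```python
import re

# Assumed markup for illustration only.
html = (
    "<td>182.139.29.84</td><td>v1.2.3.4 build</td>"
    "<td>119.5.175.57</td><td>119.5.175.57</td>"
)

pattern = r">(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<"

# Only addresses wrapped directly by ">" and "<" are captured,
# so the version-like "v1.2.3.4" is skipped.
hits = re.findall(pattern, html)

# dict.fromkeys() drops duplicates while preserving first-seen order.
unique_hits = list(dict.fromkeys(hits))

print(hits)
print(unique_hits)
```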
38. Regex Example 5
● Threat Feed: Malware Domain List
– Complete database in CSV format
– http://www.malwaredomainlist.com/mdlcsv.php
● export.csv
● Objective: Capture a list of FQDN’s
44. The End
Big Thank You and shout
out to my dear sweet
mother! She’s a very
special person in my life,
and a fantastic
Grandmother!
...Plus she endured the
unenviable task of raising me as
a child and teenager. :)
Pictured above: My mom with my son
Love you mom!