DerbyCon 7.0 Legacy: Regular Expressions (Regex) Overview




Writing Regular Expressions (Regex) is a versatile skill set to have across the IT landscape. Regex has a number of information security related uses and applications. We are going to provide an overview and show examples of writing Regex for pattern matching and file content analysis using sample threat feed data in this presentation. Along with a healthy dose of motherly advice, we cover Regex syntax, character classes, capture groups, and sub-capture groups. Whether Regex is something completely new or worth brushing up on, this talk is geared toward you.


Matt Scheurer is a Systems Security Engineer working in the Financial Services industry. Matt holds CompTIA Security+, MCP, MCPS, MCTS, MCSA, and MCITP certifications. He maintains active memberships in a number of professional organizations including the Association for Computing Machinery (ACM), Cincinnati Networking Professionals Association (CiNPA), and Information Systems Security Association (ISSA). Matt is a regular attendee at monthly Information Security meetings for 2600, the CiNPA affiliated Security Special Interest Group (CiNPA Security SIG), Ohio Information Security Forum (OISF), and Cincinnati SMBA.

Published in: Technology

  1. 1. Regular Expressions (Regex) Overview September 24, 2017 Matt Scheurer @c3rkah Slides: ((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))
  2. 2. About Me Matt Scheurer Systems Security Engineer Working in the Financial Services Industry Meeting Organizer for the CiNPA Security SIG DerbyCon 5.0 “Unity” Speaker Certifications: CompTIA Security+, MCP, MCPS, MCTS, MCSA, and MCITP
  3. 3. What Regular Expressions are Not! ● The term “Regular Expressions”, often shortened to “Regex”, should not be confused with “Old Sayings” – Adages, Allegories, Aphorisms, Axioms, Clichés, Epigrams, Idioms, Hyperboles, Maxims, Platitudes, Proverbs, Truisms, etc.
  4. 4. When it comes to “Old Sayings”... You would be hard pressed to find anyone better at recalling and retelling old sayings than my own mother...
  5. 5. What is Regex? Regex is a common syntax used to match patterns when parsing text data or output. Regex capture groups are used to extract strings of specific data into reference points for retrieval or processing.
  6. 6. Why learn Regex? ● Regex is a great skill set to have in the back pocket of nearly any interdisciplinary role across the Information Technology landscape ● Uses include: – Application and Software Development – Database queries – Linux Administration and power user commands such as grep, awk, sed, find, etc. – Searching through any type of text data or system logs
  7. 7. Regex uses in InfoSec ● Content filtering ● Input validation ● NGFW / UTM Layer 7 definitions ● Parsing large volumes of data or system logs to pick out specific data points of interest ● SIEM systems – Building or refining entire searches, or performing advanced parsing to narrow down extraneous information – Finding specific log events or log event items and sub-data ● Understand the underpinnings of many security products and utilities
  8. 8. Regex Variations and Variances
  9. 9. Different flavors of Regex ● While all versions of Regex share common conventions, there are proprietary differences across the various Regex engines ● Popular Regex Engines include: – Perl, PCRE, PHP, .NET, Java, JavaScript, XRegExp, VBScript, Python, Ruby, Delphi, R, Tcl, POSIX, and others
  10. 10. Regex Resources ● Online Learning Site - ● Regex Test Site - ● Tutorial Site - ● Countless Additional Resources - ● Further Reading -
  11. 11. Let’s Begin...
  12. 12. Regex Basics – Simple Matching ● Simply type in exactly what you are trying to match ● Text string pattern matching is case-sensitive! – NOTE: certain non-alpha-numeric characters may require an escape prefix to match ●
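A minimal sketch of simple matching, using Python's `re` module as one possible engine. The sample string is made up for illustration and is not taken from the slides.

```python
import re

# Hypothetical sample text, not from the original deck.
text = "Blocked domain: Example.test (added 2017-09-24)"

# A literal pattern matches exactly what is typed, and matching is case-sensitive.
found_exact = re.search("Example", text) is not None
found_lower = re.search("example", text) is not None

# "(" has a special meaning in Regex, so it needs a backslash to match literally.
found_paren = re.search(r"\(added", text) is not None

print(found_exact, found_lower, found_paren)
```

Only the lowercase search fails here, because the text contains "Example" with a capital "E".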
  13. 13. Regex Basics – Text Matching ● In addition to typing in an exact text string for an exact match, “\w” will match a single word character – Matches any word character (alphanumeric & underscore) – In some engines (JavaScript, for example), only matches low-ASCII characters (no accented or non-Roman characters)
  14. 14. Regex Basics – Number Matching ● In addition to typing in an exact numeric string for an exact match, “\d” will match a single digit. – Matches any digit character (0-9)
  15. 15. Regex Basics – Matching a Space ● In addition to typing in an exact string with a space included for an exact match, “\s” will match a space in text – Matches any whitespace character (spaces, tabs, line breaks)
  16. 16. Regex Basics – Matching Opposites ● We just looked at a few character classes – All character classes are case-sensitive – Specifying those character classes in upper-case changes the pattern match to match the opposite ● “\W”, “\D”, and “\S” respectively translate to – Not a word character – Not a digit – Not whitespace
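The three character classes covered so far (word character, digit, whitespace) can be tried out with a short Python sketch. The sample string is hypothetical. One engine difference worth noting: Python 3 treats accented letters as word characters unless the `re.ASCII` flag is passed.

```python
import re

# Hypothetical sample text, not from the original deck.
sample = "host_01 up 42"

word_chars = re.findall(r"\w", sample)  # letters, digits, and underscore
digits = re.findall(r"\d", sample)      # digit characters only
spaces = re.findall(r"\s", sample)      # the two spaces between the words

# "\u00e9" is an accented "e". Python 3 counts it as a word character
# by default; re.ASCII restricts \w to [a-zA-Z0-9_].
unicode_match = re.findall(r"\w", "\u00e9")
ascii_match = re.findall(r"\w", "\u00e9", re.ASCII)

print(len(word_chars), digits, len(spaces), unicode_match, ascii_match)
```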
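As a quick illustration of the upper-case (negated) classes, here is a small Python sketch against a made-up string:

```python
import re

# Hypothetical sample text: "i", "d", ":", a space, and "7".
sample = "id: 7"

non_word = re.findall(r"\W", sample)   # everything that is not a word character
non_digit = re.findall(r"\D", sample)  # everything that is not a digit
non_space = re.findall(r"\S", sample)  # everything that is not whitespace

print(non_word, non_digit, non_space)
```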
  17. 17. Regex Basics – Quantifiers ● “.” matches any single character ● “+” suffix matches one or more repetitions ● “*” suffix matches zero or more repetitions ● “?” suffix means the character is optional ● “|” is an ‘or’ separator between characters ● “^” is a ‘not’ specifier to exclude a character – Enclosed in square brackets prefixing the pattern – [^<pattern>]
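Each symbol on this slide can be checked against a tiny input. The sketch below uses Python's `re.fullmatch`, which only succeeds when the entire string matches; the helper name `matches` and the test words are illustrative choices, not part of the talk.

```python
import re

def matches(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

results = {
    "dot": matches(r"c.t", "cut"),             # "." stands in for any one character
    "plus": matches(r"lo+l", "looool"),        # "+" accepts one or more "o"
    "plus_zero": matches(r"lo+l", "ll"),       # "+" needs at least one "o"
    "star_zero": matches(r"lo*l", "ll"),       # "*" accepts zero "o"
    "optional": matches(r"colou?r", "color"),  # "?" makes the "u" optional
    "either": matches(r"cat|dog", "dog"),      # "|" separates alternatives
    "negated": matches(r"[^0-9]", "7"),        # "[^...]" excludes the listed characters
}

print(results)
```

The two `False` results are `plus_zero` and `negated`: "+" requires at least one repetition, and `[^0-9]` refuses any digit.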
  18. 18. Regex Basics – Escaped Characters ● What if I want to match reserved characters such as “.”, “+”, “*”, “?”, “|”, “^”, etc. in my pattern against the data? – Prefix reserved characters with a “\” ● What if I want to match a “\” in my pattern against the data? – Escape it with a second backslash: “\\”
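A short Python sketch of why escaping matters. The inputs are made up; `re.escape` is a standard-library helper that adds the backslashes for you when a literal string needs to be embedded in a pattern.

```python
import re

# An unescaped "." matches any character, so the pattern also accepts "3x50".
loose = re.search(r"3.50", "3x50") is not None

# Escaping the "." restricts the match to a literal period.
strict = re.search(r"3\.50", "3x50") is not None

# A literal backslash is matched by escaping it with another backslash.
has_backslash = re.search(r"\\", r"C:\temp") is not None

# re.escape inserts the needed backslashes automatically.
escaped = re.escape("a.b")

print(loose, strict, has_backslash, escaped)
```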
  19. 19. Regex Basics – Ranges ● In addition to quantifiers (wild cards), ranges may be specified with pattern matching – Characters are enclosed inside of square brackets “[“ “]” and separated by a hyphen “-” ● Examples: – [a-z], [A-Z], and [0-9]
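The three example ranges, plus one combined range, in a brief Python sketch with made-up input:

```python
import re

lower = re.findall(r"[a-z]", "aB3")   # lowercase letters only
upper = re.findall(r"[A-Z]", "aB3")   # uppercase letters only
digit = re.findall(r"[0-9]", "aB3")   # digits only

# Several ranges can share one pair of square brackets.
tokens = re.findall(r"[A-Za-z0-9]+", "aB3-x9")

print(lower, upper, digit, tokens)
```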
  20. 20. Regex Basics – Repetitions ● In addition to a range quantifier, repetitions may be specified with pattern matching – The number of character occurrences is specified inside of curly brackets/braces “{“ “}”, with a comma “,” separating the bounds for a range of occurrences ● A{4} matches exactly “AAAA” ● A{1,4} matches “A”, “AA”, “AAA”, or “AAAA” ● A{4,} matches four or more consecutive “A’s”
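The repetition counts from this slide can be verified with `re.fullmatch` in Python, which requires the whole string to match. The helper name `whole` is an illustrative choice.

```python
import re

def whole(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

exactly_four = whole(r"A{4}", "AAAA")       # exactly four "A"
three_not_four = whole(r"A{4}", "AAA")      # one "A" short
within_range = whole(r"A{1,4}", "AA")       # between one and four "A"
above_range = whole(r"A{1,4}", "AAAAA")     # five "A" exceeds the range
four_or_more = whole(r"A{4,}", "AAAAAA")    # open-ended upper bound

print(exactly_four, three_not_four, within_range, above_range, four_or_more)
```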
  21. 21. Regex Basics – Line Matching ● The beginning of a line and/or end of a line may be specified in Regex pattern matching – “^”, matches the beginning (starts with) of a line – “$”, matches the end of a line – “^<pattern>$”, matches when the line begins with and ends with the specified pattern
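A sketch of line anchors in Python, using a made-up three-line log. One engine detail to be aware of: in Python, “^” and “$” only match at every line boundary when the `re.MULTILINE` flag is set; without it they anchor to the start and end of the whole string.

```python
import re

# Hypothetical log text with three lines.
log = "ERROR disk full\nINFO ERROR count reset\nERROR"

starts = re.findall(r"^ERROR", log, re.MULTILINE)       # lines beginning with ERROR
ends = re.findall(r"reset$", log, re.MULTILINE)         # lines ending with reset
whole_line = re.findall(r"^ERROR$", log, re.MULTILINE)  # lines that are exactly ERROR

# Without re.MULTILINE the anchors refer to the whole string, which is not just "ERROR".
no_flag = re.findall(r"^ERROR$", log)

print(len(starts), len(ends), len(whole_line), no_flag)
```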
  22. 22. Regex Capture Groups ● The true power of Regex is fully realized with defined capture groups ● These essentially define array-like variables to pattern matched data – This is how we return the precise data we want, while ignoring the content we do not care about ● Capture groups are defined by patterns enclosed inside of parentheses “(“ “)”
  23. 23. Regex Sub-Capture Groups ● Regex sub-capture groups can be defined by using nested parentheses “(“ “)” – Example: ● “(Pattern (match))” – First Capture Group = Pattern match – Second Capture Group = match
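To make the "array-like variables" idea concrete, here is a minimal Python sketch. The log line and its field names (`src`, `dst`, `action`) are hypothetical.

```python
import re

# Hypothetical firewall-style log line.
line = "src=192.0.2.10 dst=198.51.100.7 action=deny"

# Each pair of parentheses defines one capture group, numbered from 1.
m = re.search(r"src=(\S+) dst=(\S+)", line)

src = m.group(1)      # first capture group
dst = m.group(2)      # second capture group
both = m.groups()     # all groups as a tuple

print(src, dst, both)
```

The surrounding text ("src=", "dst=", and the action field) is used to locate the data but is not returned.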
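The nested example from this slide, run through Python's `re` module against a made-up sentence. Groups are numbered by the position of their opening parenthesis, so the outer group comes first.

```python
import re

m = re.search(r"(Pattern (match))", "A Pattern match example")

outer = m.group(1)  # the outer group spans the full phrase
inner = m.group(2)  # the nested group holds only the inner word

print(outer, "|", inner)
```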
  24. 24. Regex Pattern Matching Problems?
  25. 25. Really Stuck? Just Remember...
  26. 26. Regex Example 1 ● Threat Feed: malware-domains – Latest Blackhole-DNS File list – "BOOT" format – ● Objective: Capture a list of FQDNs
  27. 27. Example 1 – Data Format
  28. 28. Example 1 – Expression PRIMARY\s(\S+) Capture Group
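A sketch of Example 1 in Python, assuming the expression on this slide is `PRIMARY\s(\S+)` with its backslashes written out. The data-format slide is an image that did not survive in this transcript, so the feed lines below are hypothetical stand-ins for a "BOOT" style file, not real feed content.

```python
import re

# Hypothetical sample lines; the real feed layout is assumed, not copied.
feed = """; comment line
PRIMARY bad-domain.example blockeddomain.hosts
PRIMARY malware.example.test blockeddomain.hosts
"""

# The capture group keeps the run of non-whitespace characters after "PRIMARY ".
fqdns = re.findall(r"PRIMARY\s(\S+)", feed)

print(fqdns)
```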
  29. 29. Regex Example 2 ● Threat Feed: malware-domains – Complete Zone File (bind) – Spyware Domains – ● Objective: Capture a list of FQDNs
  30. 30. Example 2 – Data Format
  31. 31. Example 2 – Expression zone\s"(\S+)" Capture Group
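A sketch of Example 2 in Python, assuming the expression is `zone\s"(\S+)"` and that each entry follows the usual BIND `zone "name" { ... }` layout. The two zone lines are hypothetical samples, since the data-format slide is not present in this transcript.

```python
import re

# Hypothetical BIND-style zone entries; file paths are made up.
feed = '''zone "bad-domain.example"  {type master; file "/etc/namedb/blockeddomain.hosts";};
zone "spyware.example.test"  {type master; file "/etc/namedb/blockeddomain.hosts";};
'''

# The quotes anchor the match; the capture group keeps only the name between them.
fqdns = re.findall(r'zone\s"(\S+)"', feed)

print(fqdns)
```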
  32. 32. Regex Example 3 ● Threat Feed: DNS BlackHole – IP Blacklist – ● Objective: Capture a list of IP addresses
  33. 33. Example 3 – Data Format
  34. 34. Example 3 – Expression (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) Capture Group
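A sketch of Example 3 in Python, assuming the expression is `(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`. The feed text is a hypothetical sample built from documentation address ranges. The second half compares it with a stricter per-octet pattern modeled on the one shown on the title slide; the non-capturing `(?:...)` groups and the names `LOOSE`, `OCTET`, and `STRICT` are choices made for this sketch.

```python
import re

# Hypothetical blacklist sample, not real feed content.
feed = """# hypothetical blacklist sample
192.0.2.1
198.51.100.23
203.0.113.200
"""

LOOSE = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
ips = re.findall(LOOSE, feed)

# \d{1,3} accepts any one to three digits, so it does not check the 0-255 range.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
STRICT = rf"{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}"

loose_accepts_bad = re.fullmatch(LOOSE, "999.999.999.999") is not None
strict_accepts_bad = re.fullmatch(STRICT, "999.999.999.999") is not None
strict_accepts_good = all(re.fullmatch(STRICT, ip) for ip in ips)

print(ips, loose_accepts_bad, strict_accepts_bad, strict_accepts_good)
```

For pulling addresses out of a trusted feed the simpler expression is usually enough; the stricter form matters more when the pattern is used for input validation.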
  35. 35. Regex Example 4 ● Threat Feed: SpamCop – Spam in progress – Source of Mail – wget ● Objective: Capture a list of IP addresses
  36. 36. Example 4 – Data Format
  37. 37. Example 4 – Expression >(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})< Capture Group
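A sketch of Example 4 in Python, assuming the expression is `>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<`. The HTML fragment is a hypothetical stand-in for the downloaded page, whose real markup is not shown in this transcript.

```python
import re

# Hypothetical HTML table rows; the real page layout is assumed.
html = """<tr><td>192.0.2.55</td><td>12</td></tr>
<tr><td>Version 1.2.3.4 beta</td></tr>
<tr><td>198.51.100.8</td><td>7</td></tr>
"""

# ">" and "<" anchor the address to the edges of a tag, so dotted numbers
# that appear in the middle of other text are skipped.
ips = re.findall(r">(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<", html)

print(ips)
```

The "1.2.3.4" in the second row is ignored because it is not directly wrapped by tag boundaries.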
  38. 38. Regex Example 5 ● Threat Feed: Malware Domain List – Complete database in CSV format – ● export.csv ● Objective: Capture a list of FQDNs
  39. 39. Example 5 – Data Format
  40. 40. Example 5 – Expression "\d{4}/\d{2}/\d{2}_\d{2}:\d{2}","(\w[\.|\-|\w]+) Capture Group
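A sketch of Example 5 in Python, assuming the expression is `"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}","(\w[\.|\-|\w]+)` once its backslashes are restored. The CSV rows are hypothetical; the column layout (timestamp first, then host and optional path) is an assumption based on the expression itself.

```python
import re

# Hypothetical CSV rows; real export.csv content is not reproduced here.
csv_text = '''"2017/09/20_10:31","bad-site.example.test/gate.php","192.0.2.80","-","malware"
"2017/09/21_08:05","another.example","198.51.100.9","-","phishing"
'''

# The timestamp column locates the row; the capture group starts at a word
# character and continues through dots, hyphens, and word characters, so it
# stops at the first "/" or closing quote.
PATTERN = r'"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}","(\w[\.|\-|\w]+)'
fqdns = re.findall(PATTERN, csv_text)

print(fqdns)
```

Inside square brackets the "|" is a literal character rather than an 'or' separator, so this class would also accept a pipe; that does not affect these sample rows.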
  41. 41. Keeping the Regex Saw Sharpened
  42. 42. Upcoming Speaking Engagements
  43. 43. Questions?
  44. 44. The End Big Thank You and shout out to my dear sweet mother! She’s a very special person in my life, and a fantastic Grandmother! ...Plus she endured the unenviable task of raising me as a child and teenager. :) Pictured above: My mom with my son Love you mom!