Regular expressions (regex) are a compact language for describing patterns in text, used to parse strings and apply logic and constraints when searching them. They give a concise way to find matches and are supported across many programming languages. This document gives an overview of regex, code examples in different languages, and descriptions of the common patterns and metacharacters used to define matching rules. It also recommends resources for further reading on mastering regular expressions.
2. What Are They?
• Language to parse text
• Apply logic and constraints
• Concise (but not readable)
• Consistent (mostly)
• Widely supported in programming languages
4. Java Regex Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("hello");
Matcher matcher = pattern.matcher("hello world");
// Find all matches
while (matcher.find()) {
    // Get the matching string
    String match = matcher.group();
    // match = "hello"
}
5. C# Regex Code
using System.Text.RegularExpressions;

// The loop variable and the extracted string need distinct names
foreach (Match m in Regex.Matches("hello world", "hello", RegexOptions.IgnoreCase)) {
    // Get the matching string
    string value = m.Value;
    // value = "hello"
}
8. The (Ugly) Alternative
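String needle = "hello";
String haystack = "hello world hello world";
int index = 0;
// indexOf returns -1 once no further occurrence exists
while ((index = haystack.indexOf(needle, index)) != -1) {
    // Get the matching string
    String match = haystack.substring(index, index + needle.length());
    // Step past the start of this match so the next search makes progress
    index++;
}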
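For comparison, the same search can be sketched with the Java regex classes from the earlier slide. This is a minimal sketch, not part of the original deck: the class name NeedleSearchDemo and the helper findStarts are hypothetical names chosen for illustration, and Pattern.quote is used on the assumption that the needle should be treated as literal text, as indexOf treats it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NeedleSearchDemo {
    // Hypothetical helper: collects the start index of every occurrence of a literal needle.
    static List<Integer> findStarts(String needle, String haystack) {
        List<Integer> starts = new ArrayList<>();
        // Pattern.quote makes the needle match literally, with no metacharacter meaning.
        Matcher matcher = Pattern.compile(Pattern.quote(needle)).matcher(haystack);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Same needle and haystack as the manual loop above.
        System.out.println(findStarts("hello", "hello world hello world")); // prints [0, 12]
    }
}
```

One difference worth noting: Matcher.find() resumes after the end of the previous match, so it reports non-overlapping matches, while the indexOf loop advances one character at a time. For "hello" the two give the same result.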
9. Regex Metacharacters
• * - Match the preceding element zero or more times
• ? - Match the preceding element zero or one time
• + - Match the preceding element one or more times
• ^ - Match the start of a string
• $ - Match the end of a string
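The metacharacters above can be tried out with a short Java sketch. The class name MetacharacterDemo, the helper found, and the sample patterns are illustrative assumptions, not part of the original slides.

```java
import java.util.regex.Pattern;

public class MetacharacterDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("ab*c", "ac"));            // true:  * allows zero b's
        System.out.println(found("ab+c", "ac"));            // false: + needs at least one b
        System.out.println(found("colou?r", "color"));      // true:  ? makes the u optional
        System.out.println(found("^hello", "hello world")); // true:  hello is at the start
        System.out.println(found("^world", "hello world")); // false: world is not at the start
        System.out.println(found("world$", "hello world")); // true:  world is at the end
    }
}
```

The quantifiers (*, ?, +) apply to the element immediately before them, while ^ and $ match positions rather than characters.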
10. Character Classes
POSIX      Shorthand  Longhand         Description
[:word:]   \w         [A-Za-z0-9_]     Alphanumeric chars.
           \W         [^A-Za-z0-9_]    Non-alphanumeric chars.
[:alpha:]             [A-Za-z]         Alphabetic chars.
[:blank:]             [ \t]            Space and tab
[:digit:]  \d         [0-9]            Numeric characters
           \D         [^0-9]           Non-numeric chars.
[:space:]  \s         [ \t\r\n\v\f]    Whitespace characters
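A few of these classes can be checked with a small Java sketch. The class name CharacterClassDemo, the helper matchesAll, and the sample inputs are assumptions made for illustration. Note that in a Java string literal each backslash is doubled, and that Java spells the POSIX classes as \p{Alpha}, \p{Digit}, and so on, rather than the [:alpha:] bracket form.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("\\w+", "snake_case9"));   // true:  letters, digits, underscore
        System.out.println(matchesAll("\\w+", "two words"));     // false: the space is not a word char
        System.out.println(matchesAll("\\d+", "2024"));          // true:  digits only
        System.out.println(matchesAll("\\D+", "2024"));          // false: \D excludes digits
        System.out.println(matchesAll("\\p{Alpha}+", "abcXYZ")); // true:  Java form of [:alpha:]
        // \s+ collapses runs of spaces and tabs into a single space.
        System.out.println("hello   world\t!".replaceAll("\\s+", " ")); // prints hello world !
    }
}
```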