Master 100 Regex Patterns & Automata Theory in One Compact Slide Deck
Regex & Automata Decoded – a lightning tour from 100 copy-paste patterns to the theory that powers them. Master digits, emails, code tokens; turn English specs into DFAs; learn when regex hits its limits (and how to dodge the backtracking trap).
Foundations: Alphabets, Strings & Languages
The building blocks of formal language theory.
Alphabet (Σ)
A finite, non-empty set of symbols. Ex: Σ = {a, b}, Σ = {0, 1}.
String (w)
A finite sequence of symbols from an alphabet. Ex: "abb", "101".
Language (L)
A set of strings. Can be finite or infinite.
Ex: L = {w | w ends with 'b'}.
Key Concepts & Notation
Empty String (ε): The unique string of length zero.
Length (|w|): The number of symbols in a string.
Concatenation: Joining two strings end-to-end.
Kleene Star (Σ*): The set of all possible strings over Σ, including ε.
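The notation above can be made concrete with a short sketch. It enumerates the strings of Σ* up to a fixed length, since Σ* itself is infinite. The helper name `stringsUpTo` is hypothetical and not part of the slides.

```javascript
// Enumerate every string over an alphabet up to a given length.
// `stringsUpTo` is a hypothetical helper name used only for illustration.
function stringsUpTo(alphabet, maxLen) {
  let current = [''];            // '' plays the role of the empty string ε
  const all = [''];
  for (let len = 1; len <= maxLen; len++) {
    const next = [];
    for (const w of current) {
      for (const symbol of alphabet) {
        next.push(w + symbol);   // concatenation of w and one symbol
      }
    }
    all.push(...next);
    current = next;
  }
  return all;
}

const sample = stringsUpTo(['a', 'b'], 2);
console.log(sample);             // ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
console.log('abb'.length);       // 3, i.e. |abb| = 3
console.log('ab' + 'b');         // 'abb'
```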
The Power of Regular Languages
A language is regular if a regular expression (in the formal sense, without backreferences) can describe it. Regular languages form the foundation of text processing.
Chomsky Hierarchy
Type 0: Recursively Enumerable
Type 1: Context-Sensitive
Type 2: Context-Free
Type 3: Regular Languages
Closure Properties
Union (L₁ L₂)
∪ Concatenation (L₁L₂)
Kleene Star (L*) Complement (Lᶜ)
Intersection (L₁ L₂)
∩
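Two of these closure properties can be sketched directly on DFAs: complement flips the accepting states, and intersection runs two machines in lockstep (the product construction). The DFA object layout and all names below are assumptions made for this illustration.

```javascript
// A DFA here is { start, accept: Set, delta: { state: { symbol: state } } }.
// This representation and all names are assumptions for illustration.
const endsWithB = {
  start: 'n',
  accept: new Set(['y']),
  delta: { n: { a: 'n', b: 'y' }, y: { a: 'n', b: 'y' } },
};
const evenAs = {
  start: 'even',
  accept: new Set(['even']),
  delta: { even: { a: 'odd', b: 'even' }, odd: { a: 'even', b: 'odd' } },
};

function run(dfa, input) {
  let state = dfa.start;
  for (const ch of input) state = dfa.delta[state][ch];
  return dfa.accept.has(state);
}

// Complement: same machine, accepting states flipped.
function complement(dfa) {
  const accept = new Set(Object.keys(dfa.delta).filter((q) => !dfa.accept.has(q)));
  return { ...dfa, accept };
}

// Intersection: product construction over pairs of states.
function intersect(d1, d2) {
  const delta = {};
  const accept = new Set();
  for (const p of Object.keys(d1.delta)) {
    for (const q of Object.keys(d2.delta)) {
      const pq = `${p},${q}`;
      delta[pq] = {};
      for (const ch of Object.keys(d1.delta[p])) {
        delta[pq][ch] = `${d1.delta[p][ch]},${d2.delta[q][ch]}`;
      }
      if (d1.accept.has(p) && d2.accept.has(q)) accept.add(pq);
    }
  }
  return { start: `${d1.start},${d2.start}`, accept, delta };
}

const both = intersect(endsWithB, evenAs);
console.log(run(both, 'aab'));                  // true
console.log(run(both, 'ab'));                   // false (odd number of a's)
console.log(run(complement(endsWithB), 'ba'));  // true
```

Both sample machines assume the alphabet {a, b}; inputs with other symbols are outside the sketch's scope.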
Regex Patterns: Digits, Letters & Identifiers
Digits & Letters
[0-9] any digit
[a-zA-Z] any letter
\d digit
\w word char
\s whitespace
Negations
\D non-digit
\W non-word
\S non-space
[^aeiou] non-vowel
Quantifiers
x? optional
x* zero or more
x+ one or more
x{n,m} n to m
Anchors & Boundaries
^ start
$ end
\b word boundary
\B non-boundary
Cases & Concatenation
[A-Z] uppercase
[a-z] lowercase
. any char (except a newline, by default)
xy concatenation
x|y union
Identifiers
[a-zA-Z0-9_]
Matches any character valid in an identifier.
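A few of the shorthands above in action, as a minimal JavaScript sketch. The sample sentence is made up, and the full identifier pattern is the one from the tokenizing slide.

```javascript
const line = 'user_42 paid $19 on day 7';

// \d+ : runs of digits.
console.log(line.match(/\d+/g));            // ['42', '19', '7']

// \b[a-z]+\b : whole lowercase words. 'user' is skipped because '_' is a
// word character, so there is no boundary between 'r' and '_'.
console.log(line.match(/\b[a-z]+\b/g));     // ['paid', 'on', 'day']

// \S+ : runs of non-whitespace characters.
console.log(line.match(/\S+/g).length);     // 6

// Identifier: a letter or underscore, then letters, digits, or underscores.
const identifier = /^[a-zA-Z_][a-zA-Z0-9_]*$/;
console.log(identifier.test('user_42'));    // true
console.log(identifier.test('42user'));     // false
```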
Descriptive Expressions (1/5)
Translating English into formal patterns.
1 All strings over {a,b} that end with the symbol 'b'.
2 Strings that contain an even number of 'a's.
3 Strings that do not contain the substring "ab".
4 Strings where every 'b' is immediately followed by an 'a'.
5 Strings that contain at least three 'a's.
6 Strings whose length is a multiple of 3.
7 Strings that contain the substring "aaa".
8 Strings with an equal number of 'a's and 'b's.
9 Strings that start and end with the same symbol.
10 Strings with no consecutive identical letters.
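Several of the descriptions above translate directly into anchored patterns. The sketch below covers items 1, 2, 3, 5, 6, and 7 over the alphabet {a, b}. These are one possible encoding each, not the only one. Item 8 (equal numbers of 'a's and 'b's) is not a regular language, so no such pattern exists for it.

```javascript
// Item numbers refer to the list above. Every pattern is anchored so it
// must match the whole string. The alphabet is assumed to be {a, b}.
const patterns = {
  1: /^[ab]*b$/,             // ends with 'b'
  2: /^b*(ab*ab*)*$/,        // even number of 'a's
  3: /^b*a*$/,               // no "ab": all b's come before all a's
  5: /^b*ab*ab*a[ab]*$/,     // at least three 'a's
  6: /^([ab]{3})*$/,         // length is a multiple of 3
  7: /^[ab]*aaa[ab]*$/,      // contains "aaa"
};

console.log(patterns[1].test('aab'));   // true
console.log(patterns[2].test('aba'));   // true
console.log(patterns[2].test('ab'));    // false
console.log(patterns[3].test('bba'));   // true
console.log(patterns[3].test('bab'));   // false
console.log(patterns[6].test('abab'));  // false
```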
Descriptive Expressions (2/5)
Focusing on binary patterns and numerical properties.
11 Binary strings representing numbers divisible by 3.
12 Binary strings that do not contain "00" as a substring.
13 Strings with an odd number of 0s and an even number of 1s.
14 Strings whose third symbol from the start is a '1'.
15 Strings that have a '0' in every odd position.
16 Strings that end with "11".
17 Strings that contain "010" as a substring.
18 Strings where 0s and 1s alternate (e.g., 0101...).
19 Strings with more 0s than 1s.
20 Strings where every 0 is immediately followed by a 1.
Descriptive Expressions (3/5)
Exploring patterns across different alphabets and structures.
21 All decimal strings without leading zeros.
22 Strings over {a,b,c} where all 'a's appear before all 'b's.
23 Strings that contain at least one double letter (e.g., "aa").
24 Strings where no letter is repeated.
25 Strings that form a palindrome (read the same forwards and backwards).
26 Strings composed of vowels only.
27 Strings composed of consonants only.
28 Strings where the length equals the number of vowels.
29 Strings that contain exactly two words (separated by spaces).
30 Strings of balanced parentheses over { (, ) }.
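Items 11 and 12 from the binary list make a useful contrast: one is easiest to express as a DFA, the other as a regex. The sketch below is one way to encode each. The function and variable names are hypothetical, and treating the empty string as "not a number" is an assumption.

```javascript
// Item 11 as a 3-state DFA: the state is the value read so far, modulo 3.
// Reading a bit b moves remainder r to (2r + b) mod 3.
// Assumption: the empty string does not represent a number.
function divisibleBy3(bits) {
  if (!/^[01]+$/.test(bits)) return false;
  let remainder = 0;
  for (const bit of bits) {
    remainder = (remainder * 2 + Number(bit)) % 3;
  }
  return remainder === 0;
}

// Item 12 as a regex: every 0 is followed by a 1, except possibly a final 0.
const noDoubleZero = /^(1|01)*0?$/;

console.log(divisibleBy3('110'));        // true  (6)
console.log(divisibleBy3('111'));        // false (7)
console.log(noDoubleZero.test('0101'));  // true
console.log(noDoubleZero.test('1001'));  // false
```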
Descriptive Expressions (4/5)
Defining patterns over a custom {x, y} alphabet.
31 Strings over {x,y} where every 'y' occurs in pairs.
32 Strings that contain the substring "xyx".
33 Strings where the number of 'x's is divisible by 4.
34 Strings that end with "xy".
35 Strings that do not contain "yy" as a substring.
36 Strings where "xy" appears exactly twice.
37 Strings with alternating 'x' and 'y' (e.g., xyxy...).
38 Strings where every 'x' occurs before any 'y'.
39 Strings with an equal number of runs of 'x's and 'y's.
40 Strings with no 'x' in an even position.
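A sketch of possible patterns for items 31, 33, 34, 35, and 38 follows. Item 31 is ambiguous in English, so the code assumes it means that 'y's appear only as adjacent "yy" blocks. Other readings would need a different pattern.

```javascript
// Item numbers refer to the {x, y} list above. All patterns are anchored.
const xy = {
  31: /^(x|yy)*$/,              // y's only in adjacent pairs (assumed reading)
  33: /^y*(xy*xy*xy*xy*)*$/,    // number of x's divisible by 4
  34: /^[xy]*xy$/,              // ends with "xy"
  35: /^(x|yx)*y?$/,            // no "yy" substring
  38: /^x*y*$/,                 // every x before any y
};

console.log(xy[31].test('xyyx'));     // true
console.log(xy[31].test('xyx'));      // false
console.log(xy[33].test('xyxyxyx'));  // true  (four x's)
console.log(xy[35].test('xyyx'));     // false
console.log(xy[38].test('xxyy'));     // true
```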
Descriptive Expressions (5/5)
Specifying real-world data formats in plain English.
41 All strings that look like an email (local@domain).
42 URLs that start with an optional "https://".
43 Floating-point numbers with an optional sign.
44 Dates in the format "yyyy-mm-dd".
45 Time strings in the format "hh:mm:ss".
46 MAC addresses with colon-separated hex pairs.
47 Credit-card numbers in four groups of four digits.
48 Hashtags starting with '#' followed by alphanumeric chars.
49 File paths that end with a ".txt" extension.
50 Strings with exactly one space between words.
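Simplified patterns for a few of these formats (items 43 to 46 and 48) might look like the sketch below. They check shape only. For example, the date pattern accepts "2024-02-30", and none of them should be read as a complete validator for the real-world format.

```javascript
// Shape-only checks. The names in `formats` are hypothetical.
const formats = {
  float: /^[+-]?(\d+\.\d*|\.\d+|\d+)$/,                   // item 43
  date: /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/,  // item 44
  time: /^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$/,              // item 45
  mac: /^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$/,            // item 46
  hashtag: /^#[A-Za-z0-9]+$/,                             // item 48
};

console.log(formats.float.test('-3.14'));            // true
console.log(formats.date.test('2024-13-01'));        // false (month 13)
console.log(formats.time.test('23:59:59'));          // true
console.log(formats.time.test('24:00:00'));          // false
console.log(formats.mac.test('00:1A:2b:3C:4d:5E'));  // true
console.log(formats.hashtag.test('#regex101'));      // true
```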
The Trinity: Regex, NFA & DFA
Kleene's Theorem states that Regular Expressions, NFAs, and DFAs have the same
expressive power. They all describe the same class of languages: the Regular Languages.
Regex ≡ NFA ≡ DFA
This equivalence allows us to convert freely between these representations for design, verification, and optimization.
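The equivalence can be spot-checked on a small case. The sketch below simulates a hand-written NFA for "ends with ab" by tracking the set of active states (each such set corresponds to one state of the equivalent DFA) and compares it with a regex on every string over {a, b} up to length 5. The NFA layout and names are assumptions, and a finite check illustrates the theorem rather than proving it.

```javascript
// NFA for "ends with ab": state 0 loops, guesses the final "ab" via 1 -> 2.
const nfa = {
  start: 0,
  accept: new Set([2]),
  delta: {
    0: { a: [0, 1], b: [0] },
    1: { b: [2] },
    2: {},
  },
};

// Track the set of states the NFA could be in after each symbol.
function nfaAccepts(machine, input) {
  let current = new Set([machine.start]);
  for (const ch of input) {
    const next = new Set();
    for (const q of current) {
      for (const target of machine.delta[q][ch] ?? []) next.add(target);
    }
    current = next;
  }
  return [...current].some((q) => machine.accept.has(q));
}

// Compare against the regex on all strings of length 0 through 5.
const regex = /^[ab]*ab$/;
let words = [''];
let agree = true;
for (let len = 0; len <= 5; len++) {
  for (const w of words) {
    if (nfaAccepts(nfa, w) !== regex.test(w)) agree = false;
  }
  words = words.flatMap((w) => [w + 'a', w + 'b']);
}
console.log(agree); // true
```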
From English Description to DFA
A systematic workflow for translating informal specs into formal automata.
1. Parse Spec
Identify atomic conditions and memory needs.
2. Design States
Assign states to track progress and memory.
3. Add Transitions
Define transitions for each symbol in each state.
4. Minimize
Use the table-filling algorithm to obtain the minimal DFA.
Example: "Even number of a's" results in a 2-state DFA (tracking even/odd count).
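The two-state machine from the example can be written down as a transition table and run directly. This is a minimal sketch: the state names, object layout, and the alphabet {a, b} are assumptions.

```javascript
// States remember only the parity of the a's seen so far.
const evenADfa = {
  start: 'even',
  accept: new Set(['even']),
  delta: {
    even: { a: 'odd', b: 'even' },
    odd: { a: 'even', b: 'odd' },
  },
};

function accepts(dfa, input) {
  let state = dfa.start;
  for (const ch of input) {
    state = dfa.delta[state][ch];
    if (state === undefined) return false; // symbol outside the alphabet
  }
  return dfa.accept.has(state);
}

console.log(accepts(evenADfa, ''));      // true  (zero a's)
console.log(accepts(evenADfa, 'abba'));  // true  (two a's)
console.log(accepts(evenADfa, 'ab'));    // false (one a)
```

This machine is already minimal: the two states must be distinct because one accepts the empty continuation and the other does not.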
Practical Use: Tokenizing with Regex
How lexical analyzers (lexers) use regex to break source code into meaningful tokens.
Source Code
if (x > 10)
Regex Patterns
if|while|for
[a-zA-Z_][a-zA-Z0-9_]*
[0-9]+
Tokens
(KEYWORD, "if")
(IDENTIFIER, "x")
(NUMBER, "10")
Each pattern is compiled to an NFA (Thompson's construction); the NFAs are combined and converted into a single DFA, which scans the input in O(n) time.
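A toy lexer for the source line above can be sketched with JavaScript's sticky (`y`) flag, which anchors each match at `lastIndex`. This is not the DFA-based scheme a lexer generator would produce. It tries the patterns in priority order with the built-in backtracking engine. The `SYMBOL` and `SKIP` rules and the `\b` after keywords are additions assumed here so that the whole line can be consumed.

```javascript
// Patterns are tried in order, so keywords win over identifiers.
const TOKEN_SPEC = [
  ['KEYWORD', /(?:if|while|for)\b/y],
  ['IDENTIFIER', /[a-zA-Z_][a-zA-Z0-9_]*/y],
  ['NUMBER', /[0-9]+/y],
  ['SYMBOL', /[()<>]/y],   // assumed rule, not on the slide
  ['SKIP', /\s+/y],        // assumed rule: whitespace is dropped
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const [type, regex] of TOKEN_SPEC) {
      regex.lastIndex = pos;            // sticky match must start exactly here
      const m = regex.exec(source);
      if (m) {
        if (type !== 'SKIP') tokens.push([type, m[0]]);
        pos = regex.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at ${pos}: ${source[pos]}`);
    }
  }
  return tokens;
}

console.log(tokenize('if (x > 10)'));
// [['KEYWORD','if'], ['SYMBOL','('], ['IDENTIFIER','x'],
//  ['SYMBOL','>'], ['NUMBER','10'], ['SYMBOL',')']]
```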
Practical Use: Validation & Extraction
Core techniques for applying regex in real-world scenarios.
Validation
Ensure input conforms to an expected pattern to reject
malformed data.
if (/^\d{5}$/.test(zipCode)) { /* ... */ }
Extraction
Use capturing groups to isolate and retrieve specific parts of
a string.
const match = text.match(/(\d{3})-(\d{4})/);
Choose between greedy and lazy quantifiers (e.g., `.*` vs `.*?`) deliberately: they change which match is returned, and laziness alone does not rule out catastrophic backtracking.
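A runnable version of the two techniques, plus a greedy-versus-lazy comparison, might look like this. The helper name `isZipCode` and the sample strings are made up for illustration.

```javascript
// Validation: the anchors ^ and $ force the whole input to match.
function isZipCode(value) {
  return /^\d{5}$/.test(value);
}
console.log(isZipCode('90210'));    // true
console.log(isZipCode('9021'));     // false
console.log(isZipCode('90210-1'));  // false

// Extraction: groups are available at indexes 1, 2, ... of the match.
const phone = 'Call 555-0199 today'.match(/(\d{3})-(\d{4})/);
console.log(phone[0]);  // '555-0199'
console.log(phone[1]);  // '555'
console.log(phone[2]);  // '0199'

// Greedy vs lazy: same input, different match.
const html = '<b>one</b> <b>two</b>';
console.log(html.match(/<b>.*<\/b>/)[0]);   // '<b>one</b> <b>two</b>'
console.log(html.match(/<b>.*?<\/b>/)[0]);  // '<b>one</b>'
```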
The Limits: Non-Regular Languages
Not all languages are regular. The Pumping Lemma is a tool to prove this.
The Pumping Lemma
For any regular language L, there exists a "pumping length" p such that any string w in L with |w| ≥ p can be divided into three parts, w = xyz, satisfying:
|xy| ≤ p
|y| ≥ 1
xyⁱz ∈ L for all i ≥ 0
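Example: L = {aⁿbⁿ | n ≥ 0}
Choose w = aᵖbᵖ. Because |xy| ≤ p, the substring y consists only of a's, so pumping (e.g., i = 2) yields more a's than b's and xy²z ∉ L. This contradicts the lemma, so L is not regular.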
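The argument for aⁿbⁿ can be made concrete with a brute-force check. For a fixed p, the sketch tries every split w = xyz allowed by the lemma and confirms that pumping once (i = 2) always leaves the language. The function names are hypothetical, and checking one value of p illustrates the proof rather than replacing it.

```javascript
// Membership test for L = { a^n b^n | n >= 0 }.
function inL(w) {
  const m = /^(a*)(b*)$/.exec(w);
  return m !== null && m[1].length === m[2].length;
}

// For w = a^p b^p, try every split with |xy| <= p and |y| >= 1.
function everySplitFails(p) {
  const w = 'a'.repeat(p) + 'b'.repeat(p);
  for (let xyLen = 1; xyLen <= p; xyLen++) {
    for (let yLen = 1; yLen <= xyLen; yLen++) {
      const x = w.slice(0, xyLen - yLen);
      const y = w.slice(xyLen - yLen, xyLen);
      const z = w.slice(xyLen);
      if (inL(x + y + y + z)) return false; // pumped string stayed in L
    }
  }
  return true; // no admissible split survives pumping
}

console.log(inL('aabb'));         // true
console.log(everySplitFails(5));  // true
```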
Example: Palindromes
Also non-regular, requiring a more powerful model (Context-Free Grammar).
Performance Traps & How to Avoid Them
Regex can be powerful, but poorly written patterns can lead to catastrophic backtracking.
The Trap: Exponential Backtracking
Nested quantifiers like `(a+)+b` on a long string of 'a's with no 'b' can cause a backtracking engine to explore an exponential number of paths.
/(a+)+b/.test('a'.repeat(40)) // Can freeze!
The Solution: Linear Time
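Use possessive quantifiers (e.g., `a++`) where the engine supports them (e.g., Java, PCRE).
Use atomic groups (e.g., `(?>...)`) where supported.
Rewrite ambiguous nested quantifiers; lazy quantifiers (e.g., `.*?`) change which match is preferred but do not by themselves prevent backtracking.
Use a DFA-based regex engine when possible.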
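Another option that works in any engine is to rewrite the pattern so the nesting disappears. Since `(a+)+` and `a+` describe the same language, the sketch below checks that an anchored rewrite agrees with the risky pattern on short inputs and then runs the rewrite on a long failing input. It deliberately never runs the risky pattern on long input. The anchors and sample strings are assumptions added for the demonstration.

```javascript
const risky = /^(a+)+b$/;  // nested quantifier: many ways to split the a's
const safe = /^a+b$/;      // same language, only one way to match

// On short inputs both patterns give the same answer.
const samples = ['ab', 'aaab', 'b', 'aaa', 'aabb', ''];
const allAgree = samples.every((s) => risky.test(s) === safe.test(s));
console.log(allAgree);  // true

// The rewrite fails fast on a long run of a's with no trailing b.
const longInput = 'a'.repeat(10000);
console.log(safe.test(longInput));        // false
console.log(safe.test(longInput + 'b'));  // true
```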
Key Takeaways & Roadmap
100 Patterns
A library of common regex for
digits, text, networks, and code.
50 Descriptions
Practice translating English specs
into formal patterns.
Automata Theory
The foundation for verification,
optimization, and understanding
limits.
Disciplined Usage
Balance power with performance
to avoid common traps.
From here, explore Context-Free Languages and modern parsing algorithms!