Master 100 Regex Patterns & Automata Theory in One Compact Slide Deck

Regex & Automata
Decoded
By Dildar Jakhro

Foundations
100 Regex Patterns
Descriptive Forms
Automata View
Practical Usage
Limits & Beyond

Foundations: Alphabets, Strings & Languages
The building blocks of formal language theory.
Alphabet (Σ)
A finite, non-empty set of symbols. Ex: Σ = {a, b}, Σ = {0, 1}.
String (w)
A finite sequence of symbols from an
alphabet. Ex: "abb", "101".
Language (L)
A set of strings. Can be finite or infinite.
Ex: L = {w | w ends with 'b'}.
Key Concepts & Notation
Empty String (ε): The unique string of length zero.
Length (|w|): The number of symbols in a string.
Concatenation: Joining two strings end-to-end.
Kleene Star (Σ*): The set of all possible strings over Σ, including ε.
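The notation above can be made concrete with a small sketch. The helper name `stringsUpTo` is hypothetical (it is not from the deck); it lists the finite slice of Σ* containing every string up to a given length.

```javascript
// Enumerate every string over `alphabet` with length <= maxLen (a finite slice of Σ*).
function stringsUpTo(alphabet, maxLen) {
  let layer = [""];            // ε, the only string of length 0
  const all = [...layer];
  for (let len = 1; len <= maxLen; len++) {
    // Concatenate each symbol onto every string of the previous length.
    layer = layer.flatMap((w) => alphabet.map((s) => w + s));
    all.push(...layer);
  }
  return all;
}

const sigma = ["a", "b"];
const slice = stringsUpTo(sigma, 2);
console.log(slice);               // ["", "a", "b", "aa", "ab", "ba", "bb"]
console.log("abb".length);        // |abb| = 3
console.log("ab" + "b");          // concatenation gives "abb"
// The language L = {w | w ends with 'b'}, restricted to this slice:
console.log(slice.filter((w) => w.endsWith("b")));  // ["b", "ab", "bb"]
```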

The Power of Regular Languages
A language is regular if a regex can describe it. They form the foundation of text processing.
Chomsky Hierarchy
Type 0: Recursively Enumerable
Type 1: Context-Sensitive
Type 2: Context-Free
Type 3: Regular Languages
Closure Properties
Union (L₁ ∪ L₂)
Concatenation (L₁L₂)
Kleene Star (L*)
Complement (Lᶜ)
Intersection (L₁ ∩ L₂)

Regex Patterns: Digits, Letters & Identifiers
Digits & Letters
[0-9] any digit
[a-zA-Z] any letter
\d digit
\w word char
\s whitespace
Negations
\D non-digit
\W non-word
\S non-space
[^aeiou] non-vowel
Quantifiers
x? optional
x* zero or more
x+ one or more
x{n,m} n to m
Anchors & Boundaries
^ start
$ end
\b word boundary
\B non-boundary
Cases & Concatenation
[A-Z] uppercase
[a-z] lowercase
. any char (except newline by default)
xy concatenation
x|y union
Identifiers
[a-zA-Z0-9_]
Matches any character valid in an identifier.

Regex Patterns: Numbers, Ranges & Scientific Notation
Integers
[1-9][0-9]* // Positive
-?[0-9]+ // Signed
0[xX][0-9a-fA-F]+ // Hex
0[0-7]* // Octal
0b[01]+ // Binary
Floating-Point
[0-9]+\.[0-9]+ // Fixed
(\+|-)?([0-9]+\.?[0-9]*|\.[0-9]+) // Decimal
[0-9]+(e|E)(\+|-)?[0-9]+ // Scientific
Formatted Numbers
[0-9]{1,3}(,[0-9]{3})* // Thousands
Date & Time
(0?[1-9]|[12][0-9]|3[01]) // Day
(0?[1-9]|1[0-2]) // Month
(19|20)[0-9]{2} // Year
([01][0-9]|2[0-3]):[0-5][0-9] // HH:MM
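Used as written, these patterns find a match anywhere inside a longer string. A minimal sketch of whole-string checking follows, assuming `^` and `$` anchors are added and that `.` and `+` are escaped so they match literally; the names `patterns` and `classify` are hypothetical.

```javascript
// Anchored versions of a few number patterns, with '.' and '+' escaped.
const patterns = {
  signedInt: /^-?[0-9]+$/,
  hex: /^0[xX][0-9a-fA-F]+$/,
  decimal: /^(\+|-)?([0-9]+\.?[0-9]*|\.[0-9]+)$/,
  thousands: /^[0-9]{1,3}(,[0-9]{3})*$/,
};

// Return the names of all patterns that match the whole string.
function classify(text) {
  return Object.keys(patterns).filter((name) => patterns[name].test(text));
}

console.log(classify("-42"));        // ["signedInt", "decimal"]
console.log(classify("0x1F"));       // ["hex"]
console.log(classify("3.14"));       // ["decimal"]
console.log(classify("1,234,567"));  // ["thousands"]
console.log(classify("12a"));        // []
```

Several patterns can accept the same input ("-42" is both a signed integer and a decimal), so a real validator would normally pick one pattern per field.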

Regex Patterns: Emails, URLs & Network Addresses
Email & URLs
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Email
https?://[^/\s]+
URL
(ftp|http|https)://[^/\s]+
Protocol URL
www\.[^/\s]+
Web Address
IP Addresses
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
IPv4
([0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}
IPv6
\b[A-Z]{2}\b
Country Code
\b\d{4}\b
Port
Query & Path
[?&][a-zA-Z_][0-9a-zA-Z_]*=
Query Key
/[^/\s]+
Path
File Types
\.(jpg|jpeg|png|gif)
Image
\.(pdf|doc|docx)
Document
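A shape-only pattern such as `\d{1,3}` for each IPv4 octet also accepts out-of-range values like 999. One common approach, sketched here under that assumption, is to let the regex check the shape and check the numeric range in code; the function name `isIPv4` is hypothetical.

```javascript
// Shape check with a regex, then a numeric range check in code.
const ipv4Shape = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

function isIPv4(text) {
  const m = ipv4Shape.exec(text);
  if (m === null) return false;
  // m[1]..m[4] hold the four captured octets.
  return m.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isIPv4("192.168.0.1"));  // true
console.log(isIPv4("999.1.1.1"));    // false: right shape, octet out of range
console.log(isIPv4("192.168.0"));    // false: only three octets
```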

Regex Patterns: Text, Markup & Code
Markdown & HTML
\*\*[^*]+\*\*
Bold (**text**)
`[^`]+`
Inline Code
^#+\s+
Heading
\[([^\]]+)\]\(([^)]+)\)
Link
<[^>]+>
HTML Tag
Programming
#.*$
Python Comment
//.*$
C++ Comment
[a-z]+(_[a-z]+)*
snake_case
\b[a-z]+(?:[A-Z][a-z]*)+\b
camelCase
Dates & Days
\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b
Weekday
\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b
Month
\d{4}-\d{2}-\d{2}
Date (yyyy-mm-dd)
Logs & CS Terms
ERROR|WARN|INFO
Log Level
\b(stack|heap|queue)\b
CS Terms
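Markup patterns are most useful with capturing groups. The sketch below assumes the conventional Markdown link pattern `\[([^\]]+)\]\(([^)]+)\)` and a hypothetical helper `extractLinks`; the sample text and URLs are made up for illustration.

```javascript
// Group 1 captures the link text, group 2 the URL.
const linkPattern = /\[([^\]]+)\]\(([^)]+)\)/g;

function extractLinks(markdown) {
  // matchAll requires the 'g' flag and yields one match object per link.
  return Array.from(markdown.matchAll(linkPattern), (m) => ({ text: m[1], url: m[2] }));
}

const sample = "See [the docs](https://example.com/docs) and [home](https://example.com).";
console.log(extractLinks(sample));
// [ { text: "the docs", url: "https://example.com/docs" },
//   { text: "home", url: "https://example.com" } ]
```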

Descriptive Expressions (1/5)
Translating English into formal patterns.
1. All strings over {a,b} that end with the symbol 'b'.
2. Strings that contain an even number of 'a's.
3. Strings that do not contain the substring "ab".
4. Strings where every 'b' is immediately followed by an 'a'.
5. Strings that contain at least three 'a's.
6. Strings whose length is a multiple of 3.
7. Strings that contain the substring "aaa".
8. Strings with an equal number of 'a's and 'b's.
9. Strings that start and end with the same symbol.
10. Strings with no consecutive identical letters.

Exploring patterns across different alphabets and structures.
21. All decimal strings without leading zeros.
22. Strings over {a,b,c} where all 'a's appear before all 'b's.
23. Strings that contain at least one double letter (e.g., "aa").
24. Strings where no letter is repeated.
25. Strings that form a palindrome (read the same forwards and backwards).
26. Strings composed of vowels only.
27. Strings composed of consonants only.
28. Strings where the length equals the number of vowels.
29. Strings that contain exactly two words (separated by spaces).
30. Strings of balanced parentheses over { (, ) }.

Focusing on binary patterns and numerical properties.
11. Binary strings representing numbers divisible by 3.
12. Binary strings that do not contain "00" as a substring.
13. Strings with an odd number of 0s and an even number of 1s.
14. Strings whose third symbol from the start is a '1'.
15. Strings that have a '0' in every odd position.
16. Strings that end with "11".
17. Strings that contain "010" as a substring.
18. Strings where 0s and 1s alternate (e.g., 0101...).
19. Strings with more 0s than 1s.
20. Strings where every 0 is immediately followed by a 1.
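Description 11 is a good case where thinking in states is easier than writing the regex directly. A minimal sketch, assuming the empty string counts as 0 and the function name `divisibleBy3` is hypothetical: the state is the value read so far modulo 3.

```javascript
// DFA for "binary strings divisible by 3": state = value of the bits read so far, modulo 3.
// Reading bit b turns value v into 2*v + b, so the next state is (2*state + b) % 3.
function divisibleBy3(bits) {
  let state = 0;                                  // start state, and the only accepting state
  for (const ch of bits) {
    if (ch !== "0" && ch !== "1") return false;   // reject symbols outside {0, 1}
    state = (2 * state + Number(ch)) % 3;
  }
  return state === 0;
}

console.log(divisibleBy3("110"));   // true  (6)
console.log(divisibleBy3("111"));   // false (7)
console.log(divisibleBy3("1001"));  // true  (9)
```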

Defining patterns over a custom {x, y} alphabet.
31. Strings over {x,y} where every 'y' occurs in pairs.
32. Strings that contain the substring "xyx".
33. Strings where the number of 'x's is divisible by 4.
34. Strings that end with "xy".
35. Strings that do not contain "yy" as a substring.
36. Strings where "xy" appears exactly twice.
37. Strings with alternating 'x' and 'y' (e.g., xyxy...).
38. Strings where every 'x' occurs before any 'y'.
39. Strings with an equal number of runs of 'x's and 'y's.
40. Strings with no 'x' in an even position.

Specifying real-world data formats in plain English.
41. All strings that look like an email (local@domain).
42. URLs that start with an optional "https://".
43. Floating-point numbers with an optional sign.
44. Dates in the format "yyyy-mm-dd".
45. Time strings in the format "hh:mm:ss".
46. MAC addresses with colon-separated hex pairs.
47. Credit-card numbers in four groups of four digits.
48. Hashtags starting with '#' followed by alphanumeric chars.
49. File paths that end with a ".txt" extension.
50. Strings with exactly one space between words.

The Trinity: Regex, NFA & DFA
Kleene's Theorem states that Regular Expressions, NFAs, and DFAs have the same
expressive power. They all describe the same class of languages: the Regular Languages.
Regex
≡
NFA
≡
DFA
This equivalence allows us to convert freely between these representations for design, verification, and optimization.
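The equivalence can be observed on a small case. The sketch below hand-builds an NFA for "strings over {a,b} that end with b" and compares it with the regex `^[ab]*b$`; the object layout and the name `nfaAccepts` are assumptions of this example.

```javascript
// NFA for "ends with b": q0 loops on a and b, and on b it also guesses that
// this is the last symbol by moving to the accepting state q1.
const nfa = {
  start: "q0",
  accept: new Set(["q1"]),
  delta: { q0: { a: ["q0"], b: ["q0", "q1"] }, q1: { a: [], b: [] } },
};

function nfaAccepts(machine, input) {
  let current = new Set([machine.start]);
  for (const symbol of input) {
    const next = new Set();
    for (const state of current) {
      for (const target of machine.delta[state][symbol] ?? []) next.add(target);
    }
    // Each set of reachable NFA states corresponds to one DFA state
    // in the subset construction.
    current = next;
  }
  return [...current].some((s) => machine.accept.has(s));
}

const regex = /^[ab]*b$/;
for (const w of ["", "b", "ab", "ba", "abab", "bba"]) {
  console.log(JSON.stringify(w), nfaAccepts(nfa, w), regex.test(w));  // the two columns agree
}
```

Tracking a set of states is what makes the simulation deterministic: the loop never backtracks, and it visits each input symbol once.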

From English Description to DFA
A systematic workflow for translating informal specs into formal automata.
1. Parse Spec
Identify atomic conditions
and memory needs.
2. Design States
Assign states to track
progress and memory.
3. Add Transitions
Define transitions for each
symbol in each state.
4. Minimize
Use the table-filling algorithm to obtain the minimal DFA.
Example: "Even number of a's" results in a 2-state DFA (tracking even/odd count).
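The two-state result of this workflow can be written down as a transition table. This is a minimal sketch over the alphabet {a, b}; the state names `even` and `odd` and the helper `dfaAccepts` are hypothetical.

```javascript
// DFA for "even number of a's": one state per parity of the a-count.
const evenAs = {
  start: "even",
  accept: new Set(["even"]),
  delta: {
    even: { a: "odd", b: "even" },
    odd: { a: "even", b: "odd" },
  },
};

function dfaAccepts(dfa, input) {
  let state = dfa.start;
  for (const symbol of input) {
    const next = dfa.delta[state][symbol];
    if (next === undefined) return false;  // symbol outside the alphabet
    state = next;
  }
  return dfa.accept.has(state);
}

console.log(dfaAccepts(evenAs, ""));      // true  (zero a's)
console.log(dfaAccepts(evenAs, "abab"));  // true  (two a's)
console.log(dfaAccepts(evenAs, "bab"));   // false (one a)
```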

Practical Use: Tokenizing with Regex
How lexical analyzers (lexers) use regex to break source code into meaningful tokens.
Source Code
if (x > 10)
Regex Patterns
if|while|for
[a-zA-Z_][a-zA-Z0-9_]*
[0-9]+
Tokens
(KEYWORD, "if")
(IDENTIFIER, "x")
(NUMBER, "10")
Thompson's construction turns each pattern into an NFA; the lexer combines these into a single automaton (usually converted to a DFA) that tests all patterns in one pass, giving an O(n) scan time.
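A small tokenizer for the example above can be sketched with sticky regexes. This is not how a lexer generator works internally (it tries the rules one by one instead of running one combined DFA), and the `PUNCT` and `SPACE` rules are assumptions added so the whole input can be consumed.

```javascript
// Token rules in priority order: keywords must come before identifiers.
const rules = [
  ["KEYWORD", /(?:if|while|for)\b/y],
  ["IDENTIFIER", /[a-zA-Z_][a-zA-Z0-9_]*/y],
  ["NUMBER", /[0-9]+/y],
  ["PUNCT", /[()<>]/y],   // assumption: not listed on the slide
  ["SPACE", /\s+/y],      // assumption: whitespace is skipped
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const [type, regex] of rules) {
      regex.lastIndex = pos;               // sticky flag: the match must start exactly at pos
      const m = regex.exec(source);
      if (m !== null) {
        if (type !== "SPACE") tokens.push([type, m[0]]);
        pos = regex.lastIndex;             // continue right after the match
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`Unexpected character at ${pos}: ${source[pos]}`);
  }
  return tokens;
}

console.log(tokenize("if (x > 10)"));
// [["KEYWORD","if"], ["PUNCT","("], ["IDENTIFIER","x"],
//  ["PUNCT",">"], ["NUMBER","10"], ["PUNCT",")"]]
```

The `\b` after the keyword alternatives keeps a name such as `iffy` from being split into a keyword and an identifier.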

Practical Use: Validation & Extraction
Core techniques for applying regex in real-world scenarios.
Validation
Ensure input conforms to an expected pattern to reject
malformed data.
if (/^\d{5}$/.test(zipCode)) { /* ... */ }
Extraction
Use capturing groups to isolate and retrieve specific parts of
a string.
const match = text.match(/(\d{3})-(\d{4})/);
Choose deliberately between greedy and lazy quantifiers (e.g., `.*` vs `.*?`): they control how much text a match consumes. Laziness alone does not prevent catastrophic backtracking.
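Both techniques fit in a few lines. The sketch below wraps them in two hypothetical helpers, `isZipCode` and `parsePhone`, and ends with a greedy versus lazy comparison; the sample inputs are made up.

```javascript
// Validation: the anchors force the whole string to be exactly five digits.
function isZipCode(text) {
  return /^\d{5}$/.test(text);
}

// Extraction: capturing groups pull out the two parts of a ddd-dddd number.
function parsePhone(text) {
  const match = text.match(/(\d{3})-(\d{4})/);
  if (match === null) return null;       // match() returns null when nothing matches
  const [, prefix, line] = match;        // index 0 is the whole match, so skip it
  return { prefix, line };
}

console.log(isZipCode("90210"));                 // true
console.log(isZipCode("9021a"));                 // false
console.log(parsePhone("Call 555-1234 today"));  // { prefix: "555", line: "1234" }
console.log(parsePhone("no number here"));       // null

// Greedy vs lazy: same input, different match extent.
console.log("<b>x</b>".match(/<.*>/)[0]);   // "<b>x</b>"
console.log("<b>x</b>".match(/<.*?>/)[0]);  // "<b>"
```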

The Limits: Non-Regular Languages
Not all languages are regular. The Pumping Lemma is a tool to prove this.
The Pumping Lemma
For any regular language L, there exists a "pumping length" p such that any string w in L with |w| ≥ p can be divided into three parts, w = xyz, satisfying:
|xy| ≤ p
|y| ≥ 1
xyⁱz ∈ L for all i ≥ 0
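Example: L = {aⁿbⁿ | n ≥ 0}
Choose w = aᵖbᵖ. Because |xy| ≤ p, the pumped part y consists only of a's, so xy²z has more a's than b's and is not in L. No split satisfies the lemma, so L is not regular.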
Example: Palindromes
Also non-regular, requiring a more powerful model (Context-Free Grammar).
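The aⁿbⁿ argument can be checked mechanically for small p. This is an illustration and not a proof: the sketch tries every allowed split of aᵖbᵖ and tests only i = 0 and i = 2, which is enough to rule a split out. The function names are hypothetical.

```javascript
// Membership test for L = { a^n b^n | n >= 0 }.
function inL(w) {
  const m = /^(a*)(b*)$/.exec(w);
  return m !== null && m[1].length === m[2].length;
}

// For w = a^p b^p, try every split w = xyz with |xy| <= p and |y| >= 1.
// Return true if some split stays in L for both i = 0 and i = 2.
function someSplitSurvives(p) {
  const w = "a".repeat(p) + "b".repeat(p);
  for (let xyLen = 1; xyLen <= p; xyLen++) {
    for (let yLen = 1; yLen <= xyLen; yLen++) {
      const x = w.slice(0, xyLen - yLen);
      const y = w.slice(xyLen - yLen, xyLen);
      const z = w.slice(xyLen);
      if (inL(x + z) && inL(x + y + y + z)) return true;
    }
  }
  return false;
}

for (const p of [1, 2, 5, 10]) {
  console.log(p, someSplitSurvives(p));  // false for each p: every split breaks the a/b balance
}
```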

Performance Traps & How to Avoid Them
Regex can be powerful, but poorly written patterns can lead to catastrophic backtracking.
The Trap: Exponential Backtracking
Nested quantifiers like `(a+)+b` on a long string of 'a's with no 'b' can cause a backtracking engine to explore an exponential number of paths before it reports failure.
/(a+)+b/.test('a'.repeat(40)) // No 'b' to match: can freeze!
The Solution: Linear Time
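Use possessive quantifiers (e.g., `a++`) where the engine supports them (JavaScript's built-in engine does not).
Use atomic groups (e.g., `(?>...)`) where the engine supports them.
Rewrite nested quantifiers into a flat form (e.g., `(a+)+b` becomes `a+b`); lazy quantifiers (e.g., `.*?`) change match order but do not by themselves prevent backtracking.
Use a DFA-based regex engine when possible.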
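In engines without possessive quantifiers or atomic groups, the simplest fix is often to remove the nesting. The sketch below compares `(a+)+b` with the flat `a+b` on short inputs only, so the nested pattern stays safe to run; the sample strings are made up.

```javascript
const nested = /(a+)+b/;   // many ways to split a run of a's between the inner and outer '+'
const flat = /a+b/;        // one way to consume a run of a's, same strings matched

// On short inputs the two patterns agree.
const samples = ["", "b", "ab", "aab", "aaa", "baab", "aaaaab", "aaaaaa"];
for (const s of samples) {
  console.log(JSON.stringify(s), nested.test(s), flat.test(s));
}

// The flat pattern rejects a long run of a's without exponential blow-up.
const longRun = "a".repeat(2000);
console.log(flat.test(longRun));        // false
console.log(flat.test(longRun + "b"));  // true
```

A backtracking engine may still retry the flat pattern from each start position, so the cost there is polynomial, not strictly linear; a DFA-based engine avoids that as well.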

Key Takeaways & Roadmap
100 Patterns
A library of common regex for
digits, text, networks, and code.
50 Descriptions
Practice translating English specs
into formal patterns.
Automata Theory
The foundation for verification,
optimization, and understanding
limits.
Disciplined Usage
Balance power with performance
to avoid common traps.
From here, explore Context-Free Languages and modern parsing algorithms!
