2. What is a Regular Expression (RegEx)?
- A Regular Expression (RegEx) is a sequence of characters that defines a search pattern.
- It describes a pattern of text:
• can test whether a string matches the expression's pattern
• can be used to search for or replace characters in a string
• very powerful, but tough to read
Blue highlights show the match results of the regular expression pattern /h[aeiou]+/g
(the letter h followed by one or more vowels)
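The figure's pattern can be tried directly in JavaScript, the language used by the later examples in this deck. This is a minimal sketch, assuming Node.js or a browser console; the sample sentence is made up for illustration and is not the text shown in the figure.

```javascript
// The pattern from the figure: "h" followed by one or more vowels.
// The g flag makes match() return every match instead of only the first.
const pattern = /h[aeiou]+/g;
const sample = "Ahoy! The heat hit him."; // hypothetical sample text

const matches = sample.match(pattern);
console.log(matches); // [ 'ho', 'he', 'hea', 'hi', 'hi' ]
```

Note that the uppercase "A" in "Ahoy" is skipped and "hea" is matched as a whole, because + keeps consuming vowels as long as it can.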
3. Application
Usually, such patterns are used by string-searching
algorithms for "find" or "find and replace" operations on
strings, or for input validation.
Regexes are useful in a wide variety of text
processing tasks, and more generally string
processing, where the data need not be textual.
Common applications include data validation,
data scraping (especially web scraping), data
wrangling, simple parsing, the production of
syntax highlighting systems, and many other
tasks.
4. Programming Languages that Support RegEx
Most general-purpose programming languages support regex capabilities, either natively or via libraries. Comprehensive support is included in:
• C
• C++
• Java
• JavaScript
• Perl
• PHP
• Python
• Rust
5. Basic Syntax: Delimiter
• In JavaScript (and typically in Perl and PHP), a regex pattern begins and ends with /. These slashes are called delimiters.
•/someString/
Example: confirm if the string contains the word “dog”
STRING: “The quick brown fox jumps over the lazy dog.”
PATTERN: /dog/
Note: In Python, regular expressions do not require delimiters to separate the regular expression pattern from the surrounding text.
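To make the delimiter rule concrete, the sketch below (assuming Node.js or a browser) writes the slide's /dog/ example as a JavaScript regex literal and, for comparison, builds the same pattern with the standard RegExp constructor, where the pattern is an ordinary string and no delimiters are needed, much as in Python.

```javascript
const sentence = "The quick brown fox jumps over the lazy dog.";

// Regex literal: the pattern sits between the two / delimiters.
const literal = /dog/;

// The same pattern built from a plain string; no delimiters are involved.
const constructed = new RegExp("dog");

console.log(literal.test(sentence));     // true
console.log(constructed.test(sentence)); // true
console.log(literal.source);             // dog (the pattern without delimiters)
```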
6. Basic Syntax: Modifiers/Flags
Modifiers (flags) are written after the closing delimiter and are used to perform global, case-insensitive, and multiline searches:
/someString/g
/someString/i
/someString\nanotherStringLine/m
Modifier Description
g Perform a global match (find all matches rather than stopping after the first match)
i Perform case-insensitive matching
m Perform multiline matching (^ and $ also match at the start and end of each line, not only of the whole input)
Example: find every occurrence of the word “dog”, in any letter case (e.g. “DoG”)
STRING: “The quick brown dog jumps over the lazy dog.”
PATTERN: /dog/gi (flags can be combined)
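A short JavaScript sketch of the three flags, assuming Node.js or a browser. The sentence is a variant of the slide's string with one "DOG" in uppercase so that the effect of i is visible; the two-line string is a made-up input for the m flag.

```javascript
const sentence = "The quick brown DOG jumps over the lazy dog.";

// g alone: every match, but case-sensitive, so only the lowercase "dog".
const globalOnly = sentence.match(/dog/g);        // [ 'dog' ]

// g and i combined: every match, ignoring case.
const globalIgnoreCase = sentence.match(/dog/gi); // [ 'DOG', 'dog' ]

// m: ^ and $ also match at line breaks, not only at the ends of the input.
const twoLines = "first line\nsecond line";
const withoutM = /^second/.test(twoLines); // false
const withM = /^second/m.test(twoLines);   // true

console.log(globalOnly, globalIgnoreCase, withoutM, withM);
```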
7. Basic Syntax: Boolean OR
Example: confirm if the string contains the word “dog” or “cat”
STRING: “The quick brown fox jumps over the lazy cat.”
PATTERN: /dog|cat/
-Find any of the alternatives specified
| means OR
"abc|def|g" matches lines with "abc", "def", or "g"
There's no AND symbol.
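The OR operator can be checked against a few sentences in JavaScript. This is a minimal sketch, assuming Node.js or a browser; the third sentence is a hypothetical input added to show a non-match.

```javascript
const dogOrCat = /dog|cat/;

const sentences = [
  "The quick brown fox jumps over the lazy cat.",
  "The quick brown fox jumps over the lazy dog.",
  "The quick brown fox jumps over the lazy cow.", // hypothetical non-match
];

// test() returns true if either alternative appears anywhere in the string.
const results = sentences.map((s) => dogOrCat.test(s));
console.log(results); // [ true, true, false ]

// | splits the whole pattern: /abc|def|g/ is "abc" OR "def" OR "g".
const firstHit = "xx def yy".match(/abc|def|g/)[0];
console.log(firstHit); // def
```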
8. Basic Syntax: Parenthesis
() are for grouping
/(Homer|Marge) Simpson/ matches lines containing "Homer Simpson" or "Marge Simpson"
Do a global search to find any of the specified alternatives (0|5|7):
let text = "01234567890123456789";
let pattern = /(0|5|7)/g;
let result = text.match(pattern); // ["0", "5", "7", "0", "5", "7"]
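Parentheses do two jobs in JavaScript: they limit the reach of | and they capture the text matched inside them. The sketch below, assuming Node.js or a browser, applies the slide's Simpson pattern to a made-up sentence and contrasts it with the ungrouped version.

```javascript
const simpson = /(Homer|Marge) Simpson/;

// exec() returns an array: [0] is the whole match, [1] is the group's capture.
const result = simpson.exec("Yesterday Marge Simpson went shopping.");
console.log(result[0]); // Marge Simpson
console.log(result[1]); // Marge

// Without the parentheses, | splits the whole pattern in two:
// /Homer|Marge Simpson/ means "Homer" OR "Marge Simpson".
const ungrouped = /Homer|Marge Simpson/.test("Homer Jay"); // true
const grouped = /(Homer|Marge) Simpson/.test("Homer Jay"); // false
console.log(ungrouped, grouped);
```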
9. Basic Syntax: Brackets
Brackets define a character set: the set matches any one character listed inside it, or any character within a given character range.
Expression Description
[abc] Find any character between the brackets (character set)
[0-9] Find any digit (character range)
[A-Z] Find any uppercase letter (character range)
[0-9a-z] Find any digit or lowercase letter (character range)
"[bcd]art" matches strings containing "bart", "cart", and "dart"
equivalent to "(b|c|d)art" but shorter
inside [ ], most special characters (metacharacters) act as normal characters
"what[.!*?]*" matches "what", "what.", "what!", "what?**!"
Special characters such as . * and ? are discussed in the next few slides.
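The bracket examples can be checked in JavaScript. This sketch assumes Node.js or a browser, and it adds the ^ and $ anchors (covered later in the deck) so that each test covers the whole string; without them, "whatever" would also pass because it contains "what". The word lists are hypothetical inputs.

```javascript
// ^ and $ anchor each pattern to the whole string.
const art = /^[bcd]art$/;
const artResults = ["bart", "cart", "dart", "mart"].map((w) => art.test(w));
console.log(artResults); // [ true, true, true, false ]

// Inside [ ], . ! * and ? are ordinary characters.
const what = /^what[.!*?]*$/;
const whatResults = ["what", "what.", "what!", "what?**!", "whatever"].map((w) => what.test(w));
console.log(whatResults); // [ true, true, true, true, false ]

// Ranges: [0-9a-z] matches one digit or lowercase letter, so "B" is skipped.
const rangeMatches = "a1B".match(/[0-9a-z]/g);
console.log(rangeMatches); // [ 'a', '1' ]
```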
10. Basic Syntax: Brackets
an initial ^ inside a character set negates it
"[^abcd]" matches any character other than a, b, c, or d
inside a character set, - must be escaped as \- to be matched literally (unless it is the first or last character in the set)
"[+\-.]?[0-9]+" matches an optional +, - or ., followed by one or more digits
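The sketch below, assuming Node.js or a browser, tests negation and the signed-number pattern on a few hypothetical inputs. It also shows why escaping the hyphen matters: unescaped, [+-.] is read as the range from "+" to ".", which happens to include the comma.

```javascript
// [^abcd] matches any single character other than a, b, c or d.
const notAbcd = "abcdef".match(/[^abcd]/g); // [ 'e', 'f' ]

// Optional +, - or ., then one or more digits. \- is a literal hyphen.
const signedNumber = /^[+\-.]?[0-9]+$/;
const samples = ["42", "+42", "-42", ".42", ",42", "+-42"];
const checks = samples.map((s) => signedNumber.test(s));
// [ true, true, true, true, false, false ]

// Unescaped, [+-.] is the range "+" to ".", which also contains ",".
const unescaped = /^[+-.]?[0-9]+$/.test(",42"); // true

console.log(notAbcd, checks, unescaped);
```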
11. Basic Syntax: Escape sequence
• many characters must be escaped with a backslash to match them literally: / \ $ . [ ] ( ) ^ * + ?
• "\.n" matches lines containing ".n" (a literal period followed by n)
Prefix a metacharacter or special character with \ to match it as a literal character.
Example:
• \(
• \)
• \?
• \.
• etc.
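A few escaping cases in JavaScript, assuming Node.js or a browser. The helper escapeForRegExp is hypothetical (it is not a built-in); it follows a commonly used replace() idiom for turning arbitrary text into a literal pattern for the RegExp constructor.

```javascript
// Unescaped, . matches any character, so /a.c/ also accepts "abc".
const anyChar = /a.c/.test("abc");            // true
// Escaped, \. matches only a literal period.
const literalDotOnAbc = /a\.c/.test("abc");   // false
const literalDotOnADotC = /a\.c/.test("a.c"); // true
// Escaped parentheses and question mark.
const literalParens = /\(why\?\)/.test("(why?)"); // true
// In a regex literal, / is the delimiter, so it is written as \/.
const literalSlash = /1\/2/.test("1/2"); // true

// Hypothetical helper: backslash-escape every special character in s
// so that the whole string is matched literally.
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const literalSum = new RegExp(escapeForRegExp("1+1=2?")).test("is 1+1=2?"); // true

console.log(anyChar, literalDotOnAbc, literalDotOnADotC, literalParens, literalSlash, literalSum);
```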
12. Basic Syntax: Built-in character ranges
\b Find a match at the beginning/end of a word, beginning like this: \bHI, end like this: HI\b
\B Find a match, but not at the beginning/end of a word
\d any digit; equivalent to [0-9]
\D any non-digit; equivalent to [^0-9]
\s any whitespace character; [ \f\n\r\t\v...]
\S any non-whitespace character
\w any word character; [A-Za-z0-9_]
\W any non-word character
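The built-in classes and word boundaries can be tried on one made-up string in JavaScript (assuming Node.js or a browser). Note that \w counts the underscore as a word character, so "ext_7" stays one word.

```javascript
const text = "Call 555-0199, ext_7 now!"; // hypothetical sample text

const digits = text.match(/\d/g).join("");        // "55501997"
const whitespaceCount = text.match(/\s/g).length; // 3
const words = text.match(/\w+/g);                 // [ 'Call', '555', '0199', 'ext_7', 'now' ]
const anyNonDigit = /\D/.test("12345");           // false: every character is a digit

// Word boundaries: \b sits between a word character and a non-word character.
const startsWord = /\bext/.test(text); // true: "ext" begins a word
const insideWord = /\Bxt/.test(text);  // true: "xt" sits inside "ext_7"
const wordStartXt = /\bxt/.test(text); // false: no word begins with "xt"

console.log(digits, whitespaceCount, words, anyNonDigit, startsWord, insideWord, wordStartXt);
```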
13. Basic Syntax: Quantifiers
• * means 0 or more occurrences
"abc*" matches "ab", "abc", "abcc", "abccc", ...
"a(bc)*" matches "a", "abc", "abcbc", "abcbcbc", ...
"a.*a" matches "aa", "aba", "a8qa", "a!?_a", ...
• + means 1 or more occurrences
"a(bc)+" matches "abc", "abcbc", "abcbcbc", ...
"Goo+gle" matches "Google", "Gooogle", "Goooogle", ...
• ? means 0 or 1 occurrences
"Martina?" matches lines with "Martin" or "Martina"
"Dan(iel)?" matches lines with "Dan" or "Daniel"
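A JavaScript sketch of *, + and ? using the slide's own patterns, assuming Node.js or a browser. The ^ and $ anchors are added so each test checks the whole string, and the last line shows that quantifiers are greedy by default.

```javascript
// ^ and $ make each test cover the whole string.
const star = /^a(bc)*$/;        // zero or more "bc"
const plus = /^a(bc)+$/;        // one or more "bc"
const optional = /^Dan(iel)?$/; // "iel" zero or one time

const starResults = ["a", "abc", "abcbc", "ab"].map((s) => star.test(s));        // [ true, true, true, false ]
const plusResults = ["a", "abc", "abcbc"].map((s) => plus.test(s));              // [ false, true, true ]
const optionalResults = ["Dan", "Daniel", "Dani"].map((s) => optional.test(s)); // [ true, true, false ]

// Greedy: .* consumes as much as it can, so /a.*a/ runs
// from the first "a" to the last one.
const greedy = "banana".match(/a.*a/)[0]; // "anana"

console.log(starResults, plusResults, optionalResults, greedy);
```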
14. Basic Syntax: Anchors and Lookaheads
• ^ Matches the beginning of input. If the multiline flag is set to true, also matches
immediately after a line break character. For example, /^A/ does not match the "A" in
"an A", but does match the first "A" in "An A".
• x(?=y) Positive lookahead assertion: matches "x" only if "x" is followed by "y". Positive lookaheads are written using the syntax (?=pattern); the lookahead part is checked but is not included in the match.
• x(?!y) Negative lookahead assertion: matches "x" only if "x" is not followed by "y".
For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal
point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3".
Can the part before a positive lookahead (the "x") be empty?
Yes. A lookahead with nothing before it, such as (?=.*\d), matches any position in the string that is followed by the pattern inside the lookahead. Because a lookahead does not consume any characters, this is useful when you want to ensure that a certain pattern occurs somewhere in the string without including that pattern in the match.
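The three lookahead forms can be tried in JavaScript, assuming Node.js or a browser. The CSS-like string is a hypothetical input; the negative lookahead line reuses the slide's own /\d+(?!\.)/ example.

```javascript
// Positive lookahead x(?=y): digits only when "px" follows; "px" is not consumed.
const css = "width: 100px; height: 50%; margin: 8px";
const pxValues = css.match(/\d+(?=px)/g); // [ '100', '8' ]

// Negative lookahead x(?!y): a number not followed by ".".
const notBeforePoint = /\d+(?!\.)/.exec("3.141")[0]; // "141"

// Nothing before the lookahead: (?=.*\d) only checks that a digit appears
// somewhere, and the match itself is empty (zero width).
const hasDigit = /^(?=.*\d)/;
const withDigit = hasDigit.test("abc1");        // true
const withoutDigit = hasDigit.test("abc");      // false
const matchedText = "abc1".match(hasDigit)[0];  // "" (nothing consumed)

console.log(pxValues, notBeforePoint, withDigit, withoutDigit, JSON.stringify(matchedText));
```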
15. Basic Syntax: Quantifiers
• {min,max} means between min and max occurrences
"a(bc){2,4}" matches "abcbc", "abcbcbc", or "abcbcbcbc"
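• max may be omitted to allow any number of repetitions, and a single number means an exact count
"{2,}" means 2 or more
"{0,6}" means up to 6 (some engines, such as Python's, also accept "{,6}", but JavaScript does not treat it as a quantifier)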
"{3}" means exactly 3
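The counted quantifiers can be checked in JavaScript, assuming Node.js or a browser. Each pattern is anchored with ^ and $ so that extra repetitions make the test fail; the x-strings are hypothetical inputs.

```javascript
const twoToFour = /^a(bc){2,4}$/;
const rangeResults = ["abc", "abcbc", "abcbcbc", "abcbcbcbc", "abcbcbcbcbc"]
  .map((s) => twoToFour.test(s)); // [ false, true, true, true, false ]

const atLeastTwo = /^x{2,}$/;  // 2 or more
const upToSix = /^x{0,6}$/;    // 0 to 6 (JavaScript needs the explicit 0)
const exactlyThree = /^x{3}$/; // exactly 3

console.log(rangeResults);
console.log(atLeastTwo.test("x"), atLeastTwo.test("xxxxx"));      // false true
console.log(upToSix.test(""), upToSix.test("xxxxxxx"));           // true false
console.log(exactlyThree.test("xxx"), exactlyThree.test("xxxx")); // true false
```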
16. JavaScript RegEx methods
exec(): tests for a match in a string.
If it finds a match, it returns a result array; otherwise it returns null.
test(): tests for a match in a string.
If it finds a match, it returns true; otherwise it returns false.
toString(): returns the string value of the regular expression.
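A sketch of the three methods side by side, assuming Node.js or a browser. The date pattern and the input strings are hypothetical. The last part illustrates a well-known caveat: with the g flag, exec() and test() continue from the regex's lastIndex, so calling them repeatedly on the same string can give different answers.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// exec(): result array on success ([0] = whole match, [1..] = groups), null otherwise.
const found = datePattern.exec("Released on 2023-06-15.");
console.log(found[0], found[1], found.index); // 2023-06-15 2023 12
const notFound = datePattern.exec("no date here"); // null

// test(): just true or false.
const hasDate = datePattern.test("2024-01-01"); // true

// toString(): the pattern as a string, including delimiters and flags.
const asText = /dog/gi.toString(); // "/dog/gi"

// Caveat: with g, each call resumes at lastIndex and resets it to 0 after a miss.
const g = /o/g;
const calls = [g.test("foo"), g.test("foo"), g.test("foo")]; // [ true, true, false ]

console.log(notFound, hasDate, asText, calls);
```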
18. Example: phone number validator in the format (123) 456-7890:
// Literal parentheses are escaped as \( and \); \d{3} means exactly three digits.
const phoneRegex = /^\(\d{3}\) \d{3}-\d{4}$/;
function validatePhoneNumber(phoneNumber) {
  return phoneRegex.test(phoneNumber);
}
19. Example: Validate a URL that starts with https:// or http://:
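// "s?" makes the s optional; "\/\/" is the escaped "//"; (\.[\w-]+)+ requires at least one dot-separated part.
const urlRegex = /^https?:\/\/[\w-]+(\.[\w-]+)+[\/#?]?.*$/;
function validateUrl(url) {
  return urlRegex.test(url);
}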
20. Example: Remove all non-alphanumeric characters from a string:
const str = "Hello, world!";
const alphanumericStr = str.replace(/[^a-zA-Z0-9]/g, '');
console.log(alphanumericStr); // Output: "Helloworld"
21. Example: Extract all email addresses from a string:
// One or more non-space, non-@ characters, then @, then a domain containing a literal dot.
const emailRegex = /[^\s@]+@[^\s@]+\.[^\s@]+/g;
const str = "Contact us at info@example.com or sales@example.com for more information.";
const emailList = str.match(emailRegex);
console.log(emailList); // Output: ["info@example.com", "sales@example.com"]
22. Example: Validate a password that contains at least one uppercase letter, one
lowercase letter, and one digit, and is at least 8 characters long:
const passwordRegex = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/;
function validatePassword(password) {
return passwordRegex.test(password);
}
console.log(validatePassword("Password123")); // Output: true
console.log(validatePassword("password")); // Output: false