2. What is a regular expression?
• A regular expression can be defined as a sequence of characters used to search for a pattern in a string.
• A regular expression (or regex) is a pattern that describes a set of strings. It can
be used to search, edit, or manipulate text.
• Python has a built-in module called re that provides support for regular expressions.
• The re module raises an exception (re.error) if there is an error in a regular expression, such as an unclosed parenthesis.
• The re module must be imported before its regex functions can be used in Python.
• Syntax :
• import re
• Once the module is imported, you can use the re.match() or re.search() functions to search for a pattern in a string.
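The exception mentioned above can be seen in a short sketch. The helper compile_or_report is a hypothetical name used only for illustration; re.compile() and the msg and pos attributes of re.error are part of the standard library.

```python
import re

def compile_or_report(pattern):
    """Hypothetical helper: return a compiled pattern, or None if it is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # exc.msg describes the problem; exc.pos is the index in the pattern
        print(f"Invalid pattern {pattern!r}: {exc.msg} at position {exc.pos}")
        return None

valid = compile_or_report(r"Spain$")
print(valid.search("The rain in Spain") is not None)  # True
print(compile_or_report(r"(Spain"))  # prints the error report, then None
```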
3. Raw string
• In Python, the r prefix before a string denotes a "raw string." It is often
used when working with regular expressions to prevent backslashes from
being treated as escape characters.
• When using regular expressions, backslashes are frequently used as escape characters to represent special characters or character classes. In a raw string, Python treats backslashes as literal characters and passes them to the regex engine unchanged, which simplifies regex patterns.
• pattern = r"\d+" # Matches one or more digits
• In the above example, the r prefix allows the pattern to be written without escaping the backslash (r"\d+" instead of "\\d+"), making it more readable and concise.
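A quick check with the standard re module shows that the raw and escaped spellings produce the same pattern, and why forgetting the r prefix can silently break a pattern (in a normal string, "\b" is a backspace character rather than a word boundary):

```python
import re

raw = r"\d+"      # raw string: the backslash is kept as-is
escaped = "\\d+"  # normal string: the backslash has to be doubled

print(raw == escaped)  # True: both hold the three characters \ d +
print(len(raw))        # 3
print(re.findall(raw, "Order 66 shipped on day 7"))  # ['66', '7']

# Pitfall: without the r prefix, "\b" becomes a backspace character
print(re.findall("\bcat\b", "a cat here"))   # []
print(re.findall(r"\bcat\b", "a cat here"))  # ['cat']
```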
4. Regex Functions
• The re module provides the following functions.
SN Function Description
1 match Checks for a match only at the beginning of the string, with optional flags. It returns a Match object if the pattern matches there, otherwise None.
2 search Scans the whole string and returns a Match object for the first match found, otherwise None.
3 findall Returns a list that contains all the matches of a pattern in the string.
4 split Returns a list in which the string has been split at each match.
5 sub Replaces one or many matches in the string.
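The five functions in the table can be compared on one sample string. This is a minimal sketch using only the standard re module:

```python
import re

text = "The rain in Spain"

print(re.match(r"The", text) is not None)  # True: "The" is at the start
print(re.match(r"rain", text))             # None: match() only looks at the start
print(re.search(r"ai", text).span())       # (5, 7): only the first occurrence
print(re.findall(r"ai", text))             # ['ai', 'ai']
print(re.split(r"\s", text))               # ['The', 'rain', 'in', 'Spain']
print(re.sub(r"\s", "_", text))            # The_rain_in_Spain
```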
5. Example : Matching a specific word:
import re
text = "Hello, world!"
pattern = r"world"
matches = re.findall(pattern, text)
print(matches) # Output: ['world']
6. Example : Matching multiple options using the pipe symbol
import re
text = "I like cats and dogs."
pattern = r"cats|dogs"
matches = re.findall(pattern, text)
print(matches) # Output: ['cats', 'dogs']
7. Example: Matching digits using character classes:
import re
text = "I have 3 apples and 5 oranges."
pattern = r"\d+" # \d matches any digit, + matches one or more occurrences
matches = re.findall(pattern, text)
print(matches) # Output: ['3', '5']
8. Example : Matching a specific pattern using a combination of characters and modifiers
import re
text = "The color of the sky is blue."
pattern = r"colou?r" # ? makes the preceding 'u' optional
matches = re.findall(pattern, text)
print(matches) # Output: ['color']
9. Example
• Search the string to see if it starts with "The" and ends with "Spain":
import re
txt = "The rain in Spain"
x = re.search(r"^The.*Spain$", txt)
if x:
    print("YES! We have a match!")
else:
    print("No match")
10. Forming a regular expression
• A regular expression is formed from a mix of meta-characters, special sequences, and sets.
• Meta-Characters
• A metacharacter is a character with a special meaning.
Metacharacter Description Example
[ ] Represents a set of characters. "[a-z]"
\ Signals a special sequence; it also escapes special characters. "\d"
. Matches any single character except a newline. "Ja.v."
^ Matches the pattern only at the beginning of the string. "^Java"
$ Matches the pattern only at the end of the string. "point$"
* Matches zero or more occurrences of the preceding element. "hello*"
+ Matches one or more occurrences of the preceding element. "hello+"
{} Matches exactly the specified number of occurrences of the preceding element. "java{2}"
| Matches either the pattern on the left or the one on the right. "java|point"
() Captures and groups a subpattern. "(java)+"
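Each metacharacter in the table can be tried on a small sample. The strings below are illustrative only. Note that *, + and {} apply only to the element immediately before them (here the last letter), unless that element is a group in parentheses.

```python
import re

print(re.findall(r"h.ll", "hall hell hill"))       # ['hall', 'hell', 'hill']
print(bool(re.search(r"^Java", "JavaScript")))     # True: starts with "Java"
print(bool(re.search(r"point$", "checkpoint")))    # True: ends with "point"
print(re.findall(r"hello*", "hell hello helloo"))  # ['hell', 'hello', 'helloo']
print(re.findall(r"hello+", "hell hello helloo"))  # ['hello', 'helloo']
print(re.findall(r"java{2}", "java javaa"))        # ['javaa']: {2} applies to the last "a"
print(re.findall(r"java|point", "javapoint"))      # ['java', 'point']
print(re.findall(r"3\.14", "3.14 3x14"))           # ['3.14']: \. is a literal dot
print(re.fullmatch(r"(ab)+", "ababab") is not None)  # True: + repeats the whole group
```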
12. Special Sequences
• A special sequence is a \ followed by one of the characters in the list below; together they have a special meaning.
Character Description
\A Returns a match if the specified characters are at the beginning of the string.
\b Returns a match where the specified characters are at the beginning or at the end of a word.
\B Returns a match where the specified characters are present, but not at the beginning (or at the end) of a word.
\d Returns a match where the string contains a digit (such as 0-9).
\D Returns a match where the string contains a character that is not a digit.
\s Returns a match where the string contains a white-space character.
\S Returns a match where the string contains a character that is not white space.
\w Returns a match where the string contains word characters (letters, digits, and the underscore _).
\W Returns a match where the string contains a character that is not a word character.
\Z Returns a match if the specified characters are at the end of the string.
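A short sketch with made-up sample strings shows each special sequence in action. The patterns are written as raw strings so the backslashes reach the regex engine unchanged.

```python
import re

print(re.findall(r"\d+", "Order 66, day 7"))   # ['66', '7']
print(re.findall(r"\D+", "ab12cd"))            # ['ab', 'cd']
print(re.findall(r"\w+", "it's snake_case!"))  # ['it', 's', 'snake_case']
print(re.findall(r"\W", "a-b c"))              # ['-', ' ']
print(re.split(r"\s+", "one  two\tthree"))     # ['one', 'two', 'three']
print(re.findall(r"\S+", " x  yz "))           # ['x', 'yz']
print(re.findall(r"\bcat\b", "cat concat cats"))       # ['cat']: only the standalone word
print(re.search(r"\Bcat", "cat concat cats").start())  # 7: the "cat" inside "concat"
print(bool(re.search(r"\AThe", "The end")))    # True
print(bool(re.search(r"end\Z", "The end")))    # True
```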
13. Sets
• A set is a group of characters inside a pair of square brackets; it matches any single character from the group.
SN Set Description
1 [arn] Returns a match where one of the specified characters (a, r, or n) is present.
2 [a-n] Returns a match for any lower-case character alphabetically between a and n.
3 [^arn] Returns a match for any character except a, r, and n.
4 [0123] Returns a match where any of the specified digits (0, 1, 2, or 3) is present.
5 [0-9] Returns a match for any digit between 0 and 9.
6 [0-5][0-9] Returns a match for any two-digit number from 00 to 59.
7 [a-zA-Z] Returns a match for any letter, lower-case or upper-case.
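The sets from the table can be checked against small sample strings (chosen here for illustration). The last line shows that most metacharacters lose their special meaning inside square brackets.

```python
import re

print(re.findall(r"[arn]", "rain"))              # ['r', 'a', 'n']
print(re.findall(r"[a-n]", "Spain"))             # ['a', 'i', 'n']
print(re.findall(r"[^arn]", "rain"))             # ['i']
print(re.findall(r"[0123]", "2024"))             # ['2', '0', '2']
print(re.findall(r"[0-5][0-9]", "07 59 60 99"))  # ['07', '59']
print(re.findall(r"[a-zA-Z]+", "Py3 rocks!"))    # ['Py', 'rocks']
# Inside a set, "." and "|" are literal characters
print(re.findall(r"[.|]", "a.b|c"))              # ['.', '|']
```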
14. The findall() function
• This function returns a list of all matches of a pattern within the string, in the order they are found. If there are no matches, an empty list is returned.
• Example
import re
text = "How are you. How is everything"
matches = re.findall("How", text)
print(matches) # Output: ['How', 'How']
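When the pattern contains capture groups, findall() returns the captured text instead of the whole match: a list of strings for one group, or a list of tuples for several. A small sketch with an illustrative "name: age" string:

```python
import re

text = "Alice: 30, Bob: 25"

# No groups: each item is the whole match
print(re.findall(r"\w+: \d+", text))      # ['Alice: 30', 'Bob: 25']
# One group: each item is the text captured by that group
print(re.findall(r"(\w+): \d+", text))    # ['Alice', 'Bob']
# Several groups: each item is a tuple of the captured texts
print(re.findall(r"(\w+): (\d+)", text))  # [('Alice', '30'), ('Bob', '25')]
# No match at all: an empty list
print(re.findall(r"Carol", text))         # []
```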
15. The search() Function
• The search() function searches the string for a match, and returns a Match object if there
is a match.
• If there is more than one match, only the first occurrence of the match will be returned:
• Example
• Search for the first white-space character in the string:
import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start()) # Output: ... position: 3
16. Continued...
• If no matches are found, the value None is returned:
• Example
• Make a search that returns no match:
import re
txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x) # Output: None
17. The split() Function
• The split() function returns a list where the string has been split at each match:
• Example
• Split at each white-space character:
import re
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x) # Output: ['The', 'rain', 'in', 'Spain']
• Note : You can control the number of splits by specifying the maxsplit parameter:
18. Continued...
• Example
• Split the string only at the first occurrence:
import re
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x) # Output: ['The', 'rain in Spain']
19. The sub() Function
• The sub() function replaces the matches with the text of your choice:
• Example
• Replace every white-space character with the number 9:
import re
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x) # Output: The9rain9in9Spain
20. Continued...
• You can control the number of replacements by specifying the count parameter:
• Example
• Replace the first 2 occurrences:
import re
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x) # Output: The9rain9in Spain
21. The match object
• The match object contains information about the search and its result. If no match is found, None is returned instead.
• Example
import re
text = "How are you. How is everything"
match = re.search("How", text)
print(type(match))
print(match) # match is a Match object
• Output (Python 3.7 and later):
• <class 're.Match'>
• <re.Match object; span=(0, 3), match='How'>
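Because search() returns None when nothing matches, calling a Match method on the result without checking first raises AttributeError. A minimal sketch of the usual guard; first_position is a hypothetical helper name:

```python
import re

def first_position(pattern, text):
    """Hypothetical helper: start index of the first match, or None."""
    match = re.search(pattern, text)
    if match is None:  # no match: re.search() returned None
        return None
    return match.start()

print(first_position(r"How", "How are you. How is everything"))  # 0
print(first_position(r"Why", "How are you. How is everything"))  # None

# Without the check, the method lookup fails on None
try:
    re.search(r"Why", "How are you").start()
except AttributeError as exc:
    print("AttributeError:", exc)
```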
22. Meta characters example
import re
# Matching 'cat' followed by any three characters (. also matches spaces)
text = "I have a cat and a car but not a caterpillar."
pattern = r"cat..."
matches = re.findall(pattern, text)
print(matches) # Output: ['cat an', 'cater']
# Matching words starting with 'a' or 'b'
# (inside a set, | is a literal character, so [ab] is used rather than [a|b])
text = "The apple and the banana are fruits."
pattern = r"\b[ab]\w+"
matches = re.findall(pattern, text)
print(matches) # Output: ['apple', 'and', 'banana', 'are']
# Matching a word ending in 'ing' or 'ed'
# ((?:...) groups without capturing, so findall returns whole words)
text = "He is running and has walked."
pattern = r"\b\w+(?:ing|ed)\b"
matches = re.findall(pattern, text)
print(matches) # Output: ['running', 'walked']
23. The Match object methods
• The following methods and attributes are associated with the Match object.
• span(): Returns a tuple containing the start and end positions of the match (the end position is exclusive).
• string: An attribute (not a method) holding the string passed into the function.
• group(): Returns the part of the string where the match was found.
24. Example
import re
text = "How are you. How is everything"
match = re.search("How", text)
print(match.span())
print(match.group())
print(match.string)
• Output:
• (0, 3)
• How
• How are you. How is everything
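Besides span(), group() and string, a Match object also provides start(), end(), and numbered groups when the pattern contains parentheses (the x.start() call in the search() example is one of these). The email-like sample string below is illustrative only.

```python
import re

text = "Contact: info@example.com today"
match = re.search(r"(\w+)@(\w+)\.com", text)

print(match.span())                    # (9, 25): end index is exclusive
print(match.start(), match.end())      # 9 25
print(match.group())                   # info@example.com (same as group(0))
print(match.group(1), match.group(2))  # info example
print(match.groups())                  # ('info', 'example')
print(text[match.start():match.end()] == match.group())  # True
print(match.string == text)            # True: string is an attribute
```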
25. Important Examples
• Email Validation –
• Validating an email means checking whether the user's input in the email address field follows the format we expect.
• Suppose we as programmers set the email format to be "first_name.last_name@company_name.com" and the user enters "gupta.rohit@csharpcorner.com".
• This input violates our condition. Some readers may wonder how we decide which part is the "first_name" and which is the "last_name". This is decided from the first name and last name entered by the user. In this case, I assume that the user entered "rohit" as first_name and "gupta" as last_name.
• Email validation is most commonly implemented in mail servers: when you enter your email address, it is checked against that mail server's predefined format.
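The "first_name.last_name@company_name.com" rule described above can be sketched with re.fullmatch() and re.escape(). The helper name matches_company_format, the case-insensitive comparison, and the assumption that company_name contains only letters, digits, and hyphens are illustrative choices, not a fixed standard.

```python
import re

def matches_company_format(email, first_name, last_name):
    """Hypothetical check against first_name.last_name@company_name.com."""
    pattern = (
        re.escape(first_name.lower())   # escape in case a name has special characters
        + r"\."
        + re.escape(last_name.lower())
        + r"@[a-z0-9-]+\.com"           # assumed company_name characters
    )
    # fullmatch: the whole address must fit the pattern, not just a part of it
    return re.fullmatch(pattern, email.lower()) is not None

print(matches_company_format("rohit.gupta@csharpcorner.com", "Rohit", "Gupta"))  # True
print(matches_company_format("gupta.rohit@csharpcorner.com", "Rohit", "Gupta"))  # False
```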
26. Extracting email addresses from a string:
import re
text = "Contact us at info@example.com or support@example.com."
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
matches = re.findall(pattern, text)
print(matches) # Output: ['info@example.com', 'support@example.com']
27. Real-time parsing using regular expressions
import re

def process_realtime_input(input_stream):
    pattern = r"\b[A-Za-z]+\b"  # Matches words made of letters only
    for match in re.finditer(pattern, input_stream):
        word = match.group(0)
        # Do something with the matched word as soon as it is found
        print("Found word:", word)

# Simulating real-time input with a fixed string
input_stream = "Hello, how are you today? I hope you're doing well."
# Process the input; finditer() yields the matches one at a time
process_realtime_input(input_stream)
28. Continued...
• Email Validation Using re
• Other Methods
• Various Python packages and APIs let you validate an email address with only a couple of lines of code, instead of writing the pattern yourself.
• Below are some of the Email Validation Python packages:
• email-validator 1.0.5
• pyIsEmail 1.3.2
• py3-validate-email
• Given below are some of the Email Validation APIs:
• Mailboxlayer
• Isitrealemail
• SendGrid’s Email Validation API
• There are many other Python packages and APIs, both free and paid.
29. Password Validation
• Write a Python program to check the validity of a password (input from users).
Validation :
• At least 1 lowercase letter [a-z] and 1 uppercase letter [A-Z].
• At least 1 digit [0-9].
• At least 1 character from [$#@].
• Minimum length 6 characters.
• Maximum length 16 characters.
• Example
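One possible sketch, assuming the five rules above are the only requirements (any other characters are allowed), checks every rule in a single pattern using lookahead assertions (?=...), a regex feature not covered in the earlier slides. The names PASSWORD_RE and is_valid_password are hypothetical. To check real user input, pass input("Enter password: ") to is_valid_password; fixed sample strings are used here so the sketch runs unattended.

```python
import re

# Each (?=.*X) lookahead requires X somewhere in the string without consuming it;
# .{6,16} then enforces the overall length, and fullmatch anchors both ends.
PASSWORD_RE = re.compile(r"(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[$#@]).{6,16}")

def is_valid_password(password):
    return PASSWORD_RE.fullmatch(password) is not None

for candidate in ["Abc1$x", "abc1$x", "Ab1$", "Abcdef12", "Aa1$" + "a" * 13]:
    print(candidate, "->", "Valid" if is_valid_password(candidate) else "Invalid")
```

Keeping each rule in its own lookahead makes it easy to add or drop a requirement without rewriting the rest of the pattern.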
30. URL Validation
• Given a URL as a character string str of size N, the task is to check whether the given URL is valid.
• Examples :
• Input : str = “https://www.google.com/”
• Output : Yes
• Input : str = “https:// www.google.org/”
• Output : No
31. Example : URL Validation
import re

def is_valid_url(url):
    # Scheme (http, https, or ftp), then "://", then a first character that is
    # not white space or one of / $ . ? #, then any run of non-white-space characters
    pattern = r"^(https?|ftp)://[^\s/$.?#].[^\s]*$"
    match = re.match(pattern, url)
    return bool(match)

# Testing with example URLs
urls = [
    "http://www.example.com",
    "https://www.example.com",
    "ftp://example.com",
    "www.example.com",
    "example.com",
    "http://example.com/page",
    "https://example.com/page?id=123",
    "http://example.com/?query=hello"
]
for url in urls:
    if is_valid_url(url):
        print(f"{url} is a valid URL.")
    else:
        print(f"{url} is not a valid URL.")