Chapter-IV
Python Regular Expression:
What is a regular expression?
• A regular expression (or regex) is a sequence of characters that defines a search pattern. The pattern describes a set of strings and can be used to search, edit, or manipulate text.
• Python has a built-in module called re that provides support for regular expressions.
• The re module must be imported before its regex functions can be used in a Python program.
• Syntax :
• import re
• The re module raises an exception (re.error) if a pattern is not a valid regular expression.
• After importing re, you can use functions such as re.match() or re.search() to search for a pattern in a string.
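The point about exceptions can be sketched as follows. This is a minimal illustration, not part of the original slides; the helper name is_valid_pattern is hypothetical.

```python
import re

def is_valid_pattern(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised by the re module when the pattern itself is malformed.
        return False

print(is_valid_pattern(r"\d+"))   # True
print(is_valid_pattern(r"[a-z"))  # False: the set is never closed
```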
Raw string
• In Python, the r prefix before a string literal denotes a "raw string". It is often used when working with regular expressions to prevent backslashes (\) from being treated as string escape characters.
• Regular expressions use backslashes frequently, both to escape special characters and to write character classes such as \d. In a raw string, a backslash is kept as a literal character and passed unchanged to the re module, which simplifies regex patterns.
• pattern = r"\d+" # Matches one or more digits
• In the above example, the r prefix allows the pattern to be written without doubling the backslash (\d instead of \\d), making it more readable and concise.
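A short sketch of why the r prefix matters. The sample text and variable names are illustrative assumptions, not from the slides.

```python
import re

text = "cat concat cat"

# The two spellings below produce the same pattern string.
same_pattern = (r"\d+" == "\\d+")
print(same_pattern)      # True

# In a normal string "\b" is a single backspace character,
# so the word-boundary pattern silently stops working.
raw_matches = re.findall(r"\bcat\b", text)
plain_matches = re.findall("\bcat\b", text)
print(raw_matches)       # ['cat', 'cat']
print(plain_matches)     # []
```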
Regex Functions
• The re module provides the following commonly used functions.
SN Function Description
1 match Checks for a match only at the beginning of the string, with optional flags. It returns a Match object if a match is found; otherwise it returns None.
2 search Scans the whole string and returns a Match object for the first match found; otherwise it returns None.
3 findall Returns a list that contains all the matches of a pattern in the string.
4 split Returns a list in which the string has been split at each match.
5 sub Replaces one or many matches in the string with a given replacement.
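The five functions can be compared side by side on one string. This is a small sketch; the sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "cats and dogs"

at_start = re.match(r"dogs", text)         # None: "dogs" is not at the start
anywhere = re.search(r"dogs", text)        # Match object for the first "dogs"
plurals = re.findall(r"\w+s\b", text)      # every word ending in "s"
parts = re.split(r"\s+and\s+", text)       # split around the word "and"
replaced = re.sub(r"dogs", "birds", text)  # replace each "dogs"

print(at_start)          # None
print(anywhere.group())  # dogs
print(plurals)           # ['cats', 'dogs']
print(parts)             # ['cats', 'dogs']
print(replaced)          # cats and birds
```

Note that match and search return a Match object or None, so the result should be checked before calling methods such as group().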
Example : Matching a specific word:
import re
text = "Hello, world!"
pattern = r"world"
matches = re.findall(pattern, text)
print(matches) # Output: ['world']
Example : Matching multiple options using
the pipe symbol
• import re
• text = "I like cats and dogs."
• pattern = r"cats|dogs"
• matches = re.findall(pattern, text)
• print(matches) # Output: ['cats', 'dogs']
Example: Matching digits using character classes:
import re
text = "I have 3 apples and 5 oranges."
pattern = r"\d+" # \d matches any digit, + matches one or more occurrences
matches = re.findall(pattern, text)
print(matches) # Output: ['3', '5']
Matching a specific pattern using a
combination of characters and modifiers
import re
text = "The color of the sky is blue."
pattern = r"colou?r" # ? makes the preceding 'u' optional
matches = re.findall(pattern, text)
print(matches) # Output: ['color']
Example
• Search the string to see if it starts with "The" and ends with "Spain":
• import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x is not None) # Output: True
Forming a regular expression
• A regular expression is formed from a mix of meta-characters, special sequences, and sets.
• Meta-Characters
• A metacharacter is a character with a special meaning in a pattern.
Metacharacter Description Example
[ ] Represents a set of characters. "[a-z]"
\ Introduces a special sequence, or escapes a special character. "\r"
. Matches any single character (except a newline) at that position. "Ja.v."
^ Matches the pattern at the beginning of the string. "^Java"
$ Matches the pattern at the end of the string. "point$"
* Zero or more occurrences of the preceding element. "hello*"
+ One or more occurrences of the preceding element. "hello+"
{} Exactly the specified number of occurrences of the preceding element. "java{2}"
| Either the pattern on the left or the pattern on the right. "java|point"
() Captures and groups part of a pattern.
• Examples of Meta Characters.
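A minimal sketch of each metacharacter in use. The sample strings and variable names are assumptions made for illustration; they do not come from the original slides.

```python
import re

set_match = re.findall(r"[a-c]", "abcdef")         # any one of a, b, c
dot_match = re.findall(r"h.t", "hat hot hit ht")   # . stands for one character
starts = bool(re.search(r"^Java", "Java point"))   # anchored at the start
ends = bool(re.search(r"point$", "Java point"))    # anchored at the end
star = re.findall(r"hello*", "hell hello helloo")  # "hell" + zero or more "o"
plus = re.findall(r"hello+", "hell hello helloo")  # "hell" + one or more "o"
braces = re.findall(r"java{2}", "java javaa")      # "jav" + exactly two "a"
either = re.findall(r"java|point", "java point")   # one alternative or the other
groups = re.findall(r"(\w+)@(\w+)", "user@host")   # two captured groups

print(set_match)  # ['a', 'b', 'c']
print(dot_match)  # ['hat', 'hot', 'hit']
print(starts)     # True
print(ends)       # True
print(star)       # ['hell', 'hello', 'helloo']
print(plus)       # ['hello', 'helloo']
print(braces)     # ['javaa']
print(either)     # ['java', 'point']
print(groups)     # [('user', 'host')]
```

Quantifiers such as *, + and {} apply only to the element immediately before them, which is why "hello*" also matches "hell".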
Special Sequences
• Special sequences consist of a backslash (\) followed by one of the characters below.
Character Description
\A Matches only at the beginning of the string.
\b Matches at a word boundary, i.e. at the beginning or the end of a word.
\B Matches at a position that is not a word boundary, i.e. inside a word.
\d Matches any digit [0-9].
\D Matches any character that is not a digit.
\s Matches any white space character.
\S Matches any character that is not a white space character.
\w Matches any word character (letters, digits, and the underscore).
\W Matches any character that is not a word character.
\Z Matches only at the end of the string.
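The special sequences can be tried out as in the sketch below. The sample sentences and variable names are illustrative assumptions.

```python
import re

text = "Room 42 is on floor 7."

digits = re.findall(r"\d+", text)       # runs of digits
initials = re.findall(r"\b\w", text)    # first character of each word
underscored = re.sub(r"\s", "_", text)  # replace each white space character
start_only = re.findall(r"\ARoom", text)  # only at the start of the string
non_word = re.findall(r"\W", "a-b c")   # characters that are not word characters

# \B requires that "ain" is NOT at the start of a word; \b requires that it is.
inside_word = re.findall(r"\Bain", "The rain in Spain")
at_word_start = re.findall(r"\bain", "The rain in Spain")

print(digits)         # ['42', '7']
print(initials)       # ['R', '4', 'i', 'o', 'f', '7']
print(underscored)    # Room_42_is_on_floor_7.
print(start_only)     # ['Room']
print(non_word)       # ['-', ' ']
print(inside_word)    # ['ain', 'ain']
print(at_word_start)  # []
```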
Sets
• A set is a group of characters given inside a pair of square brackets. It matches any single character from that group.
SN Set Description
1 [arn] Matches any one of the specified characters (a, r, or n).
2 [a-n] Matches any lower-case character from a to n.
3 [^arn] Matches any character except a, r, and n.
4 [0123] Matches any one of the specified digits (0, 1, 2, or 3).
5 [0-9] Matches any digit between 0 and 9.
6 [0-5][0-9] Matches any two-digit number between 00 and 59.
7 [a-zA-Z] Matches any letter of the alphabet (lower-case or upper-case).
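The sets in the table can be exercised with findall(). This is a small sketch; the sample string and variable names are assumptions.

```python
import re

txt = "The rain in Spain at 08:45"

listed = re.findall(r"[arn]", txt)          # any of a, r, n
capitals = re.findall(r"[A-Z]", txt)        # upper-case letters
digits = re.findall(r"[0-9]", txt)          # single digits
two_digit = re.findall(r"[0-5][0-9]", txt)  # 00 to 59
other = re.findall(r"[^a-zA-Z\s]", txt)     # neither a letter nor white space

print(listed)     # ['r', 'a', 'n', 'n', 'a', 'n', 'a']
print(capitals)   # ['T', 'S']
print(digits)     # ['0', '8', '4', '5']
print(two_digit)  # ['08', '45']
print(other)      # ['0', '8', ':', '4', '5']
```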
The findall() function
• This function returns a list of all matches of a pattern within the string. The matches are returned in the order they are found. If there are no matches, an empty list is returned.
• Example
• import re
• text = "How are you. How is everything"
• matches = re.findall("How", text)
• print(matches) # Output: ['How', 'How']
The search() Function
• The search() function searches the string for a match, and returns a Match object if there
is a match.
• If there is more than one match, only the first occurrence of the match will be returned:
• Example
• Search for the first white-space character in the string:
• import re
• txt = "The rain in Spain"
• x = re.search(r"\s", txt)
• print("The first white-space character is located in position:", x.start())
Continued...
• If no matches are found, the value None is returned:
• Example
• Make a search that returns no match:
• import re
• txt = "The rain in Spain"
• x = re.search("Portugal", txt)
• print(x)
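Because search() may return None, it is safer to check the result before calling methods on it. A minimal sketch, with a hypothetical helper name first_whitespace_position:

```python
import re

def first_whitespace_position(txt):
    """Return the index of the first white space character, or -1 if none."""
    match = re.search(r"\s", txt)
    if match is None:
        # Calling match.start() here would raise AttributeError.
        return -1
    return match.start()

print(first_whitespace_position("The rain in Spain"))  # 3
print(first_whitespace_position("Portugal"))           # -1
```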
The split() Function
• The split() function returns a list where the string has been split at each match:
• Example
• Split at each white-space character:
• import re
• txt = "The rain in Spain"
• x = re.split(r"\s", txt)
• print(x)
• Note : You can control the number of splits by specifying the maxsplit parameter:
Continued...
• Example
• Split the string only at the first occurrence:
• import re
• txt = "The rain in Spain"
• x = re.split(r"\s", txt, maxsplit=1)
• print(x)
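The results of both split() calls, plus one extra case with several delimiters, can be sketched as follows. The third example and the variable names are illustrative assumptions.

```python
import re

txt = "The rain in Spain"

all_parts = re.split(r"\s", txt)                # split at every white space
first_split = re.split(r"\s", txt, maxsplit=1)  # split only once
fields = re.split(r"[,;]\s*", "a, b;c")         # comma or semicolon as delimiter

print(all_parts)    # ['The', 'rain', 'in', 'Spain']
print(first_split)  # ['The', 'rain in Spain']
print(fields)       # ['a', 'b', 'c']
```

Passing maxsplit by keyword keeps the call readable and avoids confusing it with the flags argument.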
The sub() Function
• The sub() function replaces the matches with the text of your choice:
• Example
• Replace every white-space character with the number 9:
• import re
• txt = "The rain in Spain"
• x = re.sub(r"\s", "9", txt)
• print(x)
Continued...
• You can control the number of replacements by specifying the count
parameter:
• Example
• Replace the first 2 occurrences:
• import re
• txt = "The rain in Spain"
• x = re.sub(r"\s", "9", txt, count=2)
• print(x)
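A short sketch of what sub() returns in these cases. The last two calls (a backreference and collapsing repeated spaces) are additional illustrations, not from the slides.

```python
import re

txt = "The rain in Spain"

every = re.sub(r"\s", "9", txt)               # replace all white space
first_two = re.sub(r"\s", "9", txt, count=2)  # replace only the first two
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")  # reuse captured groups
collapsed = re.sub(r"\s+", " ", "too   many    spaces")    # squeeze runs of spaces

print(every)      # The9rain9in9Spain
print(first_two)  # The9rain9in Spain
print(swapped)    # world hello
print(collapsed)  # too many spaces
```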
The match object
• The match object contains information about the search and its result. If no match is found, None is returned instead.
• Example
• import re
• text = "How are you. How is everything"
• matches = re.search("How", text)
• print(type(matches))
• print(matches) # matches is the Match object
• Output (Python 3.7 and later):
• <class 're.Match'>
• <re.Match object; span=(0, 3), match='How'>
Meta characters example
import re
# Matching 'cat' followed by any three characters
text = "I have a cat and a car but not a caterpillar."
pattern = r"cat..."
matches = re.findall(pattern, text)
print(matches) # Output: ['cat an', 'caterp']
# Matching a word starting with 'a' or 'b'
text = "The apple and the banana are fruits."
pattern = r"\b[ab]\w+"
matches = re.findall(pattern, text)
print(matches) # Output: ['apple', 'and', 'banana', 'are']
# Matching a word ending with 'ing' or 'ed'
text = "He is running and has walked."
pattern = r"\b\w+(?:ing|ed)\b"
matches = re.findall(pattern, text)
print(matches) # Output: ['running', 'walked']
The Match object methods
• The following methods and attributes are associated with the Match object.
• span(): Returns a tuple containing the start and end positions of the match.
• string: An attribute (not a method) that holds the string passed into the function.
• group(): Returns the part of the string where the match was found.
Example
• import re
• text = "How are you. How is everything"
• matches = re.search("How", text)
• print(matches.span())
• print(matches.group())
• print(matches.string)
• Output:
• (0, 3)
• How
• How are you. How is everything
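When the pattern contains capturing groups, the Match object also gives access to each group. A minimal sketch; the pattern, the sample string, and the variable names are assumptions chosen for illustration.

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "Contact: rohit@example.com")

whole = m.group()        # the entire match
user = m.group(1)        # first parenthesised group
domain = m.group(2)      # second parenthesised group
both = m.groups()        # all groups as a tuple
start, end = m.start(), m.end()  # same values as m.span()

print(whole)       # rohit@example.com
print(user)        # rohit
print(domain)      # example
print(both)        # ('rohit', 'example')
print(start, end)  # 9 26
```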
IMP Examples
• Email Validation –
• Validating an email means checking whether the input the user entered in the email address field follows the format we require.
• Suppose we as programmers set the email format to be "first_name.last_name@company_name.com" and the user enters "gupta.rohit@csharpcorner.com".
• This input violates our condition. Some readers may wonder how we decide which part is the "first_name" and which is the "last_name". It is decided based on the first name and last name entered by the user. Here, we assume that the user entered "rohit" as first_name and "gupta" as last_name, so the expected address is "rohit.gupta@csharpcorner.com".
• The most common use of email validation is in mail servers: when you enter your email address, it is checked against the pre-defined format of that particular mail server.
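The format check described above could be sketched with re.fullmatch(). This is one possible approach under stated assumptions: the function name is_valid_email is hypothetical, and the company name is assumed to consist of letters, digits, and hyphens.

```python
import re

def is_valid_email(email, first_name, last_name):
    """Check `email` against first_name.last_name@company_name.com."""
    pattern = (
        re.escape(first_name) + r"\." + re.escape(last_name)
        + r"@[A-Za-z0-9-]+\.com"  # assumed shape of company_name
    )
    # fullmatch succeeds only if the whole string fits the pattern.
    return re.fullmatch(pattern, email, flags=re.IGNORECASE) is not None

print(is_valid_email("rohit.gupta@csharpcorner.com", "rohit", "gupta"))  # True
print(is_valid_email("gupta.rohit@csharpcorner.com", "rohit", "gupta"))  # False
```

re.escape() is used so that any special characters in the entered names are treated literally.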
Extracting email addresses from a string:
• import re
• text = "Contact us at info@example.com or support@example.com."
• pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
• matches = re.findall(pattern, text)
• print(matches) # Output: ['info@example.com', 'support@example.com']
Realtime parsing using regular expression
import re

def process_realtime_input(input_stream):
    pattern = r"\b[A-Za-z]+\b" # Matches words
    for match in re.finditer(pattern, input_stream):
        word = match.group(0)
        # Do something with the matched word in real-time
        print("Found word:", word)

# Simulating real-time input
input_stream = "Hello, how are you today? I hope you're doing well."
# Process input in real-time
process_realtime_input(input_stream)
Continued...
• Email Validation Using re
• Other Methods
• Besides the re module, there are various Python packages and APIs that can validate a given email address in just a couple of lines of code.
• Below are some of the Email Validation Python packages:
• email-validator 1.0.5
• pyIsEmail 1.3.2
• py3-validate-email
• Given below are some of the Email Validation APIs:
• Mailboxlayer
• Isitrealemail
• Sendgrid’s Email Validation API
• Many other Python packages and APIs are available, both free and paid.
Password Validation
• Write a Python program to check the validity of a password (input from
users).
Validation :
• At least 1 letter between [a-z] and 1 letter between [A-Z].
• At least 1 number between [0-9].
• At least 1 character from [$#@].
• Minimum length 6 characters.
• Maximum length 16 characters.
• Example
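One way to implement the rules listed above is to check the length first and then search for each required character class. This is a sketch, not the original slide's code; the function name is_valid_password is hypothetical.

```python
import re

def is_valid_password(password):
    """Return True if `password` meets all the rules listed above."""
    # Length must be between 6 and 16 characters inclusive.
    if not 6 <= len(password) <= 16:
        return False
    # One pattern per rule: lower-case, upper-case, digit, special character.
    required = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[$#@]"]
    return all(re.search(p, password) for p in required)

print(is_valid_password("Abc@123"))  # True
print(is_valid_password("abc@123"))  # False: no upper-case letter
print(is_valid_password("Ab@1"))     # False: shorter than 6 characters
```

Checking each rule with a separate search keeps the patterns simple; the same rules could also be combined into a single pattern with lookaheads.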
URL Validation
• Given a URL as a character string str of size N, the task is to check whether the given URL is valid or not.
• Examples :
• Input : str = "https://www.google.com/"
• Output : Yes
• Input : str = "https:// www.google.org/"
• Output : No (the URL contains a space)
import re

def is_valid_url(url):
    # Scheme (http, https, or ftp), then "://", then characters with no white space
    pattern = r"^(https?|ftp)://[^\s/$.?#].[^\s]*$"
    match = re.match(pattern, url)
    return bool(match)

# Testing with example URLs
urls = [
    "http://www.example.com",
    "https://www.example.com",
    "ftp://example.com",
    "www.example.com",
    "example.com",
    "http://example.com/page",
    "https://example.com/page?id=123",
    "http://example.com/?query=hello"
]
for url in urls:
    if is_valid_url(url):
        print(f"{url} is a valid URL.")
    else:
        print(f"{url} is not a valid URL.")
Example : URL Validation

More Related Content

Similar to unit-4 regular expression.pptx

Similar to unit-4 regular expression.pptx (20)

Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in Python
 
Python - Regular Expressions
Python - Regular ExpressionsPython - Regular Expressions
Python - Regular Expressions
 
Python Strings Methods
Python Strings MethodsPython Strings Methods
Python Strings Methods
 
UNIT 4 python.pptx
UNIT 4 python.pptxUNIT 4 python.pptx
UNIT 4 python.pptx
 
Python (regular expression)
Python (regular expression)Python (regular expression)
Python (regular expression)
 
Python data handling
Python data handlingPython data handling
Python data handling
 
Python strings
Python stringsPython strings
Python strings
 
Python regular expressions
Python regular expressionsPython regular expressions
Python regular expressions
 
Java: Regular Expression
Java: Regular ExpressionJava: Regular Expression
Java: Regular Expression
 
Strings in Python
Strings in PythonStrings in Python
Strings in Python
 
Python revision tour II
Python revision tour IIPython revision tour II
Python revision tour II
 
Strings in python
Strings in pythonStrings in python
Strings in python
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 
Java căn bản - Chapter9
Java căn bản - Chapter9Java căn bản - Chapter9
Java căn bản - Chapter9
 
Chapter 9 - Characters and Strings
Chapter 9 - Characters and StringsChapter 9 - Characters and Strings
Chapter 9 - Characters and Strings
 
16 Java Regex
16 Java Regex16 Java Regex
16 Java Regex
 
PHP Web Programming
PHP Web ProgrammingPHP Web Programming
PHP Web Programming
 
Python programming : Strings
Python programming : StringsPython programming : Strings
Python programming : Strings
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Python Datatypes by SujithKumar
Python Datatypes by SujithKumarPython Datatypes by SujithKumar
Python Datatypes by SujithKumar
 

Recently uploaded

Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 

Recently uploaded (20)

Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 

unit-4 regular expression.pptx

  • 2. What is a regular expression? • The regular expressions can be defined as the sequence of characters which are used to search for a pattern in a string. • A regular expression (or regex) is a pattern that describes a set of strings. It can be used to search, edit, or manipulate text. • Python has a built-in module called re that provides support for regular expressions. • The module re provides the support to use regex in the python program. • The re module throws an exception if there is some error while using the regular expression. • The re module must be imported to use the regex functionalities in python. • Syntax : • import re • To use regular expressions in Python, you first need to import the re module. Then, you can use the re.match() or re.search() functions to search for a pattern in a string.
  • 3. Raw string • In Python, the r prefix before a string denotes a "raw string." It is often used when working with regular expressions to prevent backslashes from being treated as escape characters. • When using regular expressions, backslashes are frequently used as escape characters to represent special characters or character classes. However, when working with raw strings, backslashes are treated as literal characters, which can be helpful in simplifying regex patterns. • pattern = r"d+" # Matches one or more digits • In the above example, the r prefix allows the regular expression pattern to be written without the need to escape the backslash (d instead of d), making it more readable and concise.
  • 4. Regex Functions • The following regex functions are used in the python. SN Function Description 1 match This method matches the regex pattern in the string with the optional flag. It returns true if a match is found in the string otherwise it returns false. 2 search This method returns the match object if there is a match found in the string. 3 findall It returns a list that contains all the matches of a pattern in the string. 4 split Returns a list in which the string has been split in each match. 5 sub Replace one or many matches in the string.
  • 5. Example : Matching a specific word: import re text = "Hello, world!" pattern = r"world" matches = re.findall(pattern, text) print(matches) # Output: ['world']
  • 6. Example : Matching multiple options using the pipe symbol • import re • text = "I like cats and dogs." • pattern = r"cats|dogs" • matches = re.findall(pattern, text) • print(matches) # Output: ['cats', 'dogs']
  • 7. Example: Matching digits using character classes: import re text = "I have 3 apples and 5 oranges." pattern = r"d+" # d matches any digit, + matches one or more occurrences matches = re.findall(pattern, text) print(matches) # Output: ['3', '5']
  • 8. Matching a specific pattern using a combination of characters and modifiers import re text = "The color of the sky is blue." pattern = r"colou?r" # ? makes the preceding 'u' optional matches = re.findall(pattern, text) print(matches) # Output: ['color']
  • 9. Example • Search the string to see if it starts with "The" and ends with "Spain": • import re txt = "The rain in Spain" x = re.search("^The.*Spain$", txt)
  • 10. Forming a regular expression • A regular expression can be formed by using the mix of meta-characters, special sequences, and sets. • Meta-Characters • Metacharacter is a character with the specified meaning. Metacharacter Description Example [ ] It represents the set of characters. "[a-z]" It represents the special sequence. "r" . It signals that any character is present at some specific place. "Ja.v." ^ It represents the pattern present at the beginning of the string. "^Java" $ It represents the pattern present at the end of the string. "point" * It represents zero or more occurrences of a pattern in the string. "hello*" + It represents one or more occurrences of a pattern in the string. "hello+" {} The specified number of occurrences of a pattern the string. "java{2}" | It represents either this or that character is present. "java|point" () Capture and group
  • 11. • Examples of Meta Characters.
  • 12. Special Sequences • Special sequences are the sequences containing followed by one of the characters. Character Description A It returns a match if the specified characters are present at the beginning of the string. b It returns a match if the specified characters are present at the beginning or the end of the string. B It returns a match if the specified characters are present at the beginning of the string but not at the end. d It returns a match if the string contains digits [0-9]. D It returns a match if the string doesn't contain the digits [0-9]. s It returns a match if the string contains any white space character. S It returns a match if the string doesn't contain any white space character. w It returns a match if the string contains any word characters. W It returns a match if the string doesn't contain any word. Z Returns a match if the specified characters are at the end of the string.
  • 13. Sets • A set is a group of characters given inside a pair of square brackets. It represents the special meaning. SN Set Description 1 [arn] Returns a match if the string contains any of the specified characters in the set. 2 [a-n] Returns a match if the string contains any of the characters between a to n. 3 [^arn] Returns a match if the string contains the characters except a, r, and n. 4 [0123] Returns a match if the string contains any of the specified digits. 5 [0-9] Returns a match if the string contains any digit between 0 and 9. 6 [0-5][0-9] Returns a match if the string contains any digit between 00 and 59. 10 [a-zA-Z] Returns a match if the string contains any alphabet (lower-case or upper- case).
  • 14. The findall() function • This method returns a list containing a list of all matches of a pattern within the string. It returns the patterns in the order they are found. If there are no matches, then an empty list is returned. • Example • import re • • str = "How are you. How is everything" • • matches = re.findall("How", str) • • print(matches) • • print(matches)
  • 15. The search() Function • The search() function searches the string for a match, and returns a Match object if there is a match. • If there is more than one match, only the first occurrence of the match will be returned: • Example • Search for the first white-space character in the string: • import re • txt = "The rain in Spain" • x = re.search("s", txt) • print("The first white-space character is located in position:", x.start())
  • 16. Continued... • If no matches are found, the value None is returned: • Example • Make a search that returns no match: • import re • txt = "The rain in Spain" • x = re.search("Portugal", txt) • print(x)
  • 17. The split() Function • The split() function returns a list where the string has been split at each match: • Example • Split at each white-space character: • import re • txt = "The rain in Spain" • x = re.split("s", txt) • print(x) • Note : You can control the number of occurrences by specifying the maxsplit parameter:
  • 18. Continued... • Example • Split the string only at the first occurrence: • import re • txt = "The rain in Spain" • x = re.split("s", txt, 1) • print(x)
  • 19. The sub() Function • The sub() function replaces the matches with the text of your choice: • Example • Replace every white-space character with the number 9: • import re • txt = "The rain in Spain" • x = re.sub("s", "9", txt) • print(x)
  • 20. Continued... • You can control the number of replacements by specifying the count parameter: • Example • Replace the first 2 occurrences: • import re • txt = "The rain in Spain" • x = re.sub("s", "9", txt, 2) • print(x)
  • 21. The match object
    • The match object contains information about the search and its result. If no match is found, None is returned instead.
    • Example
        import re

        text = "How are you. How is everything"
        matches = re.search("How", text)
        print(type(matches))
        print(matches)  # matches is the Match object
    • Output (Python 3.7 and later; older versions print _sre.SRE_Match instead of re.Match):
        <class 're.Match'>
        <re.Match object; span=(0, 3), match='How'>
  • 22. Meta characters example
        import re

        # Matching 'cat' followed by any three characters (. matches any character except a newline)
        text = "I have a cat and a car but not a caterpillar."
        pattern = r"cat..."
        matches = re.findall(pattern, text)
        print(matches)  # Output: ['cat an', 'caterp']

        # Matching a word starting with 'a' or 'b'
        text = "The apple and the banana are fruits."
        pattern = r"\b[ab]\w+"
        matches = re.findall(pattern, text)
        print(matches)  # Output: ['apple', 'and', 'banana', 'are']

        # Matching a word ending in 'ing' or 'ed'
        text = "He is running and has walked."
        pattern = r"\b\w+(?:ing|ed)\b"
        matches = re.findall(pattern, text)
        print(matches)  # Output: ['running', 'walked']
  • 23. The Match object methods
    • The Match object provides the following methods and attributes.
    • span(): Returns a tuple containing the start and end positions of the match.
    • string: An attribute (not a method) holding the string passed into the function.
    • group(): Returns the part of the string where the match was found.
  • 24. Example
        import re

        text = "How are you. How is everything"
        matches = re.search("How", text)
        print(matches.span())
        print(matches.group())
        print(matches.string)
    • Output:
        (0, 3)
        How
        How are you. How is everything
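When the pattern contains capture groups, group() can also return each group separately, by number or by name. A short sketch; the sample sentence and the group names hour and minute are chosen only for illustration.

```python
import re

# (?P<name>...) defines a named capture group.
match = re.search(r"(?P<hour>\d{2}):(?P<minute>\d{2})", "Meeting at 14:30 today")

whole = match.group(0)         # the entire match
hour = match.group("hour")     # a group looked up by name
minute = match.group(2)        # the same kind of lookup, by number
all_groups = match.groups()    # every group as a tuple
position = match.span()        # start and end index of the entire match

print(whole, hour, minute)  # 14:30 14 30
print(all_groups)           # ('14', '30')
print(position)             # (11, 16)
```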
  • 25. Important Examples
    • Email Validation
    • Validating an email address means checking whether the user's input in the email field follows the format we require.
    • Suppose we, as programmers, set the email format to "first_name.last_name@company_name.com" and the user enters "gupta.rohit@csharpcorner.com".
    • This input violates our condition. Which part counts as "first_name" and which as "last_name" is decided by the first name and last name the user has entered. Here, assume the user entered "rohit" as first_name and "gupta" as last_name.
    • The most common use of email validation is in mail servers: when you enter your email address, the server checks whether it follows that server's pre-defined format.
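A minimal sketch of the check described above, assuming lower-case names and a ".com" domain. The pattern and the helper names is_valid_email and matches_user are illustrative choices, not a definitive implementation.

```python
import re

# Assumed format: first_name.last_name@company_name.com (lower-case letters only).
EMAIL_PATTERN = re.compile(r"[a-z]+\.[a-z]+@[a-z0-9-]+\.com")

def is_valid_email(address):
    """Check only the shape of the address."""
    return EMAIL_PATTERN.fullmatch(address) is not None

def matches_user(address, first_name, last_name):
    """Check that the part before '@' is first_name.last_name, in that order."""
    local_part = address.split("@", 1)[0]
    return is_valid_email(address) and local_part == f"{first_name}.{last_name}".lower()

print(is_valid_email("rohit.gupta@csharpcorner.com"))                  # True
print(is_valid_email("rohitgupta@csharpcorner.com"))                   # False
print(matches_user("gupta.rohit@csharpcorner.com", "rohit", "gupta"))  # False
print(matches_user("rohit.gupta@csharpcorner.com", "rohit", "gupta"))  # True
```

The pattern alone cannot tell a first name from a last name, which is why the second helper compares the address with the names the user entered.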
  • 26. Extracting email addresses from a string:
        import re

        text = "Contact us at info@example.com or support@example.com."
        pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
        matches = re.findall(pattern, text)
        print(matches)  # Output: ['info@example.com', 'support@example.com']
  • 27. Realtime parsing using regular expression
        import re

        def process_realtime_input(input_stream):
            pattern = r"\b[A-Za-z]+\b"  # Matches words
            for match in re.finditer(pattern, input_stream):
                word = match.group(0)
                # Do something with the matched word in real-time
                print("Found word:", word)

        # Simulating real-time input
        input_stream = "Hello, how are you today? I hope you're doing well."

        # Process input in real-time
        process_realtime_input(input_stream)
  • 28. Continued...
    • Email Validation Using re
    • Other Methods
    • Various Python packages and APIs can validate an email address in about two lines of code, so you do not have to write the pattern yourself.
    • Some email validation Python packages:
        • email-validator 1.0.5
        • pyIsEmail 1.3.2
        • py3-validate-email
    • Some email validation APIs:
        • Mailboxlayer
        • Isitrealemail
        • SendGrid's Email Validation API
    • Many other Python packages and APIs are available, both free and paid.
  • 29. Password Validation
    • Write a Python program to check the validity of a password (input from users).
    • Validation:
        • At least 1 letter between [a-z] and 1 letter between [A-Z].
        • At least 1 number between [0-9].
        • At least 1 character from [$#@].
        • Minimum length 6 characters.
        • Maximum length 16 characters.
    • Example
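One possible way to implement the rules above is to check the length first and then run one search() per required character class. This is a sketch only; the function name is_valid_password is an assumption.

```python
import re

def is_valid_password(password):
    """Return True if the password satisfies every rule listed above."""
    # Minimum length 6, maximum length 16.
    if not 6 <= len(password) <= 16:
        return False
    # One pattern per required character class.
    required = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[$#@]"]
    return all(re.search(pattern, password) for pattern in required)

print(is_valid_password("Ab1@xy"))           # True
print(is_valid_password("abc123"))           # False: no upper-case letter or special character
print(is_valid_password("Ab1@"))             # False: shorter than 6 characters
print(is_valid_password("Ab1@" + "x" * 13))  # False: longer than 16 characters
```

Running several simple patterns keeps each rule readable; the same check could be written as one pattern with lookaheads, at the cost of clarity.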
  • 30. URL Validation
    • Given a URL as a character string str of size N, the task is to check whether the given URL is valid.
    • Examples:
        Input : str = "https://www.google.com/"
        Output : Yes
        Input : str = "https:// www.google.org/"
        Output : No
  • 31. Example : URL Validation
        import re

        def is_valid_url(url):
            # Scheme (http, https, or ftp), then "://", then no white space up to the end
            pattern = r"^(https?|ftp)://[^\s/$.?#].[^\s]*$"
            match = re.match(pattern, url)
            return bool(match)

        # Testing with example URLs
        urls = [
            "http://www.example.com",
            "https://www.example.com",
            "ftp://example.com",
            "www.example.com",
            "example.com",
            "http://example.com/page",
            "https://example.com/page?id=123",
            "http://example.com/?query=hello",
        ]

        for url in urls:
            if is_valid_url(url):
                print(f"{url} is a valid URL.")
            else:
                print(f"{url} is not a valid URL.")