2. Programming with Sikander : Python: Regular Expression
Regular expressions are a powerful tool
for various kinds of string manipulation.
They are a domain-specific language
(DSL) that is available as a library in most
modern programming languages, not just
Python.
They are useful for two main tasks:
• identifying whether a pattern exists in a given
sequence of characters (a string).
• performing substitutions in a string.
In Python, regular expressions are implemented by the “re” module:
import re
The re.match function can be used to determine
whether a pattern matches at the beginning of
a string.
If it does, match returns an object
representing the match; if not, it returns
None.
re.match(pattern, string)
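A minimal sketch of the behaviour described above. The sample string and the variable names hit and miss are assumptions chosen for illustration.

```python
import re

# re.match only looks at the beginning of the string.
hit = re.match(r"Hello", "Hello world")
miss = re.match(r"world", "Hello world")

print(hit is not None)   # True: the pattern matches at position 0
print(miss)              # None: "world" is present, but not at the start

# Always check for None before using the match object.
if hit:
    print(hit.group())   # Hello
```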
The function re.search finds the first match of a
pattern anywhere in the string.
The search function returns a match object (or None
if nothing matches) with several methods that give
details about the match.
These methods include:
• group, which returns the string matched.
• start and end, which return the start and
end positions of the match.
• span, which returns the start and end
positions of the match as a tuple.
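The following sketch shows re.search together with the match object methods listed above. The sample text is an assumption; note that re.match on the same text finds nothing, because the pattern does not occur at the start.

```python
import re

text = "The rain in Spain"   # illustrative sample text
m = re.search(r"ain", text)  # first occurrence, anywhere in the string

if m:
    print(m.group())   # ain
    print(m.start())   # 5
    print(m.end())     # 8
    print(m.span())    # (5, 8)

# The same pattern with re.match fails: "ain" is not at the beginning.
print(re.match(r"ain", text))   # None
```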
The function re.findall returns a list of all
substrings that match a pattern.
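As a small illustration, re.findall can collect every word containing "ain". The sample sentence and the pattern are assumptions made for this sketch.

```python
import re

text = "The rain in Spain stays mainly in the plain"

# Every word that contains "ain", returned as a list of strings.
words = re.findall(r"\w*ain\w*", text)
print(words)   # ['rain', 'Spain', 'mainly', 'plain']

# When nothing matches, findall returns an empty list rather than None.
print(re.findall(r"xyz", text))   # []
```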
re.finditer(pattern, string)
Return an iterator yielding match objects over all
non-overlapping matches for the pattern in string.
The string is scanned left-to-right, and matches are
returned in the order found.
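A short sketch of re.finditer, assuming a sample string of three-letter words. Unlike findall, each item is a match object, so positions are available as well as the matched text.

```python
import re

text = "cat bat rat"   # illustrative sample text

# Collect (matched text, start position) for every match, left to right.
positions = [(m.group(), m.start()) for m in re.finditer(r"\wat", text)]
print(positions)   # [('cat', 0), ('bat', 4), ('rat', 8)]
```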
Character  Description                                                           Example
[]         A set of characters                                                   "[a-m]"
\          Signals a special sequence (also used to escape special characters)   "\d"
.          Any character (except the newline character)                          "he..o"
^          Starts with                                                           "^hello"
$          Ends with                                                             "world$"
*          Zero or more occurrences                                              "aix*"
+          One or more occurrences                                               "aix+"
{}         Exactly the specified number of occurrences                           "al{2}"
|          Either or                                                             "falls|stays"
()         Capture and group
Sequence  Description
\d        Matches any decimal digit; this is equivalent to the class [0-9].
\D        Matches any non-digit character; this is equivalent to the class [^0-9].
\s        Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S        Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w        Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W        Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
The expression \d matches any digit [0-9].
The expression \D matches any character that is
not a digit.
Given a string, extract all the digits and non-digits.
Input         Output
1RV02EC023    Digits : 1 0 2 0 2 3
              Non Digits : R V E C
ABCDE1234F    Digits : 1 2 3 4
              Non Digits : A B C D E F
Rupees 2000   Digits : 2 0 0 0
              Non Digits : R u p e e s
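One possible solution to the exercise above, using re.findall with \d and \D. The helper name split_digits is hypothetical. Note that \D also matches the space in "Rupees 2000", so the space appears in the non-digit list even though it is invisible when printed.

```python
import re

def split_digits(text):
    """Return (digits, non_digits) as two lists of single characters."""
    digits = re.findall(r"\d", text)
    non_digits = re.findall(r"\D", text)
    return digits, non_digits

for sample in ["1RV02EC023", "ABCDE1234F", "Rupees 2000"]:
    digits, non_digits = split_digits(sample)
    print("Digits     :", " ".join(digits))
    print("Non Digits :", " ".join(non_digits))
```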
• \s matches any whitespace character [ \t\n\r\f\v].
• \S matches any non-whitespace character.
• Given a string, extract all space and non-space
characters.
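A minimal sketch of this exercise. The sample string, which contains a space, a tab and a newline, is an assumption.

```python
import re

text = "Hello World\tPython\n"   # illustrative: one space, one tab, one newline

spaces = re.findall(r"\s", text)       # every whitespace character
non_spaces = re.findall(r"\S", text)   # everything else

print(spaces)                # [' ', '\t', '\n']
print("".join(non_spaces))   # HelloWorldPython
```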
• The expression \w will match any word
character.
• Word characters include alphanumeric
characters (a-z, A-Z, 0-9) and the underscore (_).
• Given a string, extract all word and non-word
characters (remove all special characters).
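A sketch of this exercise, assuming a sample string with a few special characters. Removing the special characters is done here with re.sub, replacing every \W match with an empty string.

```python
import re

text = "user_name@example.com!"   # illustrative sample text

word_chars = re.findall(r"\w", text)       # letters, digits and underscore
non_word_chars = re.findall(r"\W", text)   # everything else

print(non_word_chars)   # ['@', '.', '!']

# Remove all special (non-word) characters.
cleaned = re.sub(r"\W", "", text)
print(cleaned)          # user_nameexamplecom
```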
The ^ symbol matches the position at the
start of a string.
The $ symbol matches the position at the
end of a string.
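The anchors can be tried out with re.search, as in the sketch below. The sample string is an assumption.

```python
import re

text = "hello world"   # illustrative sample text

starts_with_hello = re.search(r"^hello", text) is not None
ends_with_world = re.search(r"world$", text) is not None
starts_with_world = re.search(r"^world", text) is not None

print(starts_with_hello)   # True
print(ends_with_world)     # True
print(starts_with_world)   # False: "world" is not at the start
```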
• You are given a list of phone numbers and you are
required to check whether they are valid mobile
numbers.
• A valid mobile number is a ten-digit number starting
with 7, 8 or 9.
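One way to check this rule, sketched with re.fullmatch, which requires the whole string to match (similar in effect to anchoring the pattern with ^ and $). The function name is_valid_mobile and the sample numbers are hypothetical.

```python
import re

def is_valid_mobile(number):
    """True if number is exactly ten digits and starts with 7, 8 or 9."""
    # One leading digit from [789], followed by exactly nine more digits.
    return re.fullmatch(r"[789][0-9]{9}", number) is not None

for number in ["9876543210", "6876543210", "98765", "98765abc10"]:
    print(number, "valid" if is_valid_mobile(number) else "invalid")
```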
• Verify whether a given PAN number is valid.
• PAN number:
It’s a 10-character string
First 5 characters are letters
Next 4 characters are digits
Last character is a letter
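A possible check for the format described above. The function name is_valid_pan is hypothetical, and restricting the letters to uppercase A-Z is an assumption; the rules above only say "letters".

```python
import re

def is_valid_pan(pan):
    """True if pan is 5 letters, then 4 digits, then 1 letter."""
    # Assumption: letters are uppercase A-Z.
    return re.fullmatch(r"[A-Z]{5}[0-9]{4}[A-Z]", pan) is not None

for pan in ["ABCDE1234F", "ABCD1234EF", "ABCDE12345"]:
    print(pan, "valid" if is_valid_pan(pan) else "invalid")
```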
The group() method returns one or more
subgroups of the match.
The groups() method returns a tuple
containing all the subgroups of the match.
Given an email ID, separate the username,
website and extension.
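A simplified sketch of this task using three capturing groups. The address is hypothetical, and the pattern is an assumption that only handles the plain form name@site.ext (no dots in the username, no multi-part domains).

```python
import re

email = "sikander@example.com"   # hypothetical address

# Three groups: username, website, extension.
m = re.fullmatch(r"(\w+)@(\w+)\.(\w+)", email)

if m:
    print(m.group(0))      # sikander@example.com (the whole match)
    print(m.group(1))      # sikander
    print(m.group(2))      # example
    print(m.group(3))      # com
    print(m.group(1, 3))   # ('sikander', 'com')
    print(m.groups())      # ('sikander', 'example', 'com')
```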
The groupdict() method returns a dictionary
containing all the named subgroups of the match,
keyed by the subgroup name.
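A short sketch of groupdict() with named groups, written with the (?P<name>...) syntax. The address and the group names user, website and extension are assumptions chosen for illustration.

```python
import re

email = "sikander@example.com"   # hypothetical address

# Each group is given a name with (?P<name>...).
pattern = r"(?P<user>\w+)@(?P<website>\w+)\.(?P<extension>\w+)"
m = re.fullmatch(pattern, email)

info = m.groupdict() if m else {}
print(info)   # {'user': 'sikander', 'website': 'example', 'extension': 'com'}
```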
re.sub(pattern, repl, string)
Returns the string obtained by replacing
every occurrence of the pattern in string
with the replacement repl.
Bangalore is the capital of Karnataka.
The Silicon City of India is Bangalore.
Bangalore was called the garden city because of its greenery.
Task: Replace all occurrences of Bangalore with Bengaluru.
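The task can be sketched with a single re.sub call, assuming the three sentences are held in one string.

```python
import re

text = (
    "Bangalore is the capital of Karnataka.\n"
    "The Silicon City of India is Bangalore.\n"
    "Bangalore was called the garden city because of its greenery."
)

# By default, re.sub replaces every occurrence of the pattern.
updated = re.sub(r"Bangalore", "Bengaluru", text)
print(updated)
```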
re.compile(pattern, flags=0)
Compile a regular expression pattern into a
regular expression object, which can be used for
matching using its match(), search() and other
methods.
It also lets you reuse the same pattern for several
searches without rewriting it.
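A minimal sketch of a compiled pattern reused across several calls. The pattern and the sample strings are assumptions.

```python
import re

# Compile once, then reuse the pattern object's methods.
digits = re.compile(r"\d+")

found = digits.findall("Room 12, floor 3")
missing = digits.search("No digits here")
masked = digits.sub("#", "Room 12, floor 3")

print(found)     # ['12', '3']
print(missing)   # None
print(masked)    # Room #, floor #
```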