2. Regular Expressions
• A regular expression (abbreviated regex or regexp) is a
sequence of characters that forms a search pattern, mainly for
use in pattern matching with strings, or string matching, i.e.
"find and replace"-like operations.
• In Python, regular expressions are made available through the re
module.
Usage: import re
• The functions in the re module check whether a particular string matches
a given regular expression.
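A minimal sketch of the usage above. The sample text and patterns are illustrative and not taken from the slides; re.search() is introduced later in the deck.

```python
import re

# Illustrative sample text; not part of the original slides.
text = "find and replace"

# re.search returns a match object if the pattern occurs anywhere in text,
# otherwise None.
found = re.search("replace", text)
missing = re.search("delete", text)

print(found is not None)   # True
print(missing is None)     # True
```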
3. Regular Expression : Characters
• Normal Characters
– Characters which match themselves
– Ex: test
• Metacharacters
– Characters which have a special meaning
The meta characters are :
. ^ $ * + ? { } [ ] | ( )
4. Regular Expressions: Metacharacters
• [ ]
– Used for specifying a character class, which is a set of characters that you
wish to match.
• Characters can be listed individually, or a range of characters can be
indicated by giving two characters and separating them by a '-'.
• For example, [abc] will match any of the characters a, b, or c; this is the
same as [a-c], which uses a range to express the same set of characters.
• Most metacharacters are not active inside classes. For example, [akm$] will
match any of the characters 'a', 'k', 'm', or '$'.
– '$' is usually a metacharacter, but inside a character class it is stripped of its special
nature.
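The character-class rules above can be checked with a short sketch. re.findall() is used here only for convenience (it returns every match as a list) and is not covered in these slides; the sample strings are made up for illustration.

```python
import re

# [abc] and [a-c] describe the same set of characters.
listed = re.findall("[abc]", "aback")
ranged = re.findall("[a-c]", "aback")
print(listed)   # ['a', 'b', 'a', 'c']
print(ranged)   # ['a', 'b', 'a', 'c']

# Inside a class, '$' is an ordinary character.
dollars = re.findall("[akm$]", "make $5")
print(dollars)  # ['m', 'a', 'k', '$']
```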
5. Regular Expressions: MetaCharacters
• ^
– Complements the set inside a class
• We can match the characters not listed within the class by
complementing the set.
• This is indicated by including a '^' as the first character of the
class.
• Example: [^5] will match any character except '5'.
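A small sketch of a complemented class, assuming illustrative input strings. It also shows that '^' is only special when it is the first character of the class.

```python
import re

# [^5] matches every character except '5'.
not_five = re.findall("[^5]", "1525")
print(not_five)        # ['1', '2']

# When '^' is not first, it is an ordinary member of the class.
literal_caret = re.findall("[5^]", "5^2")
print(literal_caret)   # ['5', '^']
```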
6. Regular Expressions: Metacharacters
• \
– Removes the special meaning of the metacharacter that follows
• Similar to escape sequences in string literals, the backslash removes the
special meaning of a metacharacter.
• Example: To match a [ or \, we can precede them with a
backslash to remove their special meaning: \[ or \\.
• Some sequences beginning with \ have a predefined
meaning
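Escaping with a backslash can be sketched as follows. The raw-string prefix r"..." is an addition not mentioned in the slides; it keeps Python from interpreting the backslashes before the re module sees them. The sample strings are illustrative.

```python
import re

# \[ matches a literal '[' instead of opening a character class.
bracket = re.search(r"\[", "a[0]")
print(bracket.group(), bracket.start())   # [ 1

# \\ matches a single literal backslash.
# "C:\\temp" is the 7-character string C:\temp.
backslash = re.search(r"\\", "C:\\temp")
print(backslash.start())                  # 2
```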
7. Regular Expression - Metacharacters
• Sequences with a special meaning, beginning with \
\d - Matches any decimal digit; this is equivalent to the class [0-9].
\D - Matches any non-digit character; this is equivalent to the class
[^0-9].
\s - Matches any whitespace character; this is equivalent to the
class [ \t\n\r\f\v].
\S - Matches any non-whitespace character; this is equivalent to
the class [^ \t\n\r\f\v].
\w - Matches any alphanumeric character; this is equivalent to the
class [a-zA-Z0-9_].
\W - Matches any non-alphanumeric character; this is equivalent
to the class [^a-zA-Z0-9_].
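A quick sketch of these sequences on an illustrative ASCII sample string (the string is made up, and re.findall() is used only to list the matches):

```python
import re

sample = "Room 42, floor_3!"

digits = re.findall(r"\d", sample)
spaces = re.findall(r"\s", sample)
words = re.findall(r"\w+", sample)      # '+' means one or more
non_word = re.findall(r"\W", sample)

print(digits)     # ['4', '2', '3']
print(spaces)     # [' ', ' ']
print(words)      # ['Room', '42', 'floor_3']
print(non_word)   # [' ', ',', ' ', '!']
```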
8. Regular Expression – Metacharacters for Repetition
• +
– Matches 1 or more occurrences of the preceding expression.
– Ex: ca+t will match cat, caat, caaaaaaat
• *
– Matches 0 or more occurrences of the preceding expression.
– Ex: ca*t will match ct, cat, caat, caaaaaaat
• ?
– Matches 0 or 1 occurrence of preceding expression.
– Ex: home-?brew matches either homebrew or home-brew.
• {m,n}
– m and n are decimal integers: m <= number of repetitions <= n
– Ex: a/{1,3}b will match a/b, a//b, and a///b.
– If omitted, m defaults to 0 and n defaults to infinity
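The repetition examples above can be verified with a short sketch. The helper matches_fully is a hypothetical name introduced here for illustration; it relies on re.fullmatch(), which is available in Python 3.4 and later and is not covered in these slides.

```python
import re

def matches_fully(pattern, text):
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(matches_fully("ca+t", "ct"))              # False: '+' needs at least one 'a'
print(matches_fully("ca*t", "ct"))              # True: '*' allows zero 'a's
print(matches_fully("home-?brew", "homebrew"))  # True: the '-' is optional
print(matches_fully("a/{1,3}b", "a//b"))        # True: two slashes is within 1..3
print(matches_fully("a/{1,3}b", "a////b"))      # False: four slashes is too many
```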
9. Regular Expression - Metacharacters
• .
– Matches any character except a newline
• |
– Logical or
– If A and B are regular expressions, A|B will match any string that matches either A or B.
– | has very low precedence. Crow|Servo will match either Crow or Servo, not Cro, a 'w' or an 'S',
and ervo.
• ^
– Matches the beginning of the string (and of each line when the re.M flag is set)
– print(re.search('^From', 'From Here to Eternity'))   # a match object
– print(re.search('^From', 'Reciting From Memory'))    # None
• $
– Matches the end of the string, or just before a newline at the end of the string
– print(re.search('}$', '{block}'))     # a match object
– print(re.search('}$', '{block} '))    # None
– print(re.search('}$', '{block}\n'))   # a match object
• ()
– Grouping
– (ab)* means 0 or more occurrences of ab, for example ababababababab
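Alternation, grouping and the dot can be sketched in the same way. re.fullmatch() and re.findall() are conveniences not covered in these slides, and the variable names are illustrative.

```python
import re

# '|' applies to the whole expressions on either side of it.
either = [bool(re.fullmatch("Crow|Servo", s)) for s in ("Crow", "Servo", "Cro")]
print(either)            # [True, True, False]

# (ab)* repeats the group as a unit, including zero times.
grouped = re.fullmatch("(ab)*", "ababab")
empty = re.fullmatch("(ab)*", "")
print(grouped.group())   # ababab
print(empty is not None) # True

# '.' matches every character except the newline.
dot = re.findall(".", "a\nb")
print(dot)               # ['a', 'b']
```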
10. TASK
• Write a regular expression to match the following:
1. Lower case letters
2. Upper case letters
3. Only digits
4. All letters and digits
5. Anything else other than vowels
6. Python or python – There are 3 ways
7. Rube or Ruby
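One possible set of answers, to compare with your own after attempting the task. These are a sketch and not the only correct patterns; the dictionary keys and variable names are illustrative, and each pattern here describes a single character unless noted.

```python
import re

# Possible answers; other patterns work too.
answers = {
    "lowercase letter": "[a-z]",
    "uppercase letter": "[A-Z]",
    "digit": "[0-9]",
    "letter or digit": "[a-zA-Z0-9]",
    "anything but a vowel": "[^aeiouAEIOU]",
    "Rube or Ruby": "Rub[ey]",
}

# Three ways to match "Python" or "python".
python_variants = ["[Pp]ython", "Python|python", "(P|p)ython"]

for pattern in python_variants:
    print(pattern,
          bool(re.fullmatch(pattern, "Python")),
          bool(re.fullmatch(pattern, "python")))
```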
11. TASK
• For Repetition Cases:
1. Match "rub" or "ruby": the y is optional
2. Match "rub" plus 0 or more y’s
3. Match "rub" plus 1 or more y’s
4. Match exactly 3 digits
5. Match 3 or more digits
6. Match 3, 4, or 5 digits
• Pattern at the beginning or end of the string
1. Match hello at the beginning
2. Match world at the end of the string
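A possible set of answers for these cases, offered as a sketch to check against your own attempt. The dictionary keys are illustrative labels, and \d is used for "digit".

```python
import re

# Possible answers; equivalent patterns exist.
patterns = {
    "rub or ruby": "ruby?",
    "rub plus 0 or more y": "ruby*",
    "rub plus 1 or more y": "ruby+",
    "exactly 3 digits": r"\d{3}",
    "3 or more digits": r"\d{3,}",
    "3, 4, or 5 digits": r"\d{3,5}",
    "hello at the beginning": "^hello",
    "world at the end": "world$",
}

print(bool(re.fullmatch(patterns["rub or ruby"], "rub")))           # True
print(bool(re.fullmatch(patterns["exactly 3 digits"], "1234")))     # False
print(bool(re.search(patterns["hello at the beginning"], "hello world")))  # True
print(bool(re.search(patterns["world at the end"], "hello world")))        # True
```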
12. Module Functions
• match(r, s, f=0)
– r: Pattern: the regular expression to be matched.
– s: String: the string that is checked for the pattern at its beginning.
– f: Flags: modifiers that control matching; several flags can be combined using bitwise OR (|).
For now, let's not worry about flags.
– If r matches at the start of string s, match() returns a match object; otherwise
it returns None.
• Functions of the match object:
group() : Return the string matched by the RE
start() : Return the starting position of the match
end() : Return the ending position of the match
span() : Return a tuple containing the (start, end) positions of the match
• Use the group(num) or groups() function of the match object to get the text matched by each
parenthesized group.
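The match object functions listed above can be sketched as follows. The sample string and pattern are illustrative and not from the slides.

```python
import re

# Two groups of word characters separated by a single space.
m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

print(m.group())    # Isaac Newton
print(m.group(1))   # Isaac
print(m.groups())   # ('Isaac', 'Newton')
print(m.start())    # 0
print(m.end())      # 12
print(m.span())     # (0, 12)

# match() only looks at the beginning of the string.
print(re.match("Newton", "Isaac Newton"))   # None
```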
13. Module Functions
• match() – Example
import re
line = "Cats are smarter than dogs"
# Two groups: the text before " are " and the text after it
matchObj = re.match('(.*) are (.*)', line)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
    print("matchObj.groups() : ", matchObj.groups())
else:
    print("No match!!")
14. Module Functions
• search(r, s, f=0)
– Similar to match(), but looks for the pattern anywhere in the string instead of only at the beginning.
TASK:
Try the previous program with search().
15. Module Functions: Matching vs Searching
• The only difference: match checks for a match only at the beginning of
the string, while search checks for a match anywhere in the string
• Try this:
import re
line = "Cats are smarter than dogs"
# match() fails: "dogs" is not at the start of the string
matchObj = re.match('dogs', line, re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")
# search() succeeds: "dogs" occurs later in the string
searchObj = re.search('dogs', line, re.M|re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")
16. Module Function: Search and Replace
• sub(pattern, repl, string, count=0)
– Replaces occurrences of the RE pattern in string with repl. All occurrences are
replaced unless count is given, in which case at most count replacements are made.
Returns the modified string.
– Example:
import re
phone = "2004-959-559 # This is Phone Number"
# Delete Python-style comments
num = re.sub('#.*$', "", phone)
print("Phone Num : ", num)
# Remove anything other than digits
# Write the code for this
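One possible way to finish the exercise, plus a sketch of the count argument. This is only one approach; the variable names are illustrative.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# \D matches any non-digit, so replacing it with "" keeps only the digits.
digits_only = re.sub(r"\D", "", phone)
print(digits_only)   # 2004959559

# count limits how many replacements are made.
first_dash = re.sub("-", "", "2004-959-559", count=1)
print(first_dash)    # 2004959-559
```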
17. Optional Flags
• Optional flags may be included to control
various aspects of matching. Multiple
modifiers can be combined using a | (OR)
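A short sketch of two flags that appear elsewhere in these slides, re.I (ignore case) and re.M (multi-line), used alone and combined with |. The sample text is illustrative and re.findall() is used only to list the matches.

```python
import re

text = "first line\nSecond LINE"

case_sensitive = re.findall("line", text)
ignore_case = re.findall("line", text, re.I)
print(case_sensitive)   # ['line']
print(ignore_case)      # ['line', 'LINE']

# Without re.M, '^' only matches at the very start of the string.
start_only = re.findall("^second", text, re.I)
# With re.M, '^' also matches at the start of each line.
each_line = re.findall("^second", text, re.I | re.M)
print(start_only)       # []
print(each_line)        # ['Second']
```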
18. TASK
• Try:
import re
test = "Hi HOW are you"
pat = "how"
b = re.search(pat, test)
if b:
    print("Yes")
print(b)
• Now try to make it case-insensitive
19. TASK
• Let's try using the metacharacters from the
previous tasks in a program