# Python & Perl: Compiling Regular Expressions to Finite State Automata; Matching Regular Expressions; Quantification of Regular Expressions; Python Dictionaries & Perl Hashes

Published in: Technology

1. 1. Python & Perl: Compiling Regular Expressions to Finite State Machines, Matching Regular Expressions, Quantification of Regular Expressions, Python Dictionaries & Perl Hashes. Vladimir Kulyukin. www.youtube.com/vkedco www.vkedco.blogspot.com
2. 2. Outline
    ● Review
    ● Compiling Regular Expressions into Finite State Automata
    ● Matching Regular Expressions
    ● Quantification of Regular Expressions
    ● Python Dictionaries
    ● Perl Hashes
4. 4. Review: Languages
    • A language is a set of strings over an alphabet Σ
    • Σ* is the Kleene closure of Σ and denotes the set of all strings over Σ, including the empty string ε
    • Examples:
        – If Σ = {a}, then Σ* = {a}* = {ε, a, aa, aaa, aaaa, …}
        – If Σ = {0,1}, then Σ* = {0,1}* is the infinite set of all strings of 0’s and 1’s, including ε
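
Since Σ* is infinite, a program can only list a finite slice of it. The sketch below enumerates every string over an alphabet up to a chosen length; the function name `strings_up_to` is a hypothetical helper, not something from the slides.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string ε
        for letters in product(sorted(alphabet), repeat=n):
            result.append("".join(letters))
    return result

print(strings_up_to({"a"}, 3))       # ['', 'a', 'aa', 'aaa']
print(strings_up_to({"0", "1"}, 2))  # ['', '0', '1', '00', '01', '10', '11']
```

The empty string `''` plays the role of ε in both outputs.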
5. 5. Review: Deterministic Finite Automata
    • A DFA can be informally defined as a directed graph whose nodes are states and whose edges are transitions on specific symbols
    • A DFA has a unique start state and a (possibly empty) set of final, or accepting, states
    • A DFA processes the input string one symbol at a time. When the last symbol has been read, the DFA is in a state that is either final or not. If the state is final, the DFA accepts (recognizes) the string. If the state is not final, the DFA rejects the string
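
The acceptance procedure described above can be sketched in a few lines of Python. The transition table `EVEN_ONES` (binary strings with an even number of 1s) and the function name `dfa_accepts` are made-up illustrations, and treating a missing edge as rejection is an assumption of this sketch.

```python
def dfa_accepts(transitions, start, finals, text):
    """Run a DFA given as {(state, symbol): next_state} over `text`."""
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False          # no edge for this symbol: reject
        state = transitions[key]
    return state in finals        # accept only if the last state is final

# Hypothetical DFA: accepts binary strings that contain an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

print(dfa_accepts(EVEN_ONES, "even", {"even"}, "1001"))  # True
print(dfa_accepts(EVEN_ONES, "even", {"even"}, "10"))    # False
```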
6. 6. Review: NFA vs. DFA
    ● NFAs are simpler to write because, in general, they have fewer states and allow spontaneous (ε) transitions
    ● However, they are not more powerful than DFAs, i.e., they accept exactly the same class of languages (the regular languages)
    ● For every NFA, one can construct a DFA that accepts the same language
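
One standard way to build such a DFA is the subset construction, in which every DFA state is a set of NFA states. The sketch below is a minimal version under stated assumptions: the NFA is a dictionary `{(state, symbol): set_of_states}`, the empty string `""` is used as the label of spontaneous transitions, and the example NFA for a*b* is made up for illustration.

```python
def eps_closure(states, nfa):
    """All states reachable from `states` using only spontaneous moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa.get((q, ""), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def nfa_to_dfa(nfa, start, finals, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = eps_closure({start}, nfa)
    table, todo, seen = {}, [d_start], {d_start}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            moved = set()
            for q in subset:
                moved |= nfa.get((q, a), set())
            target = eps_closure(moved, nfa)
            table[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    d_finals = {s for s in seen if s & set(finals)}
    return table, d_start, d_finals

def run_dfa(table, start, finals, text):
    state = start
    for ch in text:               # assumes every ch is in the alphabet
        state = table[(state, ch)]
    return state in finals

# Hypothetical NFA for a*b*: state 0 loops on a, jumps spontaneously to 1, which loops on b.
A_STAR_B_STAR = {(0, "a"): {0}, (0, ""): {1}, (1, "b"): {1}}
table, d_start, d_finals = nfa_to_dfa(A_STAR_B_STAR, 0, {1}, "ab")
print(run_dfa(table, d_start, d_finals, "aabb"))  # True
print(run_dfa(table, d_start, d_finals, "ba"))    # False
```

The empty frozenset acts as the dead state of the resulting DFA.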
7. 7. Review: Regular Expressions
    ● Regular expressions are programmatic equivalents of finite state automata
    ● Regular expressions are compiled into finite state machines either at run time or at compile time
    ● Regular expressions are often referred to as patterns
8. 8. Review: Operator m//: Syntax
    ● The match operator m is followed by a regular expression, also known as a pattern, between two matching delimiters: `m/<regexp>/`
    ● To search some text `$txt` for matches of a regular expression, write: `$txt =~ m/regexp/`
    ● `=~` is the binding operator: it binds `$txt` on the left to the regular expression of the match operator on the right
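
For comparison, Python has no match operator; a rough counterpart of `$txt =~ m/regexp/` is `re.search` from the standard `re` module. The sample text and the helper name `contains` below are made up for illustration.

```python
import re

def contains(pattern, txt):
    """Roughly Perl's  $txt =~ m/pattern/ : is there a match anywhere in txt?"""
    return re.search(pattern, txt) is not None

txt = "compiling regular expressions"
print(contains(r"reg", txt))        # True
print(contains(r"automata", txt))   # False

# A pattern can also be compiled once and reused for many searches.
compiled = re.compile(r"reg")
print(compiled.search(txt).start())  # 10
```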
9. 9. Compiling Regular Expressions to Non-Deterministic Finite Automata
10. 10. Three Language Operations
    ● If $L_1$ and $L_2$ are languages, their concatenation is $L_1 L_2 = \{ x_1 x_2 \mid x_1 \in L_1 \text{ and } x_2 \in L_2 \}$.
    ● If $L$ is a language, the Kleene closure of $L$ is $L^* = \{ x_1 x_2 \cdots x_n \mid n \ge 0,\ x_i \in L \text{ for } 1 \le i \le n \}$.
    ● If $L$ is a language, the positive closure of $L$ is $L^+ = \{ x_1 x_2 \cdots x_n \mid n \ge 1,\ x_i \in L \text{ for } 1 \le i \le n \}$.
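
For finite languages these operations can be tried out directly on Python sets. The closures are infinite in general, so the sketch below only builds the part that uses at most `max_factors` strings; both function names are hypothetical.

```python
def concat(l1, l2):
    """L1 L2: every string of L1 followed by every string of L2."""
    return {x1 + x2 for x1 in l1 for x2 in l2}

def closure_up_to(lang, max_factors, include_empty=True):
    """Finite slice of L* (or of L+ when include_empty is False)."""
    result = {""} if include_empty else set()   # n = 0 contributes only ε
    layer = {""}
    for _ in range(max_factors):
        layer = concat(layer, lang)             # strings built from one more factor
        result |= layer
    return result

print(sorted(concat({"a", "b"}, {"c", "d"})))               # ['ac', 'ad', 'bc', 'bd']
print(sorted(closure_up_to({"ab"}, 2)))                     # ['', 'ab', 'abab']
print(sorted(closure_up_to({"ab"}, 2, include_empty=False)))  # ['ab', 'abab']
```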
11. 11. Atomic & Compound Regular Expressions
    ● Regular expressions can be divided into atomic and compound
    ● Atomic regular expressions are the basic building blocks out of which compound regular expressions are built
    ● There are typically three atomic regular expressions: unit strings (strings of one symbol), the empty string (the string of no symbols), and the empty set of strings
12. 12. Atomic Regular Expressions
    A regular expression is a string $r$ that denotes a language $L(r)$ over some alphabet $\Sigma$. There are three types of atomic regular expressions:
    1. If $a \in \Sigma$, then $L(a) = \{a\}$
    2. $L(\varepsilon) = \{\varepsilon\}$
    3. $L(\emptyset) = \{\,\}$
13. 13. Compound Regular Expressions
    Let $r$, $r_1$, and $r_2$ be regular expressions.
    1. $(r_1 + r_2)$ is a regular expression, and $L(r_1 + r_2) = L(r_1) \cup L(r_2)$.
    2. $(r_1 r_2)$ is a regular expression, and $L(r_1 r_2) = L(r_1) L(r_2)$.
    3. $(r)^*$ is a regular expression, and $L((r)^*) = (L(r))^*$.
14. 14. Examples
    ● Regular expression $ab$: $L(ab) = \{ab\}$.
    ● Regular expression $ab + c$: $L(ab + c) = \{ab, c\}$.
    ● Regular expression $ba^*$: $L(ba^*) = \{b, ba, baa, baaa, \ldots\} = \{ba^n \mid n \ge 0\}$.
15. 15. Examples

    | Regular expression | Language |
    | --- | --- |
    | $(a + b)^*$ | $L((a + b)^*) = \{a, b\}^*$ |
    | $(ab + \varepsilon)$ | $L((ab + \varepsilon)) = \{ab, \varepsilon\}$ |
    | $(a + b)(c + d)$ | $\{ac, ad, bc, bd\}$ |
    | $(abc)^*$ | $L((abc)^*) = \{\varepsilon, abc, abcabc, abcabcabc, \ldots\}$ |
    | $a^* b^*$ | $L(a^* b^*) = \{\varepsilon, a, b, aa, bb, abb, aaab, \ldots\}$ |
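
These examples can be checked with Python's `re` module, with two notational changes that are assumptions of this sketch: the textbook union `+` is written `|`, and ε is written as an empty alternative. `re.fullmatch` is used because membership in L(r) concerns the whole string, not a substring. The names `EXAMPLES` and `in_language` are hypothetical.

```python
import re

# textbook notation -> Python pattern
EXAMPLES = {
    "ab + c":         r"ab|c",
    "ba*":            r"ba*",
    "(a + b)*":       r"(a|b)*",
    "(ab + ε)":       r"(ab|)",
    "(a + b)(c + d)": r"(a|b)(c|d)",
    "(abc)*":         r"(abc)*",
    "a*b*":           r"a*b*",
}

def in_language(textbook_re, s):
    """True if the whole string s belongs to the language of the expression."""
    return re.fullmatch(EXAMPLES[textbook_re], s) is not None

print(in_language("a*b*", "aaab"))        # True
print(in_language("a*b*", "ba"))          # False
print(in_language("(ab + ε)", ""))        # True
print(in_language("(a + b)(c + d)", "bd"))  # True
```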
16. 16. Atomic Reg Exps → NFAs
    For a ∈ Σ: [Diagram: NFA with a single transition labeled a]
    This NFA accepts only the string a and nothing else
17. 17. Atomic Reg Exps → NFAs
    For ε: [Diagram: NFA built from ε-transitions only]
    This NFA accepts only the empty string
18. 18. Atomic Reg Exps → NFAs
    For ∅: this NFA accepts the empty set of strings, i.e., it accepts no strings at all
19. 19. Compound Reg Exps → NFAs
    (r1 + r2), where r1 and r2 are regular expressions. Another notation, commonly used in regexp engines, is (r1 | r2), in other words, either r1 or r2.
    [Diagram: the NFA for r1 (upper) and the NFA for r2 (lower), joined by four ε-transitions]
20. 20. Compound Reg Exps → NFAs
    This compound NFA accepts if and only if either the NFA for r1 (upper one) accepts or the NFA for r2 (lower one) accepts.
    [Diagram: the NFA for r1 (upper) and the NFA for r2 (lower), joined by four ε-transitions]
21. 21. Compound Reg Exps → NFAs
    (r1)+, where r1 is a regular expression.
    [Diagram: the NFA for r1 with three ε-transitions]
    This NFA accepts strings that match r1 at least once
22. 22. Compound Reg Exps → NFAs
    (r1)*, where r1 is a regular expression.
    [Diagram: the NFA for r1 with four ε-transitions]
    This NFA accepts strings that match r1 zero or more times
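
The atomic and compound constructions can be sketched in Python in the style of Thompson's construction. This is one common textbook arrangement of the ε-transitions and may differ in detail from the diagrams on the slides; all names (`Frag`, `symbol`, `union`, `star`, and so on) are hypothetical. Each fragment should be used only once when building a larger one, since fragments share state numbers.

```python
import itertools

EPS = None                      # label for spontaneous (epsilon) transitions
_fresh = itertools.count()      # source of unused state numbers

class Frag:
    """NFA fragment: one start state, one accepting state, a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def _pair():
    return next(_fresh), next(_fresh)

def symbol(a):                  # atomic: accepts only the string a
    s, t = _pair()
    return Frag(s, t, [(s, a, t)])

def epsilon():                  # atomic: accepts only the empty string
    s, t = _pair()
    return Frag(s, t, [(s, EPS, t)])

def empty_set():                # atomic: accepts no strings at all
    s, t = _pair()
    return Frag(s, t, [])

def union(f1, f2):              # (r1 + r2): four epsilon edges
    s, t = _pair()
    extra = [(s, EPS, f1.start), (s, EPS, f2.start),
             (f1.accept, EPS, t), (f2.accept, EPS, t)]
    return Frag(s, t, f1.edges + f2.edges + extra)

def concat(f1, f2):             # (r1 r2): link accept of r1 to start of r2
    return Frag(f1.start, f2.accept,
                f1.edges + f2.edges + [(f1.accept, EPS, f2.start)])

def plus(f):                    # (r1)+: three epsilon edges, one loops back
    s, t = _pair()
    extra = [(s, EPS, f.start), (f.accept, EPS, t), (f.accept, EPS, f.start)]
    return Frag(s, t, f.edges + extra)

def star(f):                    # (r1)*: as plus, with an extra bypass edge
    s, t = _pair()
    extra = [(s, EPS, f.start), (f.accept, EPS, t),
             (f.accept, EPS, f.start), (s, EPS, t)]
    return Frag(s, t, f.edges + extra)

def _closure(states, edges):
    """States reachable from `states` by epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(frag, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = _closure({frag.start}, frag.edges)
    for ch in text:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = _closure(moved, frag.edges)
    return frag.accept in current

ab_or_c = union(concat(symbol("a"), symbol("b")), symbol("c"))   # ab + c
b_a_star = concat(symbol("b"), star(symbol("a")))                # ba*
print(accepts(ab_or_c, "ab"), accepts(ab_or_c, "abc"))    # True False
print(accepts(b_a_star, "baaa"), accepts(b_a_star, ""))   # True False
```

Tracking a set of current states avoids backtracking: the simulation reads each input symbol exactly once.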
23. 23. Matching Regular Expressions
24. 24. Special Characters & Character Classes
    ● A special character matches any single character in a character class
    ● A character range matches any single character in the specified range
    ● Special characters are typically escaped (written with a leading backslash) in strings that specify regular expressions
25. 25. Perl's Special Characters
    ● \d – any digit character
    ● \D – any non-digit character
    ● \w – any word character (alphanumeric or underscore)
    ● \W – any non-word character
    ● \s – any whitespace character (space, tab, carriage return, newline, form feed)
    ● \S – any non-whitespace character
26. 26. Matching Special Characters

            ## does $txt contain any digits?
            $txt =~ /\d/;
27. 27. Matching Special Characters

            ## there is a match, because
            ## 'ab cd 1h g' contains
            ## digit character 1.
            'ab cd 1h g' =~ /\d/;
28. 28. Matching Special Characters

            ## no match, because 'ab cd ef g'
            ## does not contain any digit
            ## characters.
            'ab cd ef g' =~ /\d/;
29. 29. Matching Special Characters

            ## no match, because
            ## '0100100101' contains no
            ## non-digit characters.
            '0100100101' =~ /\D/;
30. 30. Matching Special Characters

            ## there is a match, because 'abc '
            ## contains a space character.
            'abc ' =~ /\s/;
            ## there is a match, because
            ## "abc\n" contains \n
            "abc\n" =~ /\s/;
31. 31. Matching Special Characters

            ## there is no match, because 'abc'
            ## contains no whitespace characters
            'abc' =~ /\s/;
32. 32. Matching Character Ranges

            ## there is a match because 'ab 01 cd'
            ## contains characters in the range
            ## [0-9]
            'ab 01 cd' =~ /[0-9]/;
            ## there is no match because 'abcdef'
            ## does not contain characters in [0-9].
            'abcdef' =~ /[0-9]/;
34. 34. Matching Alternations

            ## /ab|cd/ - match 'ab' or 'cd'
            ## there is a match because of 'cd'
            '12 cd' =~ /ab|cd/;
            ## /ab|cd / - match 'ab' or 'cd '
            ## there is no match
            '12 cd' =~ /ab|cd /;
            ## /ab|cd| ef/ - match 'ab' or 'cd' or ' ef'
            ## /ab|cd| ef/ matches '12 cd'
            '12 cd' =~ /ab|cd| ef/;
            ## /ab|cd |ef/ - match 'ab' or 'cd ' or 'ef'
            ## there is a match.
            '12 cd ' =~ /ab|cd |ef/;
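
The same character classes, ranges, and alternations are available in Python's `re` module. The sketch below replays several of the Perl examples; the helper `found` is a hypothetical stand-in for Perl's `=~`, built on `re.search`.

```python
import re

def found(pattern, text):
    """True if pattern matches anywhere in text (like Perl's  text =~ /pattern/)."""
    return re.search(pattern, text) is not None

print(found(r"\d", "ab cd 1h g"))      # True:  contains the digit 1
print(found(r"\d", "ab cd ef g"))      # False: no digits
print(found(r"\D", "0100100101"))      # False: no non-digit characters
print(found(r"\s", "abc\n"))           # True:  newline is whitespace
print(found(r"[0-9]", "abcdef"))       # False: nothing in the range 0-9
print(found(r"ab|cd ", "12 cd"))       # False: 'cd ' needs a trailing space
print(found(r"ab|cd |ef", "12 cd "))   # True
```

Raw strings (`r"..."`) keep the backslashes intact, which is the Python way of avoiding double escaping.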
35. 35. Quantification of Regular Expressions
36. 36. Quantifiers
    ● Regular expression syntax provides special symbols, called quantifiers, that let a regular expression match a repeated number of occurrences of a pattern
    ● The typical quantifiers are:
        – *
        – +
        – ?
        – {n}
        – {n,m}
37. 37. Quantified Patterns
    ● pattern*: zero or more occurrences of pattern
    ● pattern+: one or more occurrences of pattern
    ● pattern?: zero or one occurrence of pattern
    ● pattern{n}: exactly n occurrences of pattern
    ● pattern{n,m}: from n to m occurrences of pattern, inclusive
38. 38. Matching Quantified Patterns

            my $txt_01 = 'abcd efg';
            my $txt_02 = 'abcd efg 123';
            ## there is a match because \d* also matches
            ## zero digits, and 'abcd efg' has 0 digits.
            'abcd efg' =~ /\d*/;
            ## there is a match because 'abcd efg 123'
            ## has 0 or more digits.
            'abcd efg 123' =~ /\d*/;
39. 39. Matching Quantified Patterns

            ## /\d+/ - match one or more digit characters
            ## there is no match in 'abcd efg', because the string
            ## does not have any digit characters.
            'abcd efg' =~ /\d+/;
            ## there is a match in 'abcd efg 123', because the string
            ## does contain digit characters.
            'abcd efg 123' =~ /\d+/;
40. 40. Matching Quantified Patterns

            ## /(0\d*0)|(1\d*1)/ - match any sequence
            ## of digits that starts and ends with
            ## 0 or any sequence of digits that
            ## starts and ends with 1.
            my $pat = '(0\d*0)|(1\d*1)';
            ## a match
            '11' =~ /$pat/;
            ## a match
            '00' =~ /$pat/;
41. 41. Matching Quantified Patterns

            my $pat = '(0\d*0)|(1\d*1)';
            ## no match
            '1a1' =~ /$pat/;
            ## no match
            '0123b0' =~ /$pat/;
            ## yes match
            '0a10230' =~ /$pat/;
            ## matches 0120, 01234000, 1001231, 012340
            foreach ('0120', '01234000', '1001231', '1a231', '012340') {
                print $_, "\n" if $_ =~ /$pat/;
            }
42. 42. Matching Quantified Patterns

            ## re? - zero or 1 occurrence of re
            my $pat_02 = 'a\d?a';
            ## yes match
            'aa' =~ /$pat_02/;
            ## yes match
            'a1a' =~ /$pat_02/;
            ## no match
            'a12a' =~ /$pat_02/;
43. 43. Matching Quantified Patterns

            ## /\d{3}/ - match exactly 3 consecutive digits
            my $pat_03 = '\d{3}';
            ## there is a match
            '123' =~ /$pat_03/;
            ## there is no match
            '1a1a4' =~ /$pat_03/;
44. 44. Matching Quantified Patterns

            my $pat_04 = '\d{2,4}';
            ## match 2, 3, or 4 consecutive digits
            ## match
            '11 aa' =~ /$pat_04/;
            ## match
            '111 aa' =~ /$pat_04/;
            ## match
            '1111 aa' =~ /$pat_04/;
            ## match (the pattern is not anchored, so the
            ## first four digits are enough)
            '1111111 aa' =~ /$pat_04/;
            ## no match
            '1 aa' =~ /$pat_04/;
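
The quantifier examples carry over to Python almost unchanged. The sketch below uses `re.search` for Perl-style unanchored matching and `re.fullmatch` to show what changes when the whole string must match; the helper name `found` is hypothetical.

```python
import re

def found(pattern, text):
    """Unanchored search, comparable to Perl's  text =~ /pattern/."""
    return re.search(pattern, text) is not None

ends = r"(0\d*0)|(1\d*1)"     # digits starting and ending with 0, or with 1
samples = ["0120", "01234000", "1001231", "1a231", "012340"]
print([s for s in samples if found(ends, s)])
# ['0120', '01234000', '1001231', '012340']

print(found(r"\d*", "abcd efg"))     # True:  \d* also matches zero digits
print(found(r"\d+", "abcd efg"))     # False: at least one digit is required
print(found(r"a\d?a", "a12a"))       # False: at most one digit between the a's
print(found(r"\d{3}", "1a1a4"))      # False: no three consecutive digits

# Unanchored, \d{2,4} is satisfied by the first four of seven digits ...
print(re.search(r"\d{2,4}", "1111111 aa").group())    # 1111
# ... but the seven-digit string as a whole does not fit {2,4}.
print(re.fullmatch(r"\d{2,4}", "1111111") is None)    # True
```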
46. 46. Dictionary
    ● A dictionary is a set of key-value pairs
    ● Dictionaries are mutable
    ● The dictionary is the only built-in mapping type
    ● The keys are unordered in Python 2; since Python 3.7, dictionaries preserve insertion order
47. 47. Basic Syntax
    ● A dictionary is defined by { }: `emptyDict = {}` defines an empty dictionary
    ● A key-value pair is written as key, colon, value: `dict = {'one' : 1}`
    ● Multiple key-value pairs are separated by commas: `dict = {'one' : 1, 'two' : 2, 'three' : 3}`
48. 48. Keys and Values
    ● Keys can be any immutable (hashable) objects: numbers, strings, and tuples of immutable objects
    ● Lists and dictionaries cannot be keys
    ● Values can be any objects
    ● Dictionaries can be nested, i.e., there can be a dictionary within another dictionary
49. 49. Example

            box = {'size' : {'height' : 10, 'width' : 20},
                   'isCardboard' : True,
                   'color' : 'red',
                   'contents' : ['nail', 'hammer', 'screw']}
50. 50. Access

            >>> box['size']  # must retrieve on key that exists
            {'width': 20, 'height': 10}
            >>> box['size']['width']
            20
            >>> box['contents']
            ['nail', 'hammer', 'screw']
            >>> box['contents'][-1]
            'screw'
51. 51. Access

            >>> box.has_key('size')  # Python 2 only; in Python 3 write: 'size' in box
            True
            >>> box.items()  # do it only on small dictionaries
            [('color', 'red'), ('isCardboard', True), ('contents', ['nail', 'hammer', 'screw']), ('size', {'width': 20, 'height': 10})]
            >>> box.items()[0]  # Python 2: items() returns a list
            ('color', 'red')
            >>> box.items()[0][-1]
            'red'
52. 52. Adding Key-Value Pairs
    ● You can add new key-value pairs to a previously created dictionary

            my_dict = {}
            for c in 'abcdefg':
                my_dict[c.upper()] = c
            >>> my_dict
            {'A': 'a', 'C': 'c', 'B': 'b', 'E': 'e', 'D': 'd', 'G': 'g', 'F': 'f'}
            >>> my_dict[(1, 2)] = [1, 2]
            >>> my_dict[(1, 2)]
            [1, 2]
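
The sessions above follow Python 2 (the version the slides reference). As a hedged sketch, here is how the same operations might look in Python 3.7 or later, where `has_key` no longer exists, `items()` returns a view rather than a list, and insertion order is preserved.

```python
box = {
    "size": {"height": 10, "width": 20},
    "isCardboard": True,
    "color": "red",
    "contents": ["nail", "hammer", "screw"],
}

print("size" in box)               # True  (replaces box.has_key('size'))
print(box["size"]["width"])        # 20
print(box["contents"][-1])         # screw
print(box.get("weight", "n/a"))    # n/a   (no KeyError for a missing key)
print(list(box.items())[0])        # ('size', {'height': 10, 'width': 20})

# Adding pairs: a comprehension plus a tuple key.
my_dict = {c.upper(): c for c in "abcdefg"}
my_dict[(1, 2)] = [1, 2]
print(my_dict["G"], my_dict[(1, 2)])   # g [1, 2]

# Lists are mutable, hence unhashable, and cannot be used as keys.
try:
    my_dict[[1, 2]] = "list as key"
except TypeError as err:
    print("rejected key:", err)
```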
53. 53. Perl Hashes (source code at hash_construction_01.pl, hash_construction_02.pl)
54. 54. Hashes
    ● A hash maps each key to exactly one value; keys are unique, but different keys may share the same value
    ● Keys are not ordered
    ● A hash variable must be marked with the % type identifier
    ● Three main type identifiers:
        – $ - scalar
        – @ - array
        – % - hash
55. 55. Hash Construction
    ● A hash can be constructed from a list of comma-separated key-value pairs
    ● A hash can be constructed by inserting key-value pairs into it
    ● A hash can be constructed from a list with the => operator
56. 56. Hash Construction

            my @ary = ('a' .. 'e');
            ## here is a hash constructed from a list of
            ## key-value pairs using the => operator.
            ## a bareword that appears to the left of
            ## => is automatically quoted as a string.
            ## An array placed inside a double-quoted
            ## string is interpolated into one string.
            ## A bareword hash key to the left of
            ## => cannot have white space.
            my %tbl_01 = ( one   => 1,
                           two   => 2+3,
                           three => "@ary",
                           four  => "this is value 4"
                         );
57. 57. Hash Construction

            ## %tbl_01 is a hash constructed from a list of comma-separated key-value pairs:
            ## 'one', 1 is the first key-value pair,
            ## 'two', 2 is the second key-value pair,
            ## 'three', 19 (3**2 + 10) is the third key-value pair.
            ## numerical expressions in value places are evaluated;
            ## a quoted string in a value place is stored as a
            ## string, so "(1 .. 4)" is not expanded as a range.
            my %tbl_01 = ( 'one'  , 1,
                           'two'  , 2,
                           'three', 3**2 + 10,
                           'four' , "(1 .. 4)",
                           'five' , 'This is value 5'
                         );
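
For comparison with the Python half of the deck, the three ways of constructing a Perl hash have rough dictionary counterparts. This is only an analogy, not a translation of the Perl semantics; the variable names are made up, and `" ".join(ary)` stands in for Perl's interpolation of `"@ary"`, which joins elements with spaces by default.

```python
ary = ["a", "b", "c", "d", "e"]

# 1. Literal form, comparable to Perl's  key => value  list.
tbl_01 = {
    "one": 1,
    "two": 2 + 3,                 # expressions in value places are evaluated
    "three": " ".join(ary),       # similar to Perl interpolating "@ary"
    "four": "this is value 4",
}

# 2. From a flat, comma-separated list: pair even positions with odd ones.
flat = ["one", 1, "two", 2, "three", 3**2 + 10, "four", "(1 .. 4)"]
tbl_02 = dict(zip(flat[0::2], flat[1::2]))

# 3. By inserting key-value pairs one at a time.
tbl_03 = {}
tbl_03["five"] = "This is value 5"

print(tbl_01["two"], tbl_01["three"])   # 5 a b c d e
print(tbl_02["three"], tbl_02["four"])  # 19 (1 .. 4)
print(tbl_03)                           # {'five': 'This is value 5'}
```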
60. 60. References
    ● www.python.org
    ● http://docs.python.org/2/
    ● www.perl.org
    ● http://perldoc.perl.org/
61. 61. References
    ● Davis, Weyuker, Sigal. Ch. 9. Computability, Complexity, and Languages, 2nd Edition, Academic Press
    ● A. Brooks Weber. Ch. 2, 3. Formal Language: A Practical Introduction, Franklin, Beedle & Associates, Inc