APMG June 2014 - Regular Expression


  1. Regular expressions (RE) - crash course by Daniel Genis, Byte Internet
  2. Regex - What is a regular expression? - A mini program that accepts or rejects a string. - Can be used to parse data out of strings.
  3. Regex - Why regular expressions? PRO
     ● Matching is fast: O(n) in the input length with a DFA, plus automaton generation time
       ○ Automaton generation is a one-time penalty
       ○ For a RE of size m, an NFA can be built in O(m); converting it to a DFA can cost O(2^m) in the worst case
     ● Useful for validating string input
     ● Available in virtually all programming languages
       ○ Even in MySQL and other databases. But please, please don't use RE in database queries :-)
     ● Useful for fetching/parsing data out of strings
     ● Very powerful tool. A real Swiss army knife!
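
The one-time generation cost from the slide above can be made visible in Python by compiling a pattern once and reusing the compiled object. This is a minimal sketch: the pattern `DATE_RE` and the sample strings are illustrative assumptions, not taken from the talk. Note also that Python's `re` module is a backtracking engine, so the O(n) bound for DFA matching is not guaranteed there.

```python
import re

# Compile once (the one-time penalty), then reuse for every input.
# DATE_RE and the sample inputs are hypothetical, chosen for illustration.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

candidates = ["2014-06-04", "2014-6-4", "not a date"]
valid = [s for s in candidates if DATE_RE.match(s)]
print(valid)  # ['2014-06-04']
```
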
  4. Regex - When to avoid RE? CONS
     ● Regexes are mini programs in themselves
     ● They can become very complex
     ● Some people argue regexes should always be avoided
     ● They are not very human-readable
     ● Not everyone is comfortable with RE
     ● The DFA must be created/compiled initially
  5. Regex - Getting a feel. Two dummy examples: ^aap?$ and a()?p+p. Real-world example: DB_BACKUP_REGEX = "^[a-zA-Z0-9_-]+_((\d|-)+_(\d|-)+)_UTC\.sql\.gz$"
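
To get a feel for the real-world pattern on this slide, it can be tried against a file name. The sketch below assumes the digit classes are `\d` and that the dots in `.sql.gz` are meant literally (so they are escaped); the file names are hypothetical.

```python
import re

# Pattern from the slide, with \d restored and the dots escaped (assumption).
DB_BACKUP_REGEX = r"^[a-zA-Z0-9_-]+_((\d|-)+_(\d|-)+)_UTC\.sql\.gz$"

# Hypothetical backup file name, invented for this example.
filename = "mydb_2014-06-04_20-00_UTC.sql.gz"

match = re.match(DB_BACKUP_REGEX, filename)
timestamp = match.group(1) if match else None
print(timestamp)  # 2014-06-04_20-00

# A name without the timestamp part is rejected.
rejected = re.match(DB_BACKUP_REGEX, "mydb.sql.gz")
print(rejected)  # None
```
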
  6. Regex - Semantic building blocks
     ‘.’ == Matches any character except a newline
     ‘^’ == Matches the start of the string
     ‘$’ == Matches the end of the string
     ‘*’ == Causes the resulting RE to match 0 or more repetitions of the preceding RE
     ‘+’ == Causes the resulting RE to match 1 or more repetitions of the preceding RE
     ‘?’ == Causes the resulting RE to match 0 or 1 repetitions of the preceding RE
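
Each building block above can be checked in isolation with Python's `re` module. A small sketch; the helper `matches` and the sample patterns are illustrative assumptions, not from the slides.

```python
import re

def matches(pattern, text):
    """Return True if pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

results = {
    "dot matches b":        matches(r"a.c", "abc"),    # '.' stands in for 'b'
    "dot matches newline":  matches(r"a.c", "a\nc"),   # '.' skips newlines
    "caret mid-string":     matches(r"^ab", "cab"),    # '^' only at the start
    "dollar mid-string":    matches(r"ab$", "abc"),    # '$' only at the end
    "star zero reps":       matches(r"^ab*$", "a"),    # '*' allows zero b's
    "plus zero reps":       matches(r"^ab+$", "a"),    # '+' needs one b
    "question two reps":    matches(r"^ab?$", "abb"),  # '?' allows at most one b
}
for name, outcome in results.items():
    print(name, outcome)
```
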
  7. Regex - basics - Which string matches? Regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb aabb_ (_ == whitespace)
  8. Regex - basics - ^ () * $ Regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb aabb_ (_ == whitespace)
  9. Regex - basics - ^ () * $ Regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb aabb_ (_ == whitespace)
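
The quiz on the slides above can be answered by running the candidates through Python's `re` module. A minimal sketch, assuming the whitespace marker `_` stands for a single space.

```python
import re

pattern = re.compile(r"^a(ab)*b$")

# Candidate strings from the slides; '_' is written as a space (assumption).
strings = ["aaab", "aabb", "ab", "abbb", "aababb", " aabb", "aabb "]

matching = [s for s in strings if pattern.search(s)]
print(matching)  # ['aabb', 'ab', 'aababb']
```

The pattern requires one `a`, then zero or more `ab` pairs, then exactly one `b`, with nothing before or after, which is why the strings with leading or trailing whitespace fail.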
  10. Regex - basics - Which string matches? Regex: aa+b*b$ old regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb aabb_ (_ == whitespace)
  11. Regex - basics - Which string matches? Regex: aa+b*b$ old regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababa _aabb aabb_ (_ == whitespace)
  12. Regex - basics - Which string matches? Regex: aa+b*b$ old regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababa _aabb aabb_ (_ == whitespace)
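
The second quiz can be checked the same way. This sketch assumes `_` is a single space and uses `re.search`, which looks for the pattern anywhere in the string; that matters here because the new regex no longer starts with `^`. Both `aababb` (slide 10) and `aababa` (slides 11 and 12) are included.

```python
import re

pattern = re.compile(r"aa+b*b$")

# Candidates from the slides; '_' is written as a space (assumption).
strings = ["aaab", "aabb", "ab", "abbb", "aababb", "aababa", " aabb", "aabb "]

# re.search scans for a match at any position, not just the start.
matching = [s for s in strings if pattern.search(s)]
print(matching)  # ['aaab', 'aabb', ' aabb']
```

With `re.match` instead of `re.search`, the string with leading whitespace would be rejected, because `re.match` only tries the start of the string.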
  13. Regex - some more building blocks
      \w == [a-zA-Z0-9_] Matches 1 character: a-z, A-Z, 0-9 or underscore
      \d == [0-9] Matches 1 digit
      \d{5} Matches exactly 5 digits
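
A short sketch of these character classes in Python; the sample strings are illustrative assumptions. Raw strings (`r"..."`) keep the backslashes intact.

```python
import re

# \d matches one digit at a time.
digits = re.findall(r"\d", "a1b22")
print(digits)  # ['1', '2', '2']

# \d{5} needs exactly five digits when the whole string must match.
five_ok = re.fullmatch(r"\d{5}", "12345") is not None
four_ok = re.fullmatch(r"\d{5}", "1234") is not None
print(five_ok, four_ok)  # True False

# \w includes the underscore but not the hyphen.
words = re.findall(r"\w+", "foo_bar baz-9")
print(words)  # ['foo_bar', 'baz', '9']
```
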
  14. Regex - bad practical example
      import re
      data = "2014-06-04 20:00"  # How do we parse this to integers?
      regex = r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})"
      regex2 = r"(\d+)-(\d+)-(\d+) (\d+):(\d+)"  # Works too!
      re.findall(regex, data)  # returns [('2014', '06', '04', '20', '00')]
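
The slide asks how to get integers out of the timestamp. One possible way, sketched below, is to convert the captured groups with `int`. The trailing `$` and the variable names are assumptions added for this example.

```python
import re

data = "2014-06-04 20:00"
# Same pattern as on the slide, with a trailing '$' added (assumption).
regex = r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})$"

match = re.match(regex, data)
if match is None:
    raise ValueError("unexpected timestamp format: %r" % data)

# groups() returns the captured strings; int() drops leading zeros.
year, month, day, hour, minute = (int(g) for g in match.groups())
print(year, month, day, hour, minute)  # 2014 6 4 20 0
```

For real date handling, `datetime.strptime(data, "%Y-%m-%d %H:%M")` is usually the better tool, which may be why the slide calls this a bad example.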
  15. Regex - regex DFA. regex2 = r"(\d+)-(\d+)-(\d+) (\d+):(\d+)"
  16. Regex - stuff we didn’t cover! :D Regex can get very, very complicated. Just to give you some idea:
      - Lookahead assertion (?=...): matches if ... matches next, but doesn’t consume any of the string. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'. Likewise, (Isaac (?=Asimov))|(Banaan) will match the 'Isaac ' in ‘Isaac Asimov’, or ‘Banaan’; the lookahead text 'Asimov' is not part of the match.
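
The lookahead behaviour can be observed directly: the text inside `(?=...)` must be present, but it is not part of the match. A minimal sketch using the pattern from the slide; the input "Isaac Newton" is an added illustration.

```python
import re

pattern = re.compile(r"(Isaac (?=Asimov))|(Banaan)")

# The lookahead checks for 'Asimov' without consuming it.
asimov = pattern.search("Isaac Asimov").group(0)
print(repr(asimov))  # 'Isaac '

# No 'Asimov' after 'Isaac ', so the first alternative fails.
newton = pattern.search("Isaac Newton")
print(newton)  # None

banaan = pattern.search("Banaan").group(0)
print(banaan)  # Banaan
```
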
  17. Regex - stuff we didn’t cover! :D
      - Greedy vs non-greedy: ‘*’, ‘+’, ‘?’ are greedy quantifiers. They match as much as possible while still obtaining a match. Non-greedy quantifiers match as little as possible to achieve a match. Adding a ‘?’ makes the above quantifiers non-greedy: ‘*?’, ‘+?’, ‘??’
      We’ll skip these 2 for now :-)
      - Positive lookbehind assertion
  18. Greedy vs non-greedy example. string = abbb
      regex = ab+?   matches = ab
      regex = ab+    matches = abbb
      regex = ab+?$  matches = abbb
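
The three cases on this slide can be reproduced with `re.search`. A minimal sketch; the variable names are chosen for illustration.

```python
import re

text = "abbb"

# Non-greedy: stop as soon as a match is possible.
lazy = re.search(r"ab+?", text).group(0)
print(lazy)  # ab

# Greedy: take as many b's as possible.
greedy = re.search(r"ab+", text).group(0)
print(greedy)  # abbb

# Non-greedy, but '$' forces the match to reach the end of the string.
lazy_anchored = re.search(r"ab+?$", text).group(0)
print(lazy_anchored)  # abbb
```
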
  19. Questions?
  20. Regex - Useful tools! Regex -> NFA/DFA converter: http://hackingoff.com/compilers/regular-expression-to-nfa-dfa Testing regexes yourself: http://www.pythonregex.com/
