APMG June 2014 - Regular Expression

Transcript

  • 1. Regular expression (RE) - crash course by Daniel Genis, Byte Internet
  • 2. Regex - What is a regular expression? - A mini program that accepts or rejects a string. - Can be used to parse data out of strings.
  • 3. Regex - Why regular expressions? PRO ● Matching is fast: O(n) plus automaton generation time ○ Generation time is a one-time penalty ○ For a RE of size m, an NFA can be built in O(m); converting it to a DFA can cost O(2^m) in the worst case ● Useful for validating string input ● Can be used in virtually all programming languages ○ Even in MySQL or other databases. But please, please don’t use RE in database queries :-) ● Useful for fetching/parsing data out of strings ● Very powerful tool. A real Swiss army knife!
  • 4. Regex - When to avoid RE? CONS ● Regexes are mini programs in themselves ● They can become very complex ● Some people argue regexes should always be avoided ● They are not very human-readable ● Not everyone is comfortable with RE ● The DFA must be created/compiled before first use
  • 5. Regex - Getting a feel Two dummy examples: ^aap?$ and a()?p+p Real world example: DB_BACKUP_REGEX = "^[a-zA-Z0-9_-]+_((\d|-)+_(\d|-)+)_UTC.sql.gz$"
  • 6. Regex - Semantic building blocks ‘.’ == Matches any character except a newline ‘^’ == Matches the start of the string ‘$’ == Matches the end of the string ‘*’ == Causes the resulting RE to match 0 or more repetitions of the preceding RE ‘+’ == Causes the resulting RE to match 1 or more repetitions of the preceding RE ‘?’ == Causes the resulting RE to match 0 or 1 repetitions of the preceding RE
  • 7. Regex - basics - Which string matches? Regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb _ == whitespace aabb_
  • 8. Regex - basics - ^ () * $ Regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb _ == whitespace aabb_
  • 9. Regex - basics - ^ () * $ Regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb aabb_
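The quiz on the slides above can be checked mechanically. What follows is a minimal sketch using Python's standard `re` module, the language the later slides use. The helper name `matching` is made up for illustration, and the `_` on the slides is written as a real space.

```python
import re

# Candidate strings from the slides; "_" on the slide denotes a space.
CANDIDATES = ["aaab", "aabb", "ab", "abbb", "aababb", " aabb", "aabb "]


def matching(pattern, strings):
    """Return the strings in which `pattern` is found (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [s for s in strings if compiled.search(s)]


anchored = matching(r"^a(ab)*b$", CANDIDATES)
print(anchored)  # ['aabb', 'ab', 'aababb']
```

Only strings of the form "a", zero or more "ab", then "b" pass. The two strings with a leading or trailing space fail because `^` and `$` pin the pattern to both ends.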
  • 10. Regex - basics - Which string matches? Regex: aa+b*b$ old regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababb _aabb _ == whitespace aabb_
  • 11. Regex - basics - Which string matches? Regex: aa+b*b$ old regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababa _aabb aabb_
  • 12. Regex - basics - Which string matches? Regex: aa+b*b$ old regex: ^a(ab)*b$ Strings: aaab aabb ab abbb aababa _aabb aabb_
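Dropping the `^` anchor changes which strings are accepted, because the pattern may now start anywhere in the string. A small sketch, again assuming Python's `re` module and using `search` (not `match`) so that the missing anchor actually matters:

```python
import re

# Candidate strings from the last version of the slide; "_" denotes a space.
CANDIDATES = ["aaab", "aabb", "ab", "abbb", "aababa", " aabb", "aabb "]

pattern = re.compile(r"aa+b*b$")

# Map each accepted string to the part of it that the pattern matched.
found = {}
for s in CANDIDATES:
    m = pattern.search(s)
    if m:
        found[s] = m.group(0)

print(found)  # {'aaab': 'aaab', 'aabb': 'aabb', ' aabb': 'aabb'}
```

The string with the leading space is now accepted, since nothing forces the match to begin at position 0. The one with the trailing space is still rejected by `$`.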
  • 13. Regex - some more building blocks [a-zA-Z0-9] Matches 1 character a-z, A-Z or 0-9; roughly the same as \w (\w also matches the underscore) \d == [0-9] Matches 1 digit \d{5} Matches 5 digits
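These shorthand classes are written with a backslash (`\w`, `\d`). A short sketch of how they behave in Python's `re` module; the sample strings are invented for illustration:

```python
import re

# \d{5} requires exactly five digits when the whole string must match.
five_ok = re.fullmatch(r"\d{5}", "12345") is not None
five_short = re.fullmatch(r"\d{5}", "1234") is not None

# \w is slightly broader than [a-zA-Z0-9]: it also accepts the underscore.
underscore_w = re.fullmatch(r"\w", "_") is not None
underscore_class = re.fullmatch(r"[a-zA-Z0-9]", "_") is not None

print(five_ok, five_short)             # True False
print(underscore_w, underscore_class)  # True False
```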
  • 14. Regex - bad practical example import re data = "2014-06-04 20:00" # How do we parse this to integers? regex = r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})" regex2 = r"(\d+)-(\d+)-(\d+) (\d+):(\d+)" # Works too! re.findall(regex, data) # returns
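A runnable version of the slide's example, as a sketch. The regex and the data come from the slide; the final conversion to integers is an added step answering the slide's own question.

```python
import re

data = "2014-06-04 20:00"
regex = r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})"

# With several groups, findall returns a list of tuples of strings.
groups = re.findall(regex, data)
print(groups)  # [('2014', '06', '04', '20', '00')]

# Convert the captured strings to integers.
year, month, day, hour, minute = (int(part) for part in groups[0])
print(year, month, day, hour, minute)  # 2014 6 4 20 0
```

One plausible reason the slide calls this a bad example is that the standard library's `datetime.strptime(data, "%Y-%m-%d %H:%M")` already parses this format and validates the values too.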
  • 15. Regex - regex DFA regex2 = "(\d+)-(\d+)-(\d+) (\d+):(\d+)"
  • 16. Regex - stuff we didn’t cover! :D Regex can get very, very complicated. Just to give you some idea: - Lookahead assertion (?=...) Matches if ... matches next, but doesn’t consume any of the string. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'. Likewise, (Isaac (?=Asimov))|(Banaan) will match ‘Isaac ’ (when followed by ‘Asimov’) or ‘Banaan’
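A lookahead checks what follows without making it part of the match. A minimal sketch with Python's `re` module; the test sentences are invented for illustration:

```python
import re

lookahead = re.compile(r"Isaac (?=Asimov)")

hit = lookahead.search("Isaac Asimov")
miss = lookahead.search("Isaac Newton")

# 'Asimov' is required after the match but is not consumed by it.
print(repr(hit.group(0)))  # 'Isaac '
print(miss)                # None

alternation = re.compile(r"(Isaac (?=Asimov))|(Banaan)")
parts = [m.group(0) for m in alternation.finditer("Isaac Asimov, Banaan")]
print(parts)  # ['Isaac ', 'Banaan']
```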
  • 17. Regex - stuff we didn’t cover! :D - Greedy vs Non-Greedy ‘*’, ‘+’, ‘?’ are greedy quantifiers. They will match as much as possible while still obtaining a match. Non-greedy quantifiers will match as little as possible to achieve a match. Adding a ‘?’ makes the above quantifiers non-greedy: ‘*?’, ‘+?’, ‘??’ We’ll skip these 2 for now :-) - Positive lookbehind assertion
  • 18. Greedy vs Non-greedy example string = abbb regex = ab+? matches = ab (the first two characters of abbb) regex = ab+ matches = abbb regex = ab+?$ matches = abbb
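The three patterns on this slide can be tried directly. A small sketch with Python's `re` module, printing the part of `abbb` that each pattern actually matches:

```python
import re

text = "abbb"

# Non-greedy: stops as soon as one 'b' has been matched.
lazy = re.search(r"ab+?", text).group(0)

# Greedy: takes every 'b' it can.
greedy = re.search(r"ab+", text).group(0)

# Non-greedy, but '$' forces the match to extend to the end of the string.
lazy_anchored = re.search(r"ab+?$", text).group(0)

print(lazy, greedy, lazy_anchored)  # ab abbb abbb
```

The last case shows that a non-greedy quantifier still takes as many characters as are needed for the overall pattern to succeed.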
  • 19. Questions?
  • 20. Regex - Useful tools! Regex -> NFA/DFA converter: http://hackingoff.com/compilers/regular-expression-to-nfa-dfa Testing regexes yourself: http://www.pythonregex.com/