Regular Expressions and You

Transcript

  • 1. Regular Expressions and You
      An introduction to regular expressions.
      James I. Armes
      Web Developer, AllPlayers.com
      @jamesiarmes
  • 2. Email Validation Examples
      ^[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,4}$
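As a rough sketch, the short pattern from slide 2 can be tried out with Python's `re` module. Python is used here only for illustration; the helper name `looks_like_email` and the sample addresses are made up for this example and are not part of the original deck.

```python
import re

# The simple pattern from slide 2, with its backslashes in place.
SIMPLE_EMAIL = re.compile(r"^[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,4}$")

def looks_like_email(address: str) -> bool:
    """Return True when the address matches the simple pattern (hypothetical helper)."""
    return SIMPLE_EMAIL.match(address) is not None

print(looks_like_email("jane.doe@example.com"))  # True
print(looks_like_email("jane.doe@example"))      # False: no dot-separated top-level domain
print(looks_like_email("jane@example.museum"))   # False: top-level domain longer than 4 letters
```

The last case shows why a short pattern like this is only an approximation: it rejects some addresses that are perfectly valid.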
  • 3. Email Validation Examples(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ 
t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*))*)?;s*)
  • 4. Types of Regular Expressions
      ● Simple Regular Expressions
      ● POSIX Basic Regular Expressions
      ● POSIX Extended Regular Expressions
      ● Perl Regular Expressions
  • 5. Simple Regular Expressions
      ● Traditional regular expressions.
      ● Not a standard.
      ● Supported by some applications for backwards compatibility.
      ● Deprecated.
  • 6. POSIX Basic Regular Expressions
      ● Created to provide a common standard for Unix tools.
      ● Designed to be backwards compatible with traditional regular expressions.
      ● Adopted as the default syntax of many Unix tools.
      ● Some metacharacters must be escaped to take on their special meaning, for example \( \) and \{ \}.
  • 7. POSIX Extended Regular Expressions
      ● Adds some new metacharacters.
      ● Metacharacters do not require escaping.
      ● Dropped support for backreferences (\n).
      ● Many Unix tools provide support with a command line argument (usually -E).
  • 8. Perl Regular Expressions
      ● Adds lazy quantification, named capture groups and recursive patterns.
      ● Adopted by many programming languages due to its power.
      ● Requires non-alphanumeric delimiters around the expression.
      ● Other languages implement only a subset, so implementations vary.
  • 9. Syntax
  • 10. Basic Metacharacters
      .   Matches any single character (by default, any character except a newline).
      ^   Matches the beginning of a string.
      $   Matches the end of a string.
      |   Matches the expression before or after it (think ||).
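A minimal sketch of the four basic metacharacters, using Python's `re` module for illustration. The sample string is invented for this example.

```python
import re

text = "cat\ncot\ncart"

# . matches any single character (but not a newline by default).
print(re.findall(r"c.t", text))          # ['cat', 'cot']

# ^ and $ anchor to the beginning and end of the whole string.
print(bool(re.search(r"^cat", text)))    # True
print(bool(re.search(r"cart$", text)))   # True

# | matches the expression before it or the expression after it.
print(re.findall(r"cat|cart", text))     # ['cat', 'cart']
```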
  • 11. Character Classes
      [ ]    Matches any one character within the group.
      [^ ]   Matches any one character NOT within the group.
      [n-m]  Matches any one character in a range.
      Examples:
      [A-Za-z0-9]
      [^G-Zg-z _]
  • 12. Shorthand Character Classes
      \s          Any whitespace character such as space, tab and newline. Same as [\n\r\t ]
      \w          Any word character. Same as [A-Za-z0-9_]
      \d          Any digit character. Same as [0-9]
      \S, \W, \D  Negated versions of the above.
      Can be used inside character classes, but this can be confusing.
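The character classes and shorthands above can be sketched in Python as follows. The sample strings are made up. One caveat specific to Python 3: on text strings `\w` and `\d` also match non-ASCII letters and digits unless the `re.ASCII` flag is given, so the "same as" equivalences on the slide hold exactly only with that flag.

```python
import re

sample = "Order #42 shipped to Bay_7 on 2024-05-01"

# Explicit class: letters and digits only, so "Bay_7" is split at the underscore.
print(re.findall(r"[A-Za-z0-9]+", sample))

# Negated class from the slide: anything except G-Z, g-z, space and underscore.
print(re.findall(r"[^G-Zg-z _]+", "ff00 zz_99"))   # ['ff00', '99']

# Shorthand classes.
print(re.findall(r"\d+", sample))                  # ['42', '7', '2024', '05', '01']
print(re.findall(r"\w+", sample))                  # "Bay_7" stays in one piece
print(re.split(r"\s+", "a  b\tc\nd"))              # ['a', 'b', 'c', 'd']

# With re.ASCII, \w is limited to [A-Za-z0-9_].
print(re.findall(r"\w+", "naïve", re.ASCII))       # ['na', 've']
```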
  • 13. Quantifiers
      *      Matches the preceding expression 0 or more times.
      +      Matches the preceding expression 1 or more times.
      ?      Matches the preceding expression 0 or 1 time.
      {m,n}  Matches the preceding expression at least m times but no more than n times.
      {m,}   Matches the preceding expression at least m times with no maximum.
      {,n}   Matches the preceding expression no more than n times with no minimum. Not supported by every flavor; {0,n} is the portable form.
      {n}    Matches the preceding expression exactly n times.
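A short sketch of each quantifier in Python. The helper `full` is hypothetical and simply wraps `re.fullmatch`, so a pattern only counts as matching when it covers the entire string. Python's `re` module accepts the `{,n}` form; other flavors may not.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(full(r"ab*c", "ac"))         # True:  * allows zero b's
print(full(r"ab+c", "ac"))         # False: + needs at least one b
print(full(r"colou?r", "color"))   # True:  ? makes the u optional
print(full(r"\d{2,4}", "12345"))   # False: five digits exceed the maximum of 4
print(full(r"\d{2,}", "12345"))    # True:  at least two digits, no maximum
print(full(r"a{,2}", "aaa"))       # False: at most two a's
print(full(r"\d{3}", "123"))       # True:  exactly three digits
```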
  • 14. Lazy Quantifiers
      Standard quantifiers are greedy.
      Example:
      Text:     Many programming courses start with a "Hello World" example. That would be "Hallo Welt" in German.
      Pattern:  "Hello .*"
      Match:    "Hello World" example. That would be "Hallo Welt"
  • 15. Lazy Quantifiers
      Use ? after a quantifier to make it lazy.
      Example:
      Text:     Many programming courses start with a "Hello World" example. That would be "Hallo Welt" in German.
      Pattern:  "Hello .*?"
      Match:    "Hello World"
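The greedy and lazy examples from slides 14 and 15 can be reproduced with a few lines of Python, shown here as an illustrative sketch. The text is treated as a single line, since `.` does not cross newlines by default.

```python
import re

text = ('Many programming courses start with a "Hello World" example. '
        'That would be "Hallo Welt" in German.')

# Greedy: .* grabs as much as possible, then backs off to the LAST quote.
greedy = re.search(r'"Hello .*"', text).group()

# Lazy: .*? grabs as little as possible, stopping at the FIRST closing quote.
lazy = re.search(r'"Hello .*?"', text).group()

print(greedy)  # "Hello World" example. That would be "Hallo Welt"
print(lazy)    # "Hello World"
```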
  • 16. Grouping
      ( )    Groups the expression and captures the text.
      (?: )  Groups the expression but DOES NOT capture the text.
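A small sketch of the difference between capturing and non-capturing groups, again in Python. The date string is invented for the example.

```python
import re

# Year and day are captured; the month is grouped with (?: ) but not captured.
m = re.search(r"(\d{4})-(?:\d{2})-(\d{2})", "Released on 2011-03-15.")

print(m.group(0))  # 2011-03-15  (the whole match)
print(m.groups())  # ('2011', '15')  (only the capturing groups)
```

Non-capturing groups keep the group numbering stable when a group is needed only for alternation or quantification.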
  • 17. Backreferences
      \1 through \9 reference previously captured text.
      Example:
      Text:     Many programming courses start with a "Hello World" example. Hello World examples are extremely simple, especially when they just output "Hello World.
      Pattern:  (|")Hello World(\1)
      Matches:  "Hello World" (both quotes), then Hello World, then Hello World again (the unbalanced opening quote in the last occurrence is not part of the match).
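The backreference example from slide 17 behaves as follows in Python's `re` module (a sketch for illustration; other flavors should agree on this pattern). Group 1 captures either nothing or an opening quote, and `\1` then requires exactly the same text again after "Hello World".

```python
import re

text = ('Many programming courses start with a "Hello World" example. '
        'Hello World examples are extremely simple, '
        'especially when they just output "Hello World.')

pattern = re.compile(r'(|")Hello World(\1)')
matches = [m.group(0) for m in pattern.finditer(text)]

# The last occurrence has an opening quote but no closing quote, so the
# quoted alternative fails there and only the bare words match.
print(matches)  # ['"Hello World"', 'Hello World', 'Hello World']
```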
  • 18. Word Boundaries
      \b matches the position between a word character (\w) and a non-word character (\W).
      Example:
      Text:     Hello World
      Pattern:  o\b
      Match:    Hello| World   (| marks the matched boundary)
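  • 19. Word Boundaries
      \B matches any position that is not a word boundary, for example between two word characters (\w\w).
      Example:
      Text:     Hello World
      Pattern:  o\B
      Match:    Hello Wo|rld   (| marks the matched position)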
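Both word-boundary examples can be checked by asking Python where each pattern matches. This is a sketch; the index values refer to the position of the matched "o" in the string.

```python
import re

text = "Hello World"

# o\b: an "o" followed by a word boundary, i.e. the "o" that ends "Hello".
print([m.start() for m in re.finditer(r"o\b", text)])  # [4]

# o\B: an "o" NOT followed by a word boundary, i.e. the "o" inside "World".
print([m.start() for m in re.finditer(r"o\B", text)])  # [7]
```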
  • 20. Lookaheads
      (?= ) matches a position that is directly followed by the expression. The expression itself is not included in the match.
      Example:
      Text:     Hello World sounds better than "Hello Earth".
      Pattern:  Hello(?= World)
      Match:    the first Hello only (the one followed by " World")
  • 21. Lookbehinds
      (?<= ) matches a position that is directly preceded by the expression. The expression itself is not included in the match.
      Example:
      Text:     Hello World sounds better than "Hello Earth".
      Pattern:  (?<=")Hello
      Match:    the second Hello only (the one preceded by a quote)
  • 22. Lookaheads (negative)
      (?! ) matches a position that is NOT directly followed by the expression.
      Example:
      Text:     Hello World sounds better than "Hello Earth".
      Pattern:  Hello(?! World)
      Match:    the second Hello only (the one not followed by " World")
  • 23. Lookbehinds (negative)
      (?<! ) matches a position that is NOT directly preceded by the expression.
      Example:
      Text:     Hello World sounds better than "Hello Earth".
      Pattern:  (?<!")Hello
      Match:    the first Hello only (the one not preceded by a quote)
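The four lookaround examples from slides 20 to 23 can be compared side by side. In this Python sketch the hypothetical helper `starts` returns the index at which each match begins: 0 is the first "Hello" and 32 is the one inside the quotes.

```python
import re

text = 'Hello World sounds better than "Hello Earth".'

def starts(pattern: str) -> list:
    """Return the start index of every match of pattern in text (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, text)]

print(starts(r"Hello(?= World)"))  # [0]   positive lookahead
print(starts(r'(?<=")Hello'))      # [32]  positive lookbehind
print(starts(r"Hello(?! World)"))  # [32]  negative lookahead
print(starts(r'(?<!")Hello'))      # [0]   negative lookbehind

# Lookarounds are zero-width: the matched text is just "Hello".
print(re.search(r"Hello(?= World)", text).group())  # Hello
```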
  • 24. Conditionals
      (?(condition)then|else)
      ● condition can be a lookahead or a lookbehind (Perl and PCRE also accept a capture group number or name).
      ● If condition is matched, then must match for the expression to pass.
      ● If condition is not matched, else must match for the expression to pass.
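  • 25. Conditionals
      Example:
      Text:     Hello World sounds better than "Hello Earth".
      Pattern:  Hello (?(?<=World)World|Earth)
      Pattern:  Hello (?(?<=People)People|Earth)
      In both patterns the lookbehind condition is tested right after "Hello ", where the preceding text is neither World nor People, so the condition is not matched and the else branch (Earth) must match. Both patterns therefore match Hello Earth only.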
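Python's `re` module does not accept a lookaround as the condition; it supports only the form `(?(group)then|else)`, where the condition is whether a numbered or named capture group took part in the match. The sketch below therefore uses a different pattern from the slide, chosen here purely to illustrate the then/else idea: a closing quote is required only when an opening quote was captured.

```python
import re

text = 'Hello World sounds better than "Hello Earth".'

# Group 1 optionally captures an opening quote.
# (?(1)") means: if group 1 matched, a closing quote must follow;
# otherwise nothing more is required (the else branch is omitted).
pattern = re.compile(r'(")?Hello \w+(?(1)")')

print([m.group(0) for m in pattern.finditer(text)])
# ['Hello World', '"Hello Earth"']
```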
  • 26. Modifiers
      i  Case insensitive matching.
      s  . matches newline characters.
      m  ^ and $ match after and before newlines (respectively).
      x  Whitespace within the expression is ignored unless escaped.
      g  Match globally (find every match instead of stopping at the first).
  • 27. Modifiers
      ● (?a) to turn modifiers on (a stands for any modifier letter).
      ● (?-a) to turn modifiers off.
      Examples:
      (?i)WORLD(?-i)
      (?i-s)WORLD.(?s-i)
      (?i:WORLD)
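As a sketch of how the modifiers look outside Perl syntax: Python exposes i, s, m and x as the flags `re.IGNORECASE`, `re.DOTALL`, `re.MULTILINE` and `re.VERBOSE`. There is no g flag; `re.findall` and `re.finditer` already return every match. Python accepts the scoped form `(?i:...)`, but it does not support switching a modifier off mid-pattern with a bare `(?-i)`. The sample strings and the `phone` pattern are invented for this example.

```python
import re

text = "hello\nWORLD"

print(re.findall(r"world", text, re.IGNORECASE))          # ['WORLD']   (i)
print(bool(re.search(r"hello.WORLD", text)))              # False: . stops at the newline
print(bool(re.search(r"hello.WORLD", text, re.DOTALL)))   # True        (s)
print(re.findall(r"^\w+$", text, re.MULTILINE))           # ['hello', 'WORLD']  (m)

# Scoped inline modifier: case-insensitive only inside the group.
print(re.findall(r"(?i:world)", text))                    # ['WORLD']

# x: whitespace and comments inside the pattern are ignored.
phone = re.compile(r"""
    \d{3}   # first three digits
    -       # literal hyphen
    \d{4}   # last four digits
""", re.VERBOSE)
print(bool(phone.fullmatch("555-0199")))                  # True
```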
  • 28. Language Implementations
  • 29. JavaScript
      ● RegExp object.
          – var expression = new RegExp("World", "g");
          – var expression = /World/g;
      ● String.match()
      ● String.replace()
      ● String.split()
  • 30. Perl
      ● if ($string =~ /regex/)
      ● $string =~ s/regex/replacement/
      ● Regexp::Common
          – http://search.cpan.org/dist/Regexp-Common/
          – Provides common expressions.
          – Examples:
              ● IP Address
              ● Credit Card Number
              ● Profanity
  • 31. PHP
      ● ereg vs. preg
          – preg uses Perl syntax.
          – ereg uses POSIX Extended syntax.
          – preg is much faster.
          – ereg has been deprecated as of PHP 5.3 (and was removed in PHP 7.0).
  • 32. PHP
      ● preg_match()
      ● preg_match_all()
      ● preg_replace()
      ● preg_split()
      ● preg_quote()
      ● http://www.php.net/manual/en/book.pcre.php
      ● http://php.net/manual/reference.pcre.pattern.modifiers.php
  • 33. Tools and Resources
      ● txt2regex - http://aurelio.net/txt2regex/
      ● Reggy (mac) - http://reggyapp.com/
      ● Patterns (mac) - http://krillapps.com/patterns/
      ● Web based - http://regex.larsolavtorvik.com/
      ● Regular-Expressions.info (reference) - http://www.regular-expressions.info/
  • 34. Thanks!
      http://xkcd.com/208/