
Regular Expressions and You

Published in: Technology

  1. Regular Expressions and You
     An introduction to regular expressions.
     James I. Armes
     Web Developer, AllPlayers.com
     @jamesiarmes
  2. Email Validation Examples
     ^[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,4}$
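A minimal sketch of how the simple pattern on this slide behaves, written in JavaScript. The sample addresses are hypothetical and chosen only for illustration; the pattern is the slide's, with its backslashes written out.

```javascript
// Simple email pattern from the slide.
const simpleEmail = /^[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,4}$/;

// Hypothetical sample inputs (not from the original deck).
const samples = [
  "user@example.com",
  "first.last+tag@mail.example.org",
  "no-at-sign.example.com",
  "user@example.museum", // valid address, but the TLD is longer than 4 letters
];

// true where the pattern accepts the address, false otherwise.
const results = samples.map((address) => simpleEmail.test(address));
console.log(results); // [ true, true, false, false ]
```

The last sample shows the trade-off: a short pattern is easy to read but rejects some valid addresses, here because of the `{2,4}` limit on the top-level domain.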
  3. 3. Email Validation Examples(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ 
t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*))*)?;s*)
  4. Types of Regular Expressions
     ● Simple Regular Expressions
     ● POSIX Basic Regular Expressions
     ● POSIX Extended Regular Expressions
     ● Perl Regular Expressions
  5. Simple Regular Expressions
     ● Traditional regular expressions.
     ● Not a standard.
     ● Supported by some applications for backwards compatibility.
     ● Deprecated.
  6. POSIX Basic Regular Expressions
     ● Created to provide a common standard for Unix tools.
     ● Designed to be backwards compatible with traditional regular expressions.
     ● Adopted as the default syntax of many Unix tools.
     ● Some metacharacters require escaping.
  7. POSIX Extended Regular Expressions
     ● Adds some new metacharacters.
     ● Metacharacters do not require escaping.
     ● Dropped support for back references (\n).
     ● Many Unix tools provide support with a command line argument (usually -E).
  8. Perl Regular Expressions
     ● Adds lazy quantification, named capture groups and recursive patterns.
     ● Adopted by many programming languages due to its power.
     ● Requires non-alphanumeric delimiters around the expression.
     ● Other languages implement only a subset, so implementations vary.
  9. Syntax
  10. Basic Metacharacters
      .  Matches any single character.
      ^  Matches the beginning of a string.
      $  Matches the end of a string.
      |  Matches the expression before or after it (think ||).
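The four basic metacharacters can be tried out with a short JavaScript sketch. The sample strings are assumptions picked for illustration, borrowing the "Hello World" and "Hallo Welt" phrases used later in the deck.

```javascript
const greeting = "Hello World";

const checks = {
  // . stands in for any single character, so "H.llo" also accepts "Hallo".
  anyChar: /H.llo/.test("Hallo"),
  // ^ anchors the match to the start of the string.
  startsWithHello: /^Hello/.test(greeting),
  startsWithWorld: /^World/.test(greeting),
  // $ anchors the match to the end of the string.
  endsWithWorld: /World$/.test(greeting),
  // | accepts either alternative.
  eitherSpelling: /Hello|Hallo/.test("Hallo Welt"),
};

console.log(checks);
// anyChar: true, startsWithHello: true, startsWithWorld: false,
// endsWithWorld: true, eitherSpelling: true
```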
  11. Character Classes
      []     Matches any one character within the group.
      [^ ]   Matches any one character NOT within the group.
      [n-m]  Matches any one character in a range.
      Examples:
      [A-Za-z0-9]
      [^G-Zg-z _]
  12. Shorthand Character Classes
      \s          Any whitespace character, such as space, tab and newline. Same as [\n\r\t ]
      \w          Any word character. Same as [A-Za-z0-9_]
      \d          Any digit character. Same as [0-9]
      \S, \W, \D  Negated versions of the above.
      Can be used inside character classes but could be confusing.
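A small JavaScript sketch of character classes and their shorthand forms. The input strings are hypothetical; the two bracketed classes are the examples from the character class slide.

```javascript
// Hypothetical sample text, chosen so each class has something to find.
const text = "Order 66: ship 3 crates_by 10am";

// \d+ collects runs of digits.
const digitRuns = text.match(/\d+/g); // ["66", "3", "10"]

// \w+ collects runs of word characters (letters, digits, underscore).
const wordRuns = text.match(/\w+/g); // ["Order", "66", "ship", "3", "crates_by", "10am"]

// \s+ splits on runs of whitespace; punctuation stays attached.
const pieces = text.split(/\s+/); // ["Order", "66:", "ship", "3", "crates_by", "10am"]

// The two example classes from the slide, applied to another sample string.
const alnumRuns = "c0ffee-xyz!".match(/[A-Za-z0-9]+/g); // ["c0ffee", "xyz"]
const negatedRuns = "c0ffee-xyz!".match(/[^G-Zg-z _]+/g); // ["c0ffee-", "!"]

console.log(digitRuns, wordRuns, pieces, alnumRuns, negatedRuns);
```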
  13. Quantifiers
      *      Match the preceding expression 0 or more times.
      +      Match the preceding expression 1 or more times.
      ?      Match the preceding expression 0 or 1 time.
      {m,n}  Match the preceding expression at least m times but no more than n times.
      {m,}   Match the preceding expression at least m times with no maximum.
      {,n}   Match the preceding expression no more than n times with no minimum. Not every flavor accepts this form; {0,n} is the portable spelling.
      {n}    Match the preceding expression exactly n times.
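One way to see the quantifiers side by side is to run each against the same candidate strings. This JavaScript sketch uses a hypothetical helper, `matching`, and made-up candidates; `{,n}` is left out because JavaScript treats it as literal text.

```javascript
// Hypothetical candidates: "a", then zero to four "b"s, then "c".
const candidates = ["ac", "abc", "abbc", "abbbc", "abbbbc"];

// Helper (an illustration, not part of any library): keep the candidates a pattern accepts.
function matching(pattern) {
  return candidates.filter((candidate) => pattern.test(candidate));
}

const zeroOrMore = matching(/^ab*c$/);     // all five candidates
const oneOrMore = matching(/^ab+c$/);      // everything except "ac"
const zeroOrOne = matching(/^ab?c$/);      // ["ac", "abc"]
const twoToThree = matching(/^ab{2,3}c$/); // ["abbc", "abbbc"]
const atLeastTwo = matching(/^ab{2,}c$/);  // ["abbc", "abbbc", "abbbbc"]
const exactlyTwo = matching(/^ab{2}c$/);   // ["abbc"]

console.log({ zeroOrMore, oneOrMore, zeroOrOne, twoToThree, atLeastTwo, exactlyTwo });
```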
  14. Lazy Quantifiers
      Standard quantifiers are greedy.
      Example:
      Many programming courses start with a "Hello World" example. That would be "Hallo Welt" in German.
      "Hello .*"
      Match: "Hello World" example. That would be "Hallo Welt"
  15. Lazy Quantifiers
      Use ? to make a quantifier lazy.
      Example:
      Many programming courses start with a "Hello World" example. That would be "Hallo Welt" in German.
      "Hello .*?"
      Match: "Hello World"
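The greedy and lazy examples from these two slides can be reproduced in JavaScript. This is a sketch that assumes the sample sentence sits on a single line, since `.` does not cross newlines by default.

```javascript
const text =
  'Many programming courses start with a "Hello World" example. ' +
  'That would be "Hallo Welt" in German.';

// Greedy: .* runs to the end of the text, then backs up to the last double quote.
const greedyMatch = text.match(/"Hello .*"/)[0];

// Lazy: .*? stops at the first double quote that lets the match succeed.
const lazyMatch = text.match(/"Hello .*?"/)[0];

console.log(greedyMatch); // "Hello World" example. That would be "Hallo Welt"
console.log(lazyMatch);   // "Hello World"
```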
  16. Grouping
      ()      Group the expression and capture the text.
      (?: )   Group the expression but DO NOT capture the text.
  17. Backreferences
      \1 through \9 reference previously captured text.
      Example:
      Many programming courses start with a "Hello World" example. Hello World examples are extremely simple, especially when they just output "Hello World.
      (|")Hello World(\1)
      Matches: "Hello World" with both quotes, the unquoted Hello World, and the final Hello World without its unbalanced opening quote.
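A JavaScript sketch of capturing groups, non-capturing groups and the backreference example from this slide. The date string in the second half is a hypothetical input added for illustration.

```javascript
const text =
  'Many programming courses start with a "Hello World" example. ' +
  "Hello World examples are extremely simple, " +
  'especially when they just output "Hello World.';

// Group 1 captures either nothing or an opening quote; \1 then demands the same text again.
const quotedOrBare = text.match(/(|")Hello World(\1)/g);
console.log(quotedOrBare); // [ '"Hello World"', 'Hello World', 'Hello World' ]

// Capturing vs. non-capturing: the month is grouped with (?: ) so it is not captured.
const dateParts = "2024-05-17".match(/(\d{4})-(?:\d{2})-(\d{2})/);
const captured = dateParts.slice(1); // only the captured groups
console.log(captured); // [ '2024', '17' ]
```

The third match drops the opening quote because no closing quote follows it, so only the empty alternative of group 1 can satisfy `\1`.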
  18. Word Boundaries
      \b matches the position between a word character (\w) and a non-word character (\W).
      Example:
      Hello World
      o\b
      Hello| World
  19. Word Boundaries
      \B matches any position that is not a word boundary, such as the position between two word characters (\w\w).
      Example:
      Hello World
      o\B
      Hello Wo|rld
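The two word boundary examples can be checked in JavaScript by inserting a `|` marker after each matched "o", mirroring the notation on the slides. This is a minimal sketch, not the author's original demo.

```javascript
const text = "Hello World";

// o\b: an "o" that is followed by a word boundary (the "o" ending "Hello").
const atBoundary = text.replace(/o\b/g, "o|");

// o\B: an "o" that is NOT followed by a word boundary (the "o" inside "World").
const insideWord = text.replace(/o\B/g, "o|");

console.log(atBoundary); // Hello| World
console.log(insideWord); // Hello Wo|rld
```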
  20. Lookaheads
      (?= ) matches the position directly before text that matches the expression.
      Example:
      Hello World sounds better than "Hello Earth".
      Hello(?= World)
      Match: the first Hello (the one followed by " World").
  21. Lookbehinds
      (?<= ) matches the position directly after text that matches the expression.
      Example:
      Hello World sounds better than "Hello Earth".
      (?<=")Hello
      Match: the second Hello (the one preceded by a double quote).
  22. Lookaheads
      (?! ) matches a position that is NOT directly followed by text matching the expression.
      Example:
      Hello World sounds better than "Hello Earth".
      Hello(?! World)
      Match: the second Hello (the one not followed by " World").
  23. Lookbehinds
      (?<! ) matches a position that is NOT directly preceded by text matching the expression.
      Example:
      Hello World sounds better than "Hello Earth".
      (?<!")Hello
      Match: the first Hello (the one not preceded by a double quote).
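The four lookaround examples can be compared in JavaScript by recording where each match starts. This sketch assumes an engine with lookbehind support (ES2018 or later); `matchStarts` is a hypothetical helper written for this illustration.

```javascript
const text = 'Hello World sounds better than "Hello Earth".';

// Return the start index of every match so the two "Hello"s can be told apart:
// the first starts at index 0, the quoted one at index 32.
function matchStarts(pattern) {
  return Array.from(text.matchAll(pattern), (match) => match.index);
}

const positiveLookahead = matchStarts(/Hello(?= World)/g);  // [0]
const positiveLookbehind = matchStarts(/(?<=")Hello/g);     // [32]
const negativeLookahead = matchStarts(/Hello(?! World)/g);  // [32]
const negativeLookbehind = matchStarts(/(?<!")Hello/g);     // [0]

console.log({ positiveLookahead, positiveLookbehind, negativeLookahead, negativeLookbehind });
```

In every case the match itself is just `Hello`; the lookaround only checks the neighbouring text and consumes none of it.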
  24. Conditionals
      (?(condition)then|else)
      ● condition must be a lookahead or a lookbehind.
      ● If condition is matched, then must match for the expression to pass.
      ● If condition is not matched, else must match for the expression to pass.
  25. Conditionals
      Example:
      Hello World sounds better than "Hello Earth".
      Hello (?(?=World)World|Earth)
      Matches: Hello World and Hello Earth.
      Hello (?(?=People)People|Earth)
      Match: Hello Earth only.
  26. Modifiers
      i  Case insensitive matching.
      s  . matches newline characters.
      m  ^ and $ match after and before newlines (respectively).
      x  Whitespace within the expression is ignored unless escaped.
      g  Match globally.
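In JavaScript these modifiers are written as flags after the closing delimiter. The sketch below covers i, s, m and g with hypothetical sample strings; x is left out because JavaScript has no equivalent flag.

```javascript
// Hypothetical two-line sample text.
const text = "hello\nWORLD";

const flagChecks = {
  // i: case insensitive matching.
  withoutI: /world/.test(text),
  withI: /world/i.test(text),
  // s: lets . match the newline between the two lines.
  withoutS: /hello.WORLD/.test(text),
  withS: /hello.WORLD/s.test(text),
  // m: lets ^ and $ match at line breaks, not only at the ends of the string.
  withoutM: /^WORLD$/.test(text),
  withM: /^WORLD$/m.test(text),
};

// g: replace every occurrence instead of only the first.
const firstOnly = "a-b-c".replace(/-/, "+");
const everyOne = "a-b-c".replace(/-/g, "+");

console.log(flagChecks, firstOnly, everyOne);
// each "without" check is false and each "with" check is true; a+b-c; a+b+c
```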
  27. Modifiers
      ● (?a) to turn modifiers on.
      ● (?-a) to turn modifiers off.
      Examples:
      (?i)WORLD(?-i)
      (?i-s)WORLD.(?s-i)
      (?i:WORLD)
  28. Language Implementations
  29. JavaScript
      ● RegExp object.
        – var expression = new RegExp("World", "g");
        – var expression = /World/g;
      ● String.match()
      ● String.replace()
      ● String.split()
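A brief sketch of the JavaScript pieces listed on this slide: the two ways of building a RegExp, plus match(), replace() and split(). The sample strings are assumptions made up for illustration.

```javascript
// The constructor takes the pattern and the flags as strings; the literal form is equivalent.
const fromConstructor = new RegExp("World", "g");
const fromLiteral = /World/g;
const samePattern =
  fromConstructor.source === fromLiteral.source &&
  fromConstructor.flags === fromLiteral.flags;

// Hypothetical sample text.
const text = "Hello World, hello world";

// String.match() with the g flag returns every matching substring.
const found = text.match(fromLiteral);

// String.replace() with the g and i flags replaces every occurrence, ignoring case.
const replaced = text.replace(/world/gi, "Earth");

// String.split() accepts an expression as the separator.
const parts = "one, two,three".split(/,\s*/);

console.log(samePattern, found, replaced, parts);
// true [ 'World' ] 'Hello Earth, hello Earth' [ 'one', 'two', 'three' ]
```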
  30. Perl
      ● if ($string =~ /regex/)
      ● $string =~ s/regex/replacement/
      ● Regexp::Common
        – http://search.cpan.org/dist/Regexp-Common/
        – Provides common expressions.
        – Examples:
          ● IP Address
          ● Credit Card Number
          ● Profanity
  31. PHP
      ● ereg vs. preg
        – preg uses Perl syntax.
        – ereg uses POSIX Extended syntax.
        – preg is much faster.
        – ereg has been deprecated as of PHP 5.3.
  32. PHP
      ● preg_match()
      ● preg_match_all()
      ● preg_replace()
      ● preg_split()
      ● preg_quote()
      ● http://www.php.net/manual/en/book.pcre.php
      ● http://php.net/manual/reference.pcre.pattern.modifiers.php
  33. Tools and Resources
      ● txt2regex - http://aurelio.net/txt2regex/
      ● Reggy (mac) - http://reggyapp.com/
      ● Patterns (mac) - http://krillapps.com/patterns/
      ● Web based - http://regex.larsolavtorvik.com/
      ● Regular-Expressions.info (reference) - http://www.regular-expressions.info/
  34. Thanks!
      http://xkcd.com/208/
