Advanced Regular Expressions in .NET

Not all regular expression engines are created equal. They all have differences and nuances. The .NET regex engine has some amazing advanced features that make it one of the most powerful implementations in existence today. Not all problems should be solved with regular expressions, but after this talk you will have to think harder to find one that can't.

Published in: Software

  1. Advanced Regular Expressions in .NET (Patrick Delancy)
  2. NOTICE!!! This slide deck has been adapted from a presentation that was intended to be given live, in person… like with a real person in front of real people. You know… breathing the same air and all that. The key points have been transcribed onto separate slides, so you still get some benefit from reading through it all, but you are still missing out on all of the great stories, witty banter, hilarious costumes, stunning arias… or something like that. If you REALLY want to get the most out of this presentation, go to patrickdelancy.com and ask him to come give it to your group!
  3. This presentation will help you understand what Regex is capable of.
  4. Don’t bother trying to memorize the syntax, just remember the concepts.
  5. Then you can make a more intelligent decision about when you should and should not use Regex.
  6. Common Features ...but not ubiquitous
       ● Non-capturing groups
       ● Look ahead
       ● Look behind
       ● Free-spacing
  7. Non-Capturing Groups
       Pattern: ^(.*)(@)(.*)$
       Input:   email@ddress.com
       Groups:  [1] = email, [2] = @, [3] = ddress.com
       Pattern: ^(.*)(?:@)(.*)$
       Input:   email@ddress.com
       Groups:  [1] = email, [2] = ddress.com
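The two patterns on this slide can be compared side by side. This is a minimal sketch using the slide's own patterns and input; the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string address = "email@ddress.com";

// Three capturing groups: the "@" occupies group 2.
Match capturing = Regex.Match(address, @"^(.*)(@)(.*)$");

// (?:@) must still match, but it is neither stored nor numbered.
Match nonCapturing = Regex.Match(address, @"^(.*)(?:@)(.*)$");

Console.WriteLine(capturing.Groups[3].Value);     // the domain is group 3 here
Console.WriteLine(nonCapturing.Groups[2].Value);  // the domain moves up to group 2
```

Group 0 always holds the whole match, so the first pattern reports four groups and the second reports three.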
  8. Look Ahead
       Pattern: \b\w+(?=\.)
                # match the word at end of each sentence
                # but don’t capture the period.
       Input:   See Dick. See Jane. See Dick and Jane run.
       Matches: Dick, Jane, run
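A short sketch of the look-ahead from this slide, using the same pattern and sentence. `Cast<Match>()` is used here only so the snippet also compiles on older frameworks; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string story = "See Dick. See Jane. See Dick and Jane run.";

// (?=\.) asserts that a period follows without consuming it,
// so each match contains only the word itself.
string[] lastWords = Regex.Matches(story, @"\b\w+(?=\.)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", lastWords));
```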
  9. Look Behind
       Pattern: (?<=\b19)\d{2}\b
                # match all years in the 1900s
                # capturing only the 2-digit year
       Input:   1842 1902 1776 1985 2003 1999
       Matches: 02, 85, 99
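The look-behind can be tried the same way. This sketch reuses the slide's pattern and list of years; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string years = "1842 1902 1776 1985 2003 1999";

// (?<=\b19) requires "19" at the start of a word immediately before the
// match, but those two digits are not part of the match itself.
string[] shortYears = Regex.Matches(years, @"(?<=\b19)\d{2}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", shortYears));
```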
  10. Free Spacing (Ignore Pattern Whitespace)
       new Regex(
           @"
           \b[^@]+     # pattern can now span multiple lines
           @
           [^\b]+\b    # and include white space for readability
           ", RegexOptions.IgnorePatternWhitespace);
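Here is a runnable sketch of free-spacing mode. Note that it does not use the slide's exact pattern: the character classes below (`[^@\s]`) are a simplified assumption chosen so the match stops at whitespace, and the sample sentence is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// With IgnorePatternWhitespace, unescaped whitespace in the pattern is
// ignored and "#" starts a comment that runs to the end of the line.
string emailPattern = @"
    \b [^@\s]+   # local part: anything except an at sign or whitespace
    @            # the separator
    [^@\s]+ \b   # domain part, ending on a word boundary
";

var emailRegex = new Regex(emailPattern, RegexOptions.IgnorePatternWhitespace);
string found = emailRegex.Match("Contact email@ddress.com today").Value;

Console.WriteLine(found);
```

To match a literal space or `#` in this mode, escape it or put it inside a character class.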
  11. Less-Common Features ...in more advanced engines
       ● Named Captures
       ● Comments
       ● Inline Directives
       ● Conditional Alternation
       ● Atomic Groups
       ● Compiled Patterns
       ● Unicode Categories and Named Character Blocks
  12. Named Captures
       Pattern: ^(?<name>.*)(?:@)(?<domain>.*)$
       Input:   email@ddress.com
       Groups:  [name] = email, [domain] = ddress.com
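In code, named captures are read through the `Groups` indexer by name. A minimal sketch with the slide's pattern; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

Match parts = Regex.Match("email@ddress.com", @"^(?<name>.*)(?:@)(?<domain>.*)$");

// Named groups are looked up by name instead of by position,
// so adding or removing other groups does not break this code.
string localPart = parts.Groups["name"].Value;
string domainPart = parts.Groups["domain"].Value;

Console.WriteLine($"{localPart} at {domainPart}");
```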
  13. Comments
       ^.*@.*$  # comment to the end of the line (requires free-spacing mode)
       ^.*@(?# this is an inline comment).*$
  14. Inline Directives
       Pattern: John the (?ix) (?: wiser | better and greater | privy )
       Input:   John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater
       Matches: John the Wiser, John the BetterAndGreater
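The effect of switching options on partway through a pattern can be checked with the slide's example. This is a sketch only; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string titles = "John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater";

// "John the " is matched literally and case-sensitively. From (?ix) onward
// the pattern ignores case (i) and ignores its own whitespace (x), so
// "better and greater" in the pattern really means "betterandgreater".
string[] matchedTitles = Regex.Matches(titles, @"John the (?ix) (?: wiser | better and greater | privy )")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string title in matchedTitles)
    Console.WriteLine(title);
```

"john the privy" is skipped because the lowercase "j" comes before the directive takes effect, and "John the Better and Greater" is skipped because the spaces in the pattern no longer match spaces in the input.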
  15. Conditional Alternation
       Pattern:  ^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}-\d{2}-\d{4}|[-\d]+)$
       Match:    Type:SSN, ID:352-23-4567
       Match:    Type:EID, ID:35-2234567
       No match: Type:SSN, ID:35-2234567
       No match: Type:EID, ID:???
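The four inputs on this slide can be run through the pattern directly. A minimal sketch; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// (?(ssn)A|B) means: if the group named "ssn" captured something,
// the ID must match A (the SSN layout); otherwise it must match B.
var idRegex = new Regex(
    @"^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}-\d{2}-\d{4}|[-\d]+)$");

bool ssnWithSsnFormat = idRegex.IsMatch("Type:SSN, ID:352-23-4567");
bool eidWithEidFormat = idRegex.IsMatch("Type:EID, ID:35-2234567");
bool ssnWithEidFormat = idRegex.IsMatch("Type:SSN, ID:35-2234567");
bool eidWithGarbage = idRegex.IsMatch("Type:EID, ID:???");

Console.WriteLine($"{ssnWithSsnFormat} {eidWithEidFormat} {ssnWithEidFormat} {eidWithGarbage}");
```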
  16. Atomic Grouping / Possessive Quantifiers
       Pattern: \b(in|integer|insert)\b
       Input:   integer integers in insert
       Matches: integer, in, insert
       Pattern: \b(?>in|integer|insert)\b
       Input:   integer integers in insert
       Matches: in
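The difference between the two patterns comes down to backtracking, which the following sketch makes visible. It uses the slide's patterns and input; the helper and variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string words = "integer integers in insert";

Func<string, string> matchesOf = pattern => string.Join(",",
    Regex.Matches(words, pattern).Cast<Match>().Select(m => m.Value));

// Ordinary group: when \b fails after "in", the engine backtracks
// and tries the longer alternatives.
string withBacktracking = matchesOf(@"\b(in|integer|insert)\b");

// Atomic group: once "in" has matched, the other alternatives are
// discarded, so a failing \b cannot be rescued by "integer" or "insert".
string atomic = matchesOf(@"\b(?>in|integer|insert)\b");

Console.WriteLine(withBacktracking);
Console.WriteLine(atomic);
```

Putting the longest alternative first, for example `(?>integer|insert|in)`, is one way to keep the atomic group while still matching all three words.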
  17. Compiled Patterns
       var pattern = new Regex(@"a+h+!+");
       return pattern.IsMatch(value);

       var pattern = @"a+h+!+";
       return Regex.IsMatch(value, pattern);
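A sketch of the two styles from this slide. Adding `RegexOptions.Compiled` is an assumption on my part about what the slide title refers to; the slide itself constructs the `Regex` without options. The sample inputs are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Instance style: the pattern is parsed once and the object is reused.
// RegexOptions.Compiled additionally compiles it to IL, trading a slower
// construction for faster matching afterwards.
var shout = new Regex(@"a+h+!+", RegexOptions.Compiled);
bool instanceResult = shout.IsMatch("aaahhh!!!");

// Static style: the input comes first, then the pattern. The runtime
// keeps a small cache of recently used patterns for these calls.
bool staticResult = Regex.IsMatch("aaahhh!!!", @"a+h+!+");
bool noMatch = Regex.IsMatch("hello", @"a+h+!+");

Console.WriteLine($"{instanceResult} {staticResult} {noMatch}");
```

A reused instance tends to pay off when the same pattern runs many times, such as in a loop or a long-lived service.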
  18. Named Character Blocks & Unicode Groups
       Pattern: \b(?:\p{IsGreek}+\s?)+\p{Pd}\s(?>\p{IsBasicLatin}+\s?)+
       Input:   Κατα Μαθθαίον - The Gospel of Matthew
  19. Unique Features ...in the .NET RegEx engine
       ● Balancing Groups
       ● Character Class Subtraction
       ● Explicit Capture Only
  20. Balancing Groups
       Pattern:  ^(?:[^{}]|(?<open>{)|(?<-open>}))*(?(open)(?!))$
       Match:    { if (true) { return "A"; } else { return "B"; } }
       No match: { if (true) { return "A"; } else { return "B"; }
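The brace-matching pattern can be exercised as follows. This sketch escapes the braces (`\{`, `\}`) for clarity, which is equivalent to the slide's unescaped form, and uses shortened inputs of my own.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<open>\{)  pushes a capture onto the "open" stack for every "{".
// (?<-open>\}) pops one for every "}" and fails if the stack is empty.
// (?(open)(?!)) forces a failure at the end if anything is still open.
var balanced = new Regex(@"^(?:[^{}]|(?<open>\{)|(?<-open>\}))*(?(open)(?!))$");

bool closed = balanced.IsMatch("{ if (true) { return 1; } else { return 2; } }");
bool unclosed = balanced.IsMatch("{ if (true) { return 1; } else { return 2; }");
bool strayClose = balanced.IsMatch("} {");

Console.WriteLine($"{closed} {unclosed} {strayClose}");
```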
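  21. Character Class Subtraction
       Pattern: [0-9-[1-8]]
       Input:   0123456789
       Matches: 0, 9
       Pattern: [0-9-[1-8-[2-7]]]
       Input:   0123456789
       Matches: 0, 2, 3, 4, 5, 6, 7, 9
       Pattern: [\w-[aeiou]]
       Input:   Lazy dog, quick fox, blah, blah, blah.
       Matches: every word character except the lowercase vowels a, e, i, o, u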
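A sketch that runs the three subtraction patterns and joins whatever characters they keep. The helper name and the shortened "Lazy dog" input are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Concatenate every single-character match of the pattern in the input.
Func<string, string, string> kept = (input, pattern) => string.Concat(
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

// [base-[excluded]]: everything in the base class minus the excluded class.
string outerDigits = kept("0123456789", @"[0-9-[1-8]]");

// Subtractions nest: [1-8-[2-7]] is just 1 and 8, so only those are removed.
string nestedDigits = kept("0123456789", @"[0-9-[1-8-[2-7]]]");

// Word characters minus the lowercase vowels.
string noVowels = kept("Lazy dog", @"[\w-[aeiou]]");

Console.WriteLine($"{outerDigits} {nestedDigits} {noVowels}");
```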
  22. Explicit Capture Only
       Pattern: ^(?<name>[^@+]+(\+[^+]+)?)@(?<domain>(\w+)\.(com|net|org))$
       Input:   e+mail@ddress.com
       Groups:  [name] = e+mail, [domain] = ddress.com, [1] = +mail, [2] = ddress, [3] = com
       Pattern: (?n)^(?<name>[^@+]+(\+[^+]+)?)@(?<domain>(\w+)\.(com|net|org))$
       Input:   e+mail@ddress.com
       Groups:  [name] = e+mail, [domain] = ddress.com
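The effect of `(?n)` shows up in the group count. A minimal sketch with the slide's pattern and input; the variable names are illustrative. In .NET, unnamed groups are numbered before named ones, which is why `+mail` is group 1 in the first match.

```csharp
using System;
using System.Text.RegularExpressions;

string emailPattern = @"^(?<name>[^@+]+(\+[^+]+)?)@(?<domain>(\w+)\.(com|net|org))$";
string address = "e+mail@ddress.com";

// Default: the three plain parentheses capture alongside the named groups.
Match implicitMatch = Regex.Match(address, emailPattern);

// (?n) is the inline form of RegexOptions.ExplicitCapture: plain
// parentheses only group, and only named groups are captured.
Match explicitMatch = Regex.Match(address, "(?n)" + emailPattern);

Console.WriteLine(implicitMatch.Groups.Count);  // group 0 + 3 unnamed + 2 named
Console.WriteLine(explicitMatch.Groups.Count);  // group 0 + 2 named
```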
  23. Patrick Delancy
       patrickdelancy.com
       This Presentation: patrickdelancy.com/presentations/...
       @patrickdelancy
       linkedin.com/in/patrickdelancy
       google.com/+patrickdelancy
  24. Some Additional Resources
       • https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines - This is a little outdated, but still a good overview of how Regex implementations vary.
       • https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#SupportedNamedBlocks - Here is a reference of all of the named Unicode blocks that .NET supports in Regex. Linked here because I told you I would : )
       • http://www.regular-expressions.info/refflavors.html - This is a very comprehensive reference for many common Regex engines. Some content may be out of date as new versions of each platform are released.
       • http://www.regexplanet.com/ - An online pattern tester. Not the best interface, but very capable and has some nice features.
