
Regular Expressions


Presented by Sido Grond for Exception Twente at Demcon on 21 January 2014.

Published in: Technology


  1. Regular Expressions
     Sido Grond, Exception Twente, 21 January 2014
  2. Index
     • Introduction
     • Applications
     • Constructs
     • Regular expressions in C#
     • Demos
     • Pros and cons
     • Conclusions
  3. Introduction
     • A regular expression (RE or regex) is a text string that describes a search pattern
     • Some similarity to wildcards (e.g. *.txt)
     • Platform and language independent, with minor syntax differences between implementations
     • Available in, among others, C#, Perl, Python, Java, JavaScript, Visual Studio, Notepad++ and Linux command-line tools
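The wildcard comparison can be made concrete. The following is a minimal sketch, assuming only the `*` and `?` wildcards and .NET 5 or later (for top-level statements); the helper name `WildcardToRegex` is hypothetical and not part of the slides or of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: translate a file wildcard into an anchored regex.
// Regex.Escape turns "*" into "\*", "?" into "\?" and "." into "\.";
// the escaped wildcards are then swapped for their regex equivalents.
static string WildcardToRegex(string wildcard) =>
    "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

string pattern = WildcardToRegex("*.txt");
bool isText = Regex.IsMatch("notes.txt", pattern);
bool isBackup = Regex.IsMatch("notes.txt.bak", pattern);

Console.WriteLine(pattern);   // ^.*\.txt$
Console.WriteLine(isText);    // True
Console.WriteLine(isBackup);  // False
```

The anchors `^` and `$` are what make the regex behave like a wildcard, which always has to cover the whole file name.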
  4. Applications
     • Syntax highlighting
     • Find and replace
       – Visual Studio (as usual: different syntax)
       – Notepad++
     • Text searching
       – Unix tools: grep, sed, find
     • Programming
       – Pattern matching, filtering, replacing
  5. Constructs: single character

     Regex           Strings that match        Strings that don't match
     a               "abc", "bad"              "xyz"
     ^a              "abc"                     "bad", "xyz", "^a"
     a$              "cba"                     "abc", "bad", "xyz"
     ^[a-z]$         "a", "x"                  "aa", "A"
     ^[a-z0-9+-]$    "s", "3", "+", "-"        "12", "Q", "a1"
     ^[^a-zA-Z]$     "5", "+"                  "p", "R", "abc", "15"
     ^[a-zA-Z]       "b", "C123", "dd"         "1cba", " f" (note the space)
     [^a-zA-Z]       "1cba", " f"              "b", "dd"
     ^.$             "m", "3", "%"             "13", "%a", "xyz"
     .               "%", "e4", "xyz", "."     ""
     \.              "."                       "%", "e4", "xyz"
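A few rows of the table above can be checked directly in C#. This is a small sketch, assuming .NET 5 or later for top-level statements; the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Without anchors the pattern may match anywhere in the string.
bool anywhere = Regex.IsMatch("bad", "a");
// ^ ties the match to the start of the string, $ to the end.
bool atStart = Regex.IsMatch("bad", "^a");
bool atEnd = Regex.IsMatch("cba", "a$");
// A character class matches exactly one character from the set.
bool twoLetters = Regex.IsMatch("aa", "^[a-z]$");
// A leading ^ inside the brackets negates the set.
bool nonLetter = Regex.IsMatch("+", "^[^a-zA-Z]$");
// An unescaped dot matches any character; \. matches a literal dot only.
bool anyChar = Regex.IsMatch("%", ".");
bool literalDot = Regex.IsMatch("%", @"\.");

Console.WriteLine($"{anywhere} {atStart} {atEnd} {twoLetters} {nonLetter} {anyChar} {literalDot}");
// True False True False True True False
```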
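  6. Constructs: word groups
     • \d is shorthand for Digits: [0-9]
     • \w is shorthand for Word characters: [a-zA-Z0-9_]
     • \s is shorthand for whiteSpace: [ \t\r\n]
     • \D is shorthand for non-Digits: [^0-9]
     • \W is shorthand for non-Word characters: [^a-zA-Z0-9_]
     • \S is shorthand for non-whiteSpace: [^ \t\r\n]
     These are the ASCII equivalents; by default .NET also matches Unicode digits, letters and whitespace.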
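The shorthand classes can be tried out as follows. A minimal sketch, assuming .NET 5 or later for top-level statements; the sample strings and variable names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \w covers letters and digits, and also the underscore.
bool underscoreIsWord = Regex.IsMatch("_", @"^\w$");
// \s covers tabs as well as spaces.
bool tabIsSpace = Regex.IsMatch("\t", @"^\s$");
// The upper-case variants are the negations.
bool letterIsNonDigit = Regex.IsMatch("x", @"^\D$");

// \d+ picks every run of digits out of a sentence.
string[] numbers = Regex.Matches("order 66, aisle 7", @"\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine($"{underscoreIsWord} {tabIsSpace} {letterIsNonDigit}"); // True True True
Console.WriteLine(string.Join(",", numbers));                             // 66,7
```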
  7. Constructs: multiple character

     Regex         Strings that match       Strings that don't match
     abc           "abcde", "rabco"         "bac", "ABC", "bcde"
     ^abc          "abc", "abcde"           "rabco", "ABC", "bcde"
     E\d           "ME3", "E85", "EBE5"     "ACME", "E", "E 8"
     E\d+$         "ME35", "EBE5"           "E85x", "E3EB", "E 8"
     foo|bar       "foot", "bart"           "ooba"
     a(b|c)d       "Tabd", "acdc"           "abcd", "aed", "ad"
     ^foo|bar      "foo1", "Harbar"         "toofoo"
     ^(foo|bar)    "foo1", "bart"           "toofoo", "Harbar"
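The last two rows differ only in the parentheses, which is easy to overlook. The sketch below, assuming .NET 5 or later for top-level statements and using illustrative variable names, shows the effect of alternation precedence.

```csharp
using System;
using System.Text.RegularExpressions;

// Alternation binds loosest: ^foo|bar means (^foo) or (bar anywhere).
bool loose = Regex.IsMatch("Harbar", "^foo|bar");
// With a group, the anchor applies to both alternatives.
bool strict = Regex.IsMatch("Harbar", "^(foo|bar)");
// E\d+$ needs an E followed by digits up to the end of the string.
string tail = Regex.Match("EBE5", @"E\d+$").Value;

Console.WriteLine(loose);   // True
Console.WriteLine(strict);  // False
Console.WriteLine(tail);    // E5
```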
  8. Constructs: quantifiers

     Regex           Strings that match          Strings that don't match
     ^a?$            "", "a"                     "aa", "aaaaa"
     ^a*$            "a", "aaaaa", ""            "aaab", "c", "ba"
     ^a+$            "a", "aaaaa"                "aaab", "c", "ba", ""
     ^a{4}$          "aaaa"                      "a", "aaaaa", ""
     ^a{2,6}$        "aa", "aaa", "aaaaaa"       "a", "", "aaaaaaa"
     (e|o){2}        "koe", "booom", "veel"      "kol", "omo", "vele"
     e|(o{2})        "nest", "booom", "koe"      "kol"
     (a[0-9]*){2}    "aa", "ba45a", "a3a0"       "abacus", "a3b8a0"
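To see the quantifiers side by side, the anchored patterns from the table can be run against a fixed set of inputs. A minimal sketch, assuming .NET 5 or later for top-level statements; the input set and the `report` format are chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] inputs = { "", "a", "aa", "aaaa", "aaaaaaa" };
string[] patterns = { "^a?$", "^a*$", "^a+$", "^a{4}$", "^a{2,6}$" };

// For each pattern, list the inputs it accepts (quoted, so "" stays visible).
string[] report = patterns
    .Select(p => p + " accepts: " + string.Join(", ",
        inputs.Where(s => Regex.IsMatch(s, p)).Select(s => "\"" + s + "\"")))
    .ToArray();

foreach (string line in report)
    Console.WriteLine(line);
```

Only `^a?$` and `^a*$` accept the empty string, and `^a{2,6}$` rejects the seven-character input because of the upper bound.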
  9. Constructs: advanced

     Greedy/lazy repetition

     Mode      Regex     String                            First match
     Greedy    <.+>      "This is a <B>first</B> test"     "<B>first</B>"
     Lazy      <.+?>     "This is a <B>first</B> test"     "<B>"

     Grouping

     Regex                  Strings that match    Groups
     ^(a|b)c(d)$            "acd", "bcd"          0: "acd"  1: "a"  2: "d"
     ^(?:a|b)c(d)$          same as above         0: "acd"  1: "d"
     ^(?<name>a|b)c(d)$     same as above         0: "acd"  1: "d"  name: "a"
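Both tables can be reproduced with `Regex.Match`. This is a sketch assuming .NET 5 or later for top-level statements; variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "This is a <B>first</B> test";

// Greedy: .+ grabs as much as possible, then backs off to the last '>'.
string greedy = Regex.Match(html, "<.+>").Value;
// Lazy: .+? stops at the first '>' that completes the match.
string lazy = Regex.Match(html, "<.+?>").Value;

// Named and unnamed groups in one pattern.
Match m = Regex.Match("acd", "^(?<name>a|b)c(d)$");
string whole = m.Groups[0].Value;       // group 0 is always the whole match
string unnamed = m.Groups[1].Value;     // .NET numbers unnamed groups first
string named = m.Groups["name"].Value;

Console.WriteLine(greedy);   // <B>first</B>
Console.WriteLine(lazy);     // <B>
Console.WriteLine($"{whole} {unnamed} {named}");  // acd d a
```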
  10. Constructs: advanced

     Backreferences: inside regex

     Regex                        Strings that match         Strings that don't match
     ^([a-c])x\1x\1               "axaxa", "bxbxbyyyy"       "axaxb", "bxaxc"
     <(b)><(i)>.*?</\2></\1>      "<b><i>bla</i></b>"        "<b><i>bla</b></i>"

     Backreferences: find and replace

     Regex            Strings that match    Replace pattern    Result strings
     ^(var)(1|2)$     "var1", "var2"        \1iable            "variable"
     ^(a|b)c(d|e)     "acd", "bcd"          \2XXX\1            "dXXXa", "dXXXb"

     The \1 replacement syntax is the one used by Notepad++ and sed; .NET replacement strings use $1 instead.
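In C# the same backreferences look as follows. A minimal sketch, assuming .NET 5 or later for top-level statements; note that inside the pattern a group is referenced as `\1`, while in the replacement string .NET expects `$1` or `${1}`.

```csharp
using System;
using System.Text.RegularExpressions;

// Inside the pattern, \1 must repeat exactly what group 1 captured.
bool sameLetter = Regex.IsMatch("bxbxbyyyy", @"^([a-c])x\1x\1");
bool otherLetter = Regex.IsMatch("axaxb", @"^([a-c])x\1x\1");

// In the replacement, ${1} inserts the text captured by group 1.
// The braces keep the group number separate from the text that follows.
string variable = Regex.Replace("var1", "^(var)(1|2)$", "${1}iable");
string swapped = Regex.Replace("acd", "^(a|b)c(d|e)", "${2}XXX${1}");

Console.WriteLine(sameLetter);   // True
Console.WriteLine(otherLetter);  // False
Console.WriteLine(variable);     // variable
Console.WriteLine(swapped);      // dXXXa
```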
  11. Constructs: advanced

     Look ahead/behind

     Regex              Strings that match        Strings that don't match
     ([a-b](?=x))       "blaax", "bxa"            "ab", "bacx"
     ((?<=x)[a-b])      "yyxa", "bxb"             "ab", "aax"
     ([a-b](?!x))       "bla", "a", "bxa"         "bxc", "ax"
     ((?<!x)[a-b])      "ral", "dbx", "bxa"       "xa", "lxb"
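What sets lookarounds apart is that the surrounding `x` is checked but never becomes part of the match. The sketch below illustrates this, assuming .NET 5 or later for top-level statements; variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead: an 'a' or 'b' that is followed by 'x'. Only the letter is matched.
Match ahead = Regex.Match("blaax", "[a-b](?=x)");
// Lookbehind: an 'a' or 'b' that is preceded by 'x'.
Match behind = Regex.Match("yyxa", "(?<=x)[a-b]");
// Negative lookahead: the letter must NOT be followed by 'x'.
bool negative = Regex.IsMatch("bxc", "[a-b](?!x)");
// Because the context is not consumed, a replace leaves it untouched.
string replaced = Regex.Replace("bxa", "[a-b](?!x)", "_");

Console.WriteLine($"{ahead.Value} at {ahead.Index}");    // a at 3
Console.WriteLine($"{behind.Value} at {behind.Index}");  // a at 3
Console.WriteLine(negative);                             // False
Console.WriteLine(replaced);                             // bx_
```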
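  12. Regexes in C#
     • using System.Text.RegularExpressions;
     • Regex reg = new Regex("a(b|c)d");

       Call                               Result
       reg.IsMatch("abd")                 true
       reg.Matches("abdEacd")             2 matches
       reg.Match("abdEabdC").Groups       2 groups per match (the whole match and the (b|c) capture)
       reg.Split("abdEabdC")              "E" and "C", plus a leading empty string and the captured "b" of each match, because Split also returns captured text
       reg.Replace("abdEabdC", "X")       "XEXC"

     • RegexOptions.Singleline and RegexOptions.Multiline control how a line break (\n) is treated: Singleline lets . match \n, Multiline lets ^ and $ match at the start and end of every line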
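The calls on this slide can be combined into one small program. A sketch, assuming .NET 5 or later for top-level statements; the non-capturing variant of the pattern and the variable names are additions for illustration, not part of the slides.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

Regex reg = new Regex("a(b|c)d");

bool isMatch = reg.IsMatch("abd");
int matchCount = reg.Matches("abdEacd").Count;         // "abd" and "acd"
int groupCount = reg.Match("abdEabdC").Groups.Count;   // whole match + (b|c)
string replaced = reg.Replace("abdEabdC", "X");

// Split returns captured text too, and an empty string before a leading match.
string[] rawParts = reg.Split("abdEabdC");

// A non-capturing group plus a filter on empty entries keeps only "E" and "C".
string[] parts = new Regex("a(?:b|c)d").Split("abdEabdC")
    .Where(s => s.Length > 0)
    .ToArray();

// Multiline: ^ and $ also match at line breaks. Singleline: . also matches \n.
int lineCount = Regex.Matches("a\nb", "^.$", RegexOptions.Multiline).Count;
bool acrossBreak = Regex.IsMatch("a\nb", "^a.b$", RegexOptions.Singleline);

Console.WriteLine($"{isMatch} {matchCount} {groupCount} {replaced}");  // True 2 2 XEXC
Console.WriteLine(string.Join("|", rawParts));                         // |b|E|b|C
Console.WriteLine(string.Join("|", parts));                            // E|C
Console.WriteLine($"{lineCount} {acrossBreak}");                       // 2 True
```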
  13. Regexes in C#
     • Demo
  14. Regexes in editors
     • Notepad++
  15. Regexes in Linux
     • grep, sed, find, vi
  16. Pros and cons
     • Advantages
       – Very flexible
       – Fast processing
       – Language independent
       – A lot of work in a single line of code
       – Often simpler than the 'substring + indexes' approach
     • Disadvantages
       – Hard to read, for example '?' has three meanings depending on context
       – Hard to debug: no information is given when there is no match
       – Compilation only at runtime
       – Typos are very easily made (e.g. forgetting an escape character)
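Two of the disadvantages can be demonstrated in a few lines. A minimal sketch, assuming .NET 5 or later for top-level statements; the sample pattern and inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// "Compilation only at runtime": a malformed pattern is an ordinary string
// to the C# compiler and only fails when the Regex is constructed.
bool patternRejected = false;
try
{
    _ = new Regex("a(b");   // unbalanced parenthesis
}
catch (ArgumentException)
{
    patternRejected = true;
}

// "Forgetting an escape character": the dot silently matches any character.
bool tooLoose = Regex.IsMatch("125", "1.5");
bool exact = Regex.IsMatch("125", @"1\.5");
// Regex.Escape adds the missing backslashes to literal text.
string escaped = Regex.Escape("1.5");

Console.WriteLine(patternRejected);  // True
Console.WriteLine(tooLoose);         // True
Console.WriteLine(exact);            // False
Console.WriteLine(escaped);          // 1\.5
```

Neither mistake produces a compiler warning, which is why a few tests around every non-trivial pattern tend to pay off.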
  17. Conclusions
     "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)
     Don't overuse it!
  18. Conclusions
     • Very handy tool for string matching and replacing
     • Built-in support in most programming languages
     • Support in/for multiple applications
     • More info
       – … %28v=vs.110%29.aspx
     • Fun