Regular Expressions

Presented by Sido Grond for Exception Twente at Demcon on 21 January 2014.

  1. Regular Expressions
     21 January 2014, Sido Grond, Exception Twente
  2. Index
     • Introduction
     • Applications
     • Constructs
     • Regular expressions in C#
     • Demos
     • Pros and cons
     • Conclusions
  3. Introduction
     • A regular expression (RE or regex) is a text string that describes a search pattern
     • Somewhat similar to wildcards (e.g. *.txt)
     • Largely platform- and language-independent, with some minor differences between implementations
     • Available in, among others, C#, Perl, Python, Java, JavaScript, Visual Studio, Notepad++ and Linux command-line tools
  4. Applications
     • Syntax highlighting
     • Find and replace
       – Visual Studio (as usual: different syntax)
       – Notepad++
     • Text searching
       – Unix tools: grep, sed, find
     • Programming
       – Pattern matching, filtering, replacing
  5. Constructs: single character
     Regex           Strings that match          Strings that don't match
     a               "abc", "bad"                "xyz"
     ^a              "abc"                       "bad", "xyz", "^a"
     a$              "cba"                       "abc", "bad", "xyz"
     ^[a-z]$         "a", "x"                    "aa", "A"
     ^[a-z0-9+-]$    "s", "3", "+", "-"          "12", "Q", "a1"
     ^[^a-zA-Z]$     "5", "+"                    "p", "R", "abc", "15"
     ^[a-zA-Z]       "b", "C123", "dd"           "1cba", " f" (note the space)
     [^a-zA-Z]       "1cba", " f"                "b", "dd"
     ^.$             "m", "3", "%"               "13", "%a", "xyz"
     .               "%", "e4", "xyz", "."       ""
     \.              "."                         "%", "e4", "xyz"
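  6. Constructs: shorthand character classes
     • \d is shorthand for Digits: [0-9]
     • \w is shorthand for Word characters: [a-zA-Z0-9_] (note the underscore)
     • \s is shorthand for whiteSpace: [ \t\r\n\f\v]
     • \D is shorthand for non-Digits: [^0-9]
     • \W is shorthand for non-Word characters: [^a-zA-Z0-9_]
     • \S is shorthand for non-whiteSpace: [^ \t\r\n\f\v]
     • These are the ASCII meanings; by default .NET and Python 3 also match Unicode digits, letters and whitespace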
  7. Constructs: multiple character
     Regex           Strings that match          Strings that don't match
     abc             "abcde", "rabco"            "bac", "ABC", "bcde"
     ^abc            "abc", "abcde"              "rabco", "ABC", "bcde"
     E\d             "ME3", "E85", "EBE5"        "ACME", "E", "E 8"
     E\d+$           "ME35", "EBE5"              "E85x", "E3EB", "E 8"
     foo|bar         "foot", "bart"              "ooba"
     a(b|c)d         "Tabd", "acdc"              "abcd", "aed", "ad"
     ^foo|bar        "foo1", "Harbar"            "toofoo"
     ^(foo|bar)      "foo1", "bart"              "toofoo", "Harbar"
  8. Constructs: quantifiers
     Regex           Strings that match          Strings that don't match
     ^a?$            "", "a"                     "aa", "aaaaa"
     ^a*$            "a", "aaaaa", ""            "aaab", "c", "ba"
     ^a+$            "a", "aaaaa"                "aaab", "c", "ba", ""
     ^a{4}$          "aaaa"                      "a", "aaaaa", ""
     ^a{2,6}$        "aa", "aaa", "aaaaaa"       "a", "", "aaaaaaa"
     (e|o){2}        "koe", "booom", "veel"      "kol", "omo", "vele"
     e|(o{2})        "nest", "booom", "koe"      "kol"
     (a[0-9]*){2}    "aa", "ba45a", "a3a0"       "abacus", "a3b8a0"
  9. Constructs: advanced
     Greedy/lazy repetition
             Regex     String                           First match
     Greedy  <.+>      "This is a <B>first</B> test"    "<B>first</B>"
     Lazy    <.+?>     "This is a <B>first</B> test"    "<B>"

     Grouping
     Regex                 Strings that match    Groups (for "acd")
     ^(a|b)c(d)$           "acd", "bcd"          0: "acd", 1: "a", 2: "d"
     ^(?:a|b)c(d)$         same as above         0: "acd", 1: "d"
     ^(?<name>a|b)c(d)$    same as above         0: "acd", 1: "d", name: "a"
  10. Constructs: advanced
      Backreferences: inside the regex
      Regex                      Strings that match       Strings that don't match
      ^([a-c])x\1x\1             "axaxa", "bxbxbyyyy"     "axaxb", "bxaxc"
      <(b)><(i)>.*?</\2></\1>    "<b><i>bla</i></b>"      "<b><i>bla</b></i>"

      Backreferences: find and replace (.NET writes $1, $2 in the replace pattern)
      Regex            Strings that match    Replace pattern    Resulting strings
      ^(var)(1|2)$     "var1", "var2"        \1iable            "variable"
      ^(a|b)c(d|e)     "acd", "bcd"          \2XXX\1            "dXXXa", "dXXXb"
  11. Constructs: advanced
      Look-ahead/look-behind
      Regex             Strings that match       Strings that don't match
      ([a-b](?=x))      "blaax", "bxa"           "ab", "bacx"
      ((?<=x)[a-b])     "yyxa", "bxb"            "ab", "aax"
      ([a-b](?!x))      "bla", "a", "bxa"        "bxc", "ax"
      ((?<!x)[a-b])     "ral", "dbx", "bxa"      "xa", "lxb"
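  12. Regexes in C#
      • using System.Text.RegularExpressions;
      • Regex reg = new Regex("a(b|c)d");
        – reg.IsMatch("abd");                →  true
        – reg.Matches("abdEacd");            →  2 matches
        – reg.Match("abdEabdC").Groups;      →  2 groups per match (0: "abd", 1: "b")
        – reg.Split("abdEabdC");             →  { "", "b", "E", "b", "C" } (captured groups are included in the result; with a(?:b|c)d it is { "", "E", "C" })
        – reg.Replace("abdEabdC", "X");      →  "XEXC"
      • Options for line breaks (\n): RegexOptions.Multiline makes ^ and $ match at the start and end of every line, not only of the whole input; RegexOptions.Singleline makes . match \n as well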
  13. Regexes in C#
      • Demo
  14. Regexes in editors
      • Notepad++
  15. Regexes in Linux
      • grep, sed, find, vi
  16. Pros and cons
      • Advantages
        – Very flexible
        – Fast processing (for typical patterns)
        – Largely language independent
        – A lot of work in a single line of code
        – Often simpler than a 'substring + indexes' approach
      • Disadvantages
        – Hard to read; for example, '?' has three meanings depending on context (optional quantifier as in a?, lazy modifier as in +?, group prefix as in (?: ) and (?= ))
        – Hard to debug: no information is given when there is no match
        – Compilation only at runtime
        – Typos are very easily made (e.g. a forgotten escape character)
  17. Conclusions
      "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)
      Don't overuse it!
  18. Conclusions
      • Very handy tool for string matching and replacing
      • Built-in support in most programming languages
      • Supported in many applications and tools
      • More info
        – http://www.regular-expressions.info/
        – http://msdn.microsoft.com/en-us/library/az24scfc%28v=vs.110%29.aspx
      • Fun
        – http://regex.alf.nu/
        – http://www.i-programmer.info/news/144-graphics-and-games/5450can-you-do-the-regular-expression-crossword.html
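
Slide 3 compares regexes with wildcards. As a rough illustration (a sketch using Python's standard `fnmatch` and `re` modules, which the slides do not use), the wildcard `*.txt` corresponds to the regex `^.*\.txt$`: the `*` becomes `.*` and the literal dot has to be escaped. The names `GLOB`, `REGEX` and `compare` are hypothetical.

```python
import fnmatch
import re

GLOB = "*.txt"
REGEX = r"^.*\.txt$"  # hand-translated equivalent of the wildcard above

def compare(names):
    """Return (name, wildcard result, regex result) for each file name."""
    return [
        (name, fnmatch.fnmatchcase(name, GLOB), re.search(REGEX, name) is not None)
        for name in names
    ]

for row in compare(["notes.txt", "notes.txt.bak", "readme.md", "atxt"]):
    print(row)
```

Note the escaped dot: without it, `^.*.txt$` would also accept `atxt`.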
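
The single-character table on slide 5 can be checked mechanically. The sketch below uses Python's `re.search` as a stand-in for .NET (an assumption; for these simple patterns both engines behave the same). Like the table, `re.search` reports a match anywhere in the string unless the pattern is anchored with `^` or `$`. `CASES` and `check` are hypothetical names.

```python
import re

# (pattern, strings that should match, strings that should not match),
# transcribed from the slide-5 table.
CASES = [
    (r"a", ["abc", "bad"], ["xyz"]),
    (r"^a", ["abc"], ["bad", "xyz", "^a"]),
    (r"a$", ["cba"], ["abc", "bad", "xyz"]),
    (r"^[a-z]$", ["a", "x"], ["aa", "A"]),
    (r"^[a-z0-9+-]$", ["s", "3", "+", "-"], ["12", "Q", "a1"]),
    (r"^[^a-zA-Z]$", ["5", "+"], ["p", "R", "abc", "15"]),
    (r"^[a-zA-Z]", ["b", "C123", "dd"], ["1cba", " f"]),
    (r"[^a-zA-Z]", ["1cba", " f"], ["b", "dd"]),
    (r"^.$", ["m", "3", "%"], ["13", "%a", "xyz"]),
    (r".", ["%", "e4", "xyz", "."], [""]),
    (r"\.", ["."], ["%", "e4", "xyz"]),
]

def check(cases):
    """Return (pattern, text, expected_to_match) for every row re.search disagrees with."""
    failures = []
    for pattern, should_match, should_not in cases:
        for text in should_match:
            if re.search(pattern, text) is None:
                failures.append((pattern, text, True))
        for text in should_not:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text, False))
    return failures

print(check(CASES) or "all rows agree")
```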
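
The shorthand classes on slide 6 differ from their bracket "equivalents" in two ways that often surprise: `\w` includes the underscore, and in Unicode-aware engines it also matches accented letters. The sketch below shows this with Python's `re` (a stand-in for .NET, which is likewise Unicode-aware by default); the `re.ASCII` flag restricts `\w` to `[a-zA-Z0-9_]`. The sample string is made up.

```python
import re

text = "snake_case café 42\tend"

# \w also matches the underscore, unlike [a-zA-Z0-9]
print(re.findall(r"\w+", text))            # ['snake_case', 'café', '42', 'end']
print(re.findall(r"[a-zA-Z0-9]+", text))   # ['snake', 'case', 'caf', '42', 'end']
# With re.ASCII, \w is exactly [a-zA-Z0-9_], so 'é' is no longer a word character
print(re.findall(r"\w+", text, re.ASCII))  # ['snake_case', 'caf', '42', 'end']
# \S+ picks out runs of non-whitespace; the tab counts as whitespace like the space
print(re.findall(r"\S+", text))            # ['snake_case', 'café', '42', 'end']
print(re.findall(r"\d", "a1b22"))          # ['1', '2', '2']
```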
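
The last two rows of slide 7 hinge on operator precedence: `|` binds more loosely than anything else, so `^foo|bar` means "`foo` at the start, or `bar` anywhere". A small sketch in Python's `re` (used here as a stand-in for the C# engine; `matching` is a hypothetical helper) makes the difference visible, together with the `E\d` rows.

```python
import re

def matching(pattern, texts):
    """Return the texts in which re.search finds the pattern somewhere."""
    return [t for t in texts if re.search(pattern, t)]

words = ["foo1", "bart", "Harbar", "toofoo"]

# ^foo|bar is parsed as (^foo)|(bar): the anchor only applies to foo
print(matching(r"^foo|bar", words))    # ['foo1', 'bart', 'Harbar']
# Parentheses make the anchor apply to both alternatives
print(matching(r"^(foo|bar)", words))  # ['foo1', 'bart']

# \d is one digit, \d+ one or more; $ pins the digits to the end of the string
codes = ["ME35", "EBE5", "E85x", "E3EB", "E 8"]
print(matching(r"E\d", codes))         # ['ME35', 'EBE5', 'E85x', 'E3EB']
print(matching(r"E\d+$", codes))       # ['ME35', 'EBE5']
```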
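
For the quantifier table on slide 8, it helps to look at what was actually matched, not just whether something matched. This Python sketch (Python's `re` stands in for .NET here) uses `re.fullmatch`, which, like wrapping a pattern in `^...$`, requires the whole string to match, at least for strings without a trailing newline. It also shows that a repeated group such as `(a[0-9]*){2}` only remembers its last repetition.

```python
import re

# Which string lengths does ^a{2,6}$ accept?
lengths = [n for n in range(9) if re.fullmatch(r"a{2,6}", "a" * n)]
print(lengths)  # [2, 3, 4, 5, 6]

# [0-9]* is greedy, so the first repetition takes "a45" and the second takes "a";
# group 1 holds only the last repetition
m = re.search(r"(a[0-9]*){2}", "ba45a")
print(m.group(0), m.group(1))  # a45a a

# (e|o){2} needs two of e/o in a row; in "booom" the first such pair starts at index 1
m = re.search(r"(e|o){2}", "booom")
print(m.group(0), m.start())   # oo 1
```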
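
The greedy/lazy and grouping rows on slide 9 can be reproduced in Python's `re` (a stand-in for .NET, with one syntax difference). Python writes a named group as `(?P<name>...)` instead of .NET's `(?<name>...)`, and numbers named groups left to right together with the unnamed ones. .NET numbers named groups after all unnamed groups, which is why slide 9 lists group 1 as "d".

```python
import re

html = "This is a <B>first</B> test"

# Greedy: .+ takes as much as possible, then backtracks to the last '>'
print(re.search(r"<.+>", html).group())   # <B>first</B>
# Lazy: .+? stops at the first '>' that lets the match succeed
print(re.search(r"<.+?>", html).group())  # <B>

m = re.match(r"^(a|b)c(d)$", "acd")
print(m.group(0), m.group(1), m.group(2))  # acd a d

# (?:...) groups without capturing, so (d) becomes group 1
m = re.match(r"^(?:a|b)c(d)$", "acd")
print(m.group(0), m.group(1))              # acd d

# In Python the named group is also group 1 (in .NET it would be group 2)
m = re.match(r"^(?P<name>a|b)c(d)$", "acd")
print(m.group("name"), m.group(1), m.group(2))  # a a d
```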
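
Slide 10 uses backreferences in two places: inside the pattern, where `\1` must repeat exactly the text captured by group 1, and in the replacement string. The sketch below uses Python's `re` as a stand-in for the slides' tools. Python accepts `\1` in the replacement (and `\g<1>` when a digit follows), whereas .NET's `Regex.Replace` uses `$1`. The variable names are hypothetical.

```python
import re

# Inside the pattern: the letter after each x must be the same as the first letter
same_letter = re.compile(r"^([a-c])x\1x\1")
print([s for s in ["axaxa", "bxbxbyyyy", "axaxb", "bxaxc"] if same_letter.search(s)])
# ['axaxa', 'bxbxbyyyy']

# Closing tags must mirror the opening tags in reverse order
nested = re.compile(r"<(b)><(i)>.*?</\2></\1>")
print(bool(nested.search("<b><i>bla</i></b>")))  # True
print(bool(nested.search("<b><i>bla</b></i>")))  # False

# In the replacement string
print(re.sub(r"^(var)(1|2)$", r"\1iable", "var2"))   # variable
print(re.sub(r"^(a|b)c(d|e)", r"\2XXX\1", "bcd"))    # dXXXb
# \g<1> avoids ambiguity when a digit follows: \10 would mean group 10
print(re.sub(r"^(var)(1|2)$", r"\g<1>0", "var1"))    # var0
```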
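
The look-ahead/look-behind constructs on slide 11 are zero-width: they test the neighbouring character without including it in the match. This Python `re` sketch (a stand-in for .NET, which supports the same four constructs) uses `re.findall` to show exactly which letter is matched, and a substitution to show that the `x` itself is left untouched.

```python
import re

print(re.findall(r"[a-b](?=x)", "blaax"))   # ['a']  a or b followed by x
print(re.findall(r"(?<=x)[a-b]", "bxb"))    # ['b']  a or b preceded by x
print(re.findall(r"[a-b](?!x)", "bxa"))     # ['a']  a or b not followed by x
print(re.findall(r"(?<!x)[a-b]", "bxa"))    # ['b']  a or b not preceded by x

# The x is only inspected, not consumed, so a replacement keeps it
print(re.sub(r"(?<=x)[a-b]", "_", "yyxa"))  # yyx_
```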
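
For comparison with the C# calls on slide 12, here is a rough Python equivalent using the standard `re` module. The mapping is an assumption, not part of the slides: `search` stands in for `IsMatch`, `finditer` for `Matches`, and `split`/`sub` for `Split`/`Replace`. Like .NET's `Regex.Split`, Python's `split` includes captured groups in its result.

```python
import re

reg = re.compile(r"a(b|c)d")

# IsMatch: is there a match anywhere in the input?
print(reg.search("abd") is not None)       # True
# Matches: all non-overlapping matches
print(len(list(reg.finditer("abdEacd"))))  # 2
# Groups of the first match: group 0 is the whole match, group 1 the (b|c) part
m = reg.search("abdEabdC")
print(m.group(0), m.group(1))              # abd b
# Split: captured groups end up in the result, and a match at the start gives ''
print(reg.split("abdEabdC"))               # ['', 'b', 'E', 'b', 'C']
# A non-capturing group leaves only the pieces between the matches
print(re.split(r"a(?:b|c)d", "abdEabdC"))  # ['', 'E', 'C']
# Replace: every match is replaced
print(reg.sub("X", "abdEabdC"))            # XEXC
```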
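
The single-line/multiline options on slide 12 have direct counterparts in Python's `re`, used here as a stand-in: `re.MULTILINE` corresponds to `RegexOptions.Multiline`, and `re.DOTALL` to `RegexOptions.Singleline`. A minimal sketch on a made-up two-line string:

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string
print(re.findall(r"^\w+", text))                # ['first']
# With MULTILINE, ^ also matches right after each \n
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second']

# By default . does not match \n ...
print(re.search(r"line.second", text) is None)               # True
# ... unless DOTALL is set
print(re.search(r"line.second", text, re.DOTALL) is not None)  # True
```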
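
Slide 16 mentions forgotten escape characters as a typical typo. When a pattern contains literal text, for example a user-supplied search term, an escaping helper avoids the problem. The sketch uses Python's `re.escape`, and `.NET` offers `Regex.Escape` for the same purpose. A second escaping layer comes from the host language: raw strings (`r"\d+"`) in Python and verbatim strings (`@"\d+"`) in C# keep backslashes from being interpreted twice.

```python
import re

price = "1.5"

# Unescaped, '.' matches any character, so "105" is accepted by mistake
print(bool(re.search(price, "total 105")))             # True
# re.escape turns the dot into a literal \.
print(bool(re.search(re.escape(price), "total 105")))  # False
print(bool(re.search(re.escape(price), "total 1.5")))  # True
print(re.escape("a+b"))                                # a\+b
```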
