Regular Expressions


Presented by Sido Grond for Exception Twente at Demcon on 21 January 2014.



Transcript

  • 1. Regular Expressions
      21 January 2014, Sido Grond, Exception Twente
  • 2. Index
      • Introduction
      • Applications
      • Constructs
      • Regular expressions in C#
      • Demos
      • Pros and cons
      • Conclusions
  • 3. Introduction
      • A regular expression (RE or regex) is a text string that describes a search pattern
      • Somewhat similar to wildcards (e.g. *.txt)
      • Largely platform and language independent, with minor differences between implementations
      • Available in, among others, C#, Perl, Python, Java, JavaScript, Visual Studio, Notepad++ and Linux command-line tools
  • 4. Applications
      • Syntax highlighting
      • Find and replace
          – Visual Studio (as usual: different syntax)
          – Notepad++
      • Text searching
          – Unix tools: grep, sed, find
      • Programming
          – Pattern matching, filtering, replacing
  • 5. Constructs: single character
      Regex           Strings that match         Strings that don't match
      a               "abc", "bad"               "xyz"
      ^a              "abc"                      "bad", "xyz", "^a"
      a$              "cba"                      "abc", "bad", "xyz"
      ^[a-z]$         "a", "x"                   "aa", "A"
      ^[a-z0-9+-]$    "s", "3", "+", "-"         "12", "Q", "a1"
      ^[^a-zA-Z]$     "5", "+"                   "p", "R", "abc", "15"
      ^[a-zA-Z]       "b", "C123", "dd"          "1cba", " f" (note the space)
      [^a-zA-Z]       "1cba", " f"               "b", "dd"
      ^.$             "m", "3", "%"              "13", "%a", "xyz"
      .               "%", "e4", "xyz", "."      ""
      \.              "."                        "%", "e4", "xyz"
  • 6. Constructs: word groups
      • \d is shorthand for Digits [0-9]
      • \w is shorthand for Word characters [a-zA-Z0-9_] (note the underscore)
      • \s is shorthand for whiteSpace [ \t\r\n]
      • \D is shorthand for non-Digits [^0-9]
      • \W is shorthand for non-Word characters [^a-zA-Z0-9_]
      • \S is shorthand for non-whiteSpace [^ \t\r\n]
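A minimal C# sketch of the shorthand classes, assuming the default .NET options. The class `ShorthandDemo` and the helper `IsSingle` are hypothetical names introduced only for this illustration. Note that by default .NET also lets \w and \d match Unicode letters and digits, so the ASCII ranges above are an approximation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShorthandDemo
{
    // Hypothetical helper: anchors the shorthand with ^ and $, as the slides do,
    // so the input has to be a single character from that class.
    public static bool IsSingle(string shorthand, string input) =>
        Regex.IsMatch(input, "^" + shorthand + "$");

    public static void Main()
    {
        Console.WriteLine(IsSingle(@"\d", "7"));   // True: a digit
        Console.WriteLine(IsSingle(@"\w", "_"));   // True: underscore is a word character
        Console.WriteLine(IsSingle(@"\w", "-"));   // False: hyphen is not
        Console.WriteLine(IsSingle(@"\s", "\t"));  // True: tab is whitespace
        Console.WriteLine(IsSingle(@"\D", "x"));   // True: not a digit
        Console.WriteLine(IsSingle(@"\S", " "));   // False: space is whitespace
    }
}
```

The verbatim string prefix `@` keeps the backslash intact, so `@"\d"` reaches the regex engine as `\d`.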
  • 7. Constructs: multiple character
      Regex           Strings that match         Strings that don't match
      abc             "abcde", "rabco"           "bac", "ABC", "bcde"
      ^abc            "abc", "abcde"             "rabco", "ABC", "bcde"
      E\d             "ME3", "E85", "EBE5"       "ACME", "E", "E 8"
      E\d+$           "ME35", "EBE5"             "E85x", "E3EB", "E 8"
      foo|bar         "foot", "bart"             "ooba"
      a(b|c)d         "Tabd", "acdc"             "abcd", "aed", "ad"
      ^foo|bar        "foo1", "Harbar"           "toofoo"
      ^(foo|bar)      "foo1", "bart"             "toofoo", "Harbar"
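The last two rows of this table are the ones that tend to surprise: without parentheses the anchor belongs to the first alternative only. A small C# sketch of that difference, where `AlternationDemo` and `Matches` are hypothetical names used for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper so each table row reads as a one-liner.
    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        // ^foo|bar means "(starts with foo) or (contains bar)".
        Console.WriteLine(Matches("^foo|bar", "Harbar"));    // True
        // ^(foo|bar) means "starts with foo or with bar".
        Console.WriteLine(Matches("^(foo|bar)", "Harbar"));  // False
        Console.WriteLine(Matches("^(foo|bar)", "bart"));    // True

        // E\d+$ needs an E followed by digits up to the end of the string.
        Console.WriteLine(Matches(@"E\d+$", "ME35"));        // True
        Console.WriteLine(Matches(@"E\d+$", "E85x"));        // False
    }
}
```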
  • 8. Constructs: quantifiers
      Regex           Strings that match         Strings that don't match
      ^a?$            "", "a"                    "aa", "aaaaa"
      ^a*$            "a", "aaaaa", ""           "aaab", "c", "ba"
      ^a+$            "a", "aaaaa"               "aaab", "c", "ba", ""
      ^a{4}$          "aaaa"                     "a", "aaaaa", ""
      ^a{2,6}$        "aa", "aaa", "aaaaaa"      "a", "", "aaaaaaa"
      (e|o){2}        "koe", "booom", "veel"     "kol", "omo", "vele"
      e|(o{2})        "nest", "booom", "koe"     "kol"
      (a[0-9]*){2}    "aa", "ba45a", "a3a0"      "abacus", "a3b8a0"
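To see which part of a string a quantified group actually consumes, it can help to print the matched text rather than only a yes/no answer. A sketch in C#, with `QuantifierDemo` and `FirstMatch` as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: text of the first match, or null when nothing matches.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Two consecutive characters, each an e or an o.
        Console.WriteLine(FirstMatch("(e|o){2}", "booom"));        // oo
        // Two repetitions of "a followed by any number of digits".
        Console.WriteLine(FirstMatch("(a[0-9]*){2}", "ba45a"));    // a45a
        // Between two and six a's, and nothing else.
        Console.WriteLine(FirstMatch("^a{2,6}$", "aaa"));          // aaa
        Console.WriteLine(FirstMatch("^a{2,6}$", "a") == null);    // True
    }
}
```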
  • 9. Constructs: advanced
      Greedy/lazy repetition
                Regex     String                            First match
      Greedy    <.+>      "This is a <B>first</B> test"     "<B>first</B>"
      Lazy      <.+?>     "This is a <B>first</B> test"     "<B>"

      Grouping
      Regex                  Strings that match    Groups
      ^(a|b)c(d)$            "acd", "bcd"          0:"acd" 1:"a" 2:"d"
      ^(?:a|b)c(d)$          same as above         0:"acd" 1:"d"
      ^(?<name>a|b)c(d)$     same as above         0:"acd" 1:"d" name:"a"
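Both tables on this slide can be reproduced with a few lines of C#. This is a sketch only; `GroupDemo` and `FirstMatch` are hypothetical names. It relies on the .NET behaviour that unnamed groups are numbered before named ones, which is why group 1 is "d" in the last row.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: text of the first match (empty string when none).
    public static string FirstMatch(string pattern, string input) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        string html = "This is a <B>first</B> test";
        Console.WriteLine(FirstMatch("<.+>", html));    // <B>first</B>  (greedy)
        Console.WriteLine(FirstMatch("<.+?>", html));   // <B>           (lazy)

        Match m = Regex.Match("acd", "^(?<name>a|b)c(d)$");
        Console.WriteLine(m.Groups[0].Value);           // acd: the whole match
        Console.WriteLine(m.Groups[1].Value);           // d: the only unnamed group
        Console.WriteLine(m.Groups["name"].Value);      // a: the named group
    }
}
```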
  • 10. Constructs: advanced
      Backreferences: inside regex
      Regex                        Strings that match       Strings that don't match
      ^([a-c])x\1x\1               "axaxa", "bxbxbyyyy"     "axaxb", "bxaxc"
      <(b)><(i)>.*?</\2></\1>      "<b><i>bla</i></b>"      "<b><i>bla</b></i>"

      Backreferences: find and replace
      Regex             Strings that match    Replace pattern    Result strings
      ^(var)(1|2)$      "var1", "var2"        \1iable            "variable"
      ^(a|b)c(d|e)      "acd", "bcd"          \2XXX\1            "dXXXa", "dXXXb"
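In .NET a backreference inside the pattern is written with a backslash and a group number, while the replacement string refers to groups with a dollar sign (tools such as sed use a backslash there too). A sketch of both tables under that assumption; `BackreferenceDemo`, `Rename` and `Swap` are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Hypothetical helper: "var1" or "var2" becomes "variable".
    public static string Rename(string input) =>
        Regex.Replace(input, "^(var)(1|2)$", "$1iable");

    // Hypothetical helper: puts group 2 in front and group 1 at the end.
    public static string Swap(string input) =>
        Regex.Replace(input, "^(a|b)c(d|e)", "$2XXX$1");

    public static void Main()
    {
        // \1 must repeat exactly what group 1 captured.
        Console.WriteLine(Regex.IsMatch("axaxa", @"^([a-c])x\1x\1"));  // True
        Console.WriteLine(Regex.IsMatch("axaxb", @"^([a-c])x\1x\1"));  // False

        Console.WriteLine(Rename("var1"));  // variable
        Console.WriteLine(Swap("acd"));     // dXXXa
        Console.WriteLine(Swap("bcd"));     // dXXXb
    }
}
```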
  • 11. Constructs: advanced
      Look ahead/behind
      Regex              Strings that match       Strings that don't match
      ([a-b](?=x))       "blaax", "bxa"           "ab", "bacx"
      ((?<=x)[a-b])      "yyxa", "bxb"            "ab", "aax"
      ([a-b](?!x))       "bla", "a", "bxa"        "bxc", "ax"
      ((?<!x)[a-b])      "ral", "dbx", "bxa"      "xa", "lxb"
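Lookarounds check the neighbouring text without making it part of the match. Printing the position and length of the match makes that visible. A sketch in C#, with `LookaroundDemo` and `FirstIndex` as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: index of the first match, or -1 when there is none.
    public static int FirstIndex(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        Console.WriteLine(FirstIndex("[a-b](?=x)", "blaax"));   // 3: the a right before x
        Console.WriteLine(FirstIndex("(?<=x)[a-b]", "yyxa"));   // 3: the a right after x
        Console.WriteLine(FirstIndex("[a-b](?!x)", "bxa"));     // 2: b is skipped, x follows it
        Console.WriteLine(FirstIndex("(?<!x)[a-b]", "bxa"));    // 0: nothing precedes the b
        Console.WriteLine(FirstIndex("(?<!x)[a-b]", "xa"));     // -1: the a follows an x

        // The x is inspected but not consumed: the match is one character long.
        Console.WriteLine(Regex.Match("blaax", "[a-b](?=x)").Length);  // 1
    }
}
```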
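  • 12. Regexes in C#
      • using System.Text.RegularExpressions;
      • Regex reg = new Regex("a(b|c)d");
          – reg.IsMatch("abd");               true
          – reg.Matches("abdEacd");           2 matches
          – reg.Match("abdEabdC").Groups;     2 groups per match (0: whole match, 1: "b" or "c")
          – reg.Split("abdEabdC");            "", "b", "E", "b", "C" (the pieces "E" and "C", plus a leading empty string and the captured group of each match)
          – reg.Replace("abdEabdC", "X");     "XEXC"
      • RegexOptions.Singleline / RegexOptions.Multiline: control whether a line break (\n) is treated as a special character (Singleline lets . match \n; Multiline lets ^ and $ match at every line)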
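The calls on this slide can be put together into one small console program. This is a sketch, not the demo from the talk; the class name `RegexApiDemo` and the field `Reg` are assumptions. Note that `Groups` is a property of a `Match`, and that `Split` also returns the text captured by the group plus an empty string for a match at the very start.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexApiDemo
{
    public static readonly Regex Reg = new Regex("a(b|c)d");

    public static void Main()
    {
        Console.WriteLine(Reg.IsMatch("abd"));                        // True

        MatchCollection matches = Reg.Matches("abdEacd");
        Console.WriteLine(matches.Count);                             // 2
        foreach (Match m in matches)
        {
            // Group 0 is the whole match, group 1 the (b|c) part.
            Console.WriteLine(m.Value + " / " + m.Groups[1].Value
                              + " / " + m.Groups.Count);              // abd / b / 2, then acd / c / 2
        }

        // Captured groups and the empty leading piece are part of the result.
        Console.WriteLine(string.Join("|", Reg.Split("abdEabdC")));   // |b|E|b|C
        Console.WriteLine(Reg.Replace("abdEabdC", "X"));              // XEXC

        // Multiline: ^ and $ also match at line breaks.
        Console.WriteLine(Regex.IsMatch("a\nb", "^b$"));                          // False
        Console.WriteLine(Regex.IsMatch("a\nb", "^b$", RegexOptions.Multiline));  // True
        // Singleline: the dot also matches \n.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                          // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline)); // True
    }
}
```

If only the pieces between matches are wanted, a non-capturing group such as `a(?:b|c)d` keeps the captured text out of the `Split` result.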
  • 13. Regexes in C#
      • Demo
  • 14. Regexes in editors
      • Notepad++
  • 15. Regexes in Linux
      • grep, sed, find, vi
  • 16. Pros and cons
      • Advantages
          – Very flexible
          – Fast processing
          – Language independent
          – A lot of work in a single line of code
          – Often simpler than a "substring + indexes" approach
      • Disadvantages
          – Hard to read: for example, "?" has three meanings depending on context
          – Hard to debug: no information is given when there is no match
          – Patterns are compiled only at run time, so syntax errors surface late
          – Typos are easily made (e.g. forgetting an escape character)
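Two of the disadvantages can be illustrated in a few lines of C#: a malformed pattern compiles fine as a string and only fails when the `Regex` is constructed, and a forgotten escape silently changes the meaning of the pattern. A sketch, with `PitfallDemo` and `IsValidPattern` as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PitfallDemo
{
    // Hypothetical helper: true when the pattern parses. The C# compiler
    // cannot check this; the error only shows up at run time.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPattern("a(b|c)d"));      // True
        Console.WriteLine(IsValidPattern("a(b|cd"));       // False: missing parenthesis

        // Forgotten escape: the dot matches any character, not just a literal dot.
        Console.WriteLine(Regex.IsMatch("3x14", "3.14"));    // True
        Console.WriteLine(Regex.IsMatch("3x14", @"3\.14"));  // False
    }
}
```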
  • 17. Conclusions
      "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)
      Don't overuse them!
  • 18. Conclusions
      • Very handy tool for string matching and replacing
      • Built-in support in most programming languages
      • Supported in many applications
      • More info
          – http://www.regular-expressions.info/
          – http://msdn.microsoft.com/en-us/library/az24scfc%28v=vs.110%29.aspx
      • Fun
          – http://regex.alf.nu/
          – http://www.i-programmer.info/news/144-graphics-and-games/5450can-you-do-the-regular-expression-crossword.html