Regular Expression (Regex) Fundamentals

This deck covers Regular Expression (Regex) fundamentals, with examples you can use in your daily work.

Published in: Software

  1. REGEX: Regular Expression, by Mesut Güneş (www.testrisk.com)
  2. What is it? Regular expressions are patterns used to match character combinations in strings [1].
  3. History [2]: 1956: Stephen Cole Kleene, regular languages; 1968: Ken Thompson, pattern matching in a text editor; 1970: Bell Labs, regexes in Unix; 1980s: Henry Spencer's regex library, later used in Perl; 1992: POSIX.2 (Unix shell), then many languages.
  4. Human Brain vs. Text Processing: "Hi, I called Jon on Tuesday, March 25th at 7pm and expressed a concern about my slow times accessing www.cnn.com. He said he would fix it, but I never heard back. Can someone contact me at Kellie.Booth@if.com ASAP? What does Ctrl-F5 mean, by the way? Thanks, Kellie"
  5. Patterns? "Hi, …, thanks", "I called …", "March 27th", "www.blabla.com"
  6. Patterns?
      - (Hi|Hello),\w{1,}(Regards|Thanks)
      - Is(verb|auxiliary)(*)
      - March\s\d{1,2}(st|nd|rd|th)
      - www\.\w{1,}\.(com|net|edu|…)
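The following is a minimal Python sketch that uses the standard `re` module to show the date and web-address patterns picking facts out of the slide 4 message. Two choices are ours, not the slides': the day width `\d{1,2}` and the trimmed suffix list `(com|net|edu)`. The groups are written as non-capturing `(?:...)` so that `findall` returns whole matches rather than just the suffix.

```python
import re

# Sample message from slide 4.
message = (
    "Hi, I called Jon on Tuesday, March 25th at 7pm and expressed a concern "
    "about my slow times accessing www.cnn.com. He said he would fix it, but I "
    "never heard back. Can someone contact me at Kellie.Booth@if.com ASAP? "
    "What does Ctrl-F5 mean, by the way? Thanks Kellie"
)

# "March", whitespace, a one- or two-digit day, then an ordinal suffix.
date_pattern = re.compile(r"March\s\d{1,2}(?:st|nd|rd|th)")
# "www", a literal dot, one or more word characters, a literal dot, a suffix.
web_pattern = re.compile(r"www\.\w{1,}\.(?:com|net|edu)")

dates = date_pattern.findall(message)
sites = web_pattern.findall(message)
print(dates)  # ['March 25th']
print(sites)  # ['www.cnn.com']
```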
  7. Regex syntax: /pattern/options
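The `/pattern/options` form is literal syntax in languages such as JavaScript and Perl. Python, which these sketches assume, has no regex literal. Instead, the pattern is an ordinary (raw) string and the options are passed as flags or written inline.

```python
import re

# The literal /7pm/i written the Python way: pattern string plus a flag.
with_flag = re.compile(r"7pm", re.IGNORECASE)
# The same option written inline at the start of the pattern.
inline_flag = re.compile(r"(?i)7pm")

print(bool(with_flag.search("Call me at 7PM")))    # True
print(bool(inline_flag.search("Call me at 7PM")))  # True
```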
  8. Metacharacters (escape them with \ to match them literally): \ ^ $ . | { } [ ] ( ) * + ?
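This Python sketch shows why the metacharacters need escaping. An escaped `$` and `.` match the literal characters. Python's `re.escape` adds the backslashes for arbitrary text, while an unescaped `+` or `?` still acts as a quantifier.

```python
import re

# "$" and "." are metacharacters, so they are escaped to match literally.
price = re.search(r"\$\d+\.\d{2}", "Total: $24.99 (tax incl.)")
print(price.group())  # $24.99

# re.escape() backslash-escapes the metacharacters in arbitrary text.
literal = re.escape("1+1=2?")
print(bool(re.fullmatch(literal, "1+1=2?")))   # True
print(bool(re.fullmatch("1+1=2?", "1+1=2?")))  # False: + and ? act as quantifiers
```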
  9. Square Brackets: provide a list of potential matching characters at one position in the search text, e.g. 7[Pp][Mm]
  10. Square Brackets, more examples: 7[Pp][Mm], [123456789][aApP][Mm], [1-9][aApP][Mm]
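The bracket examples from slides 9 and 10 can be checked in Python. Each `[...]` consumes exactly one character from its set, and a range such as `[1-9]` is shorthand for listing every character in it. The list of sample times is made up for illustration.

```python
import re

times = ["7pm", "7PM", "3am", "9Am", "0pm", "7pn"]
hour_pattern = re.compile(r"[1-9][aApP][Mm]")

# Each [...] consumes exactly one character, chosen from the listed set.
matches = [t for t in times if hour_pattern.fullmatch(t)]
print(matches)  # ['7pm', '7PM', '3am', '9Am']

# A range is shorthand for listing every character: [1-9] == [123456789].
same_sets = all(
    bool(re.fullmatch(r"[1-9]", d)) == bool(re.fullmatch(r"[123456789]", d))
    for d in "0123456789"
)
print(same_sets)  # True
```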
  11. Non-Printable Characters: escapes for characters you cannot type directly into a regex.
      - \n: new line (Windows line endings are \r\n)
      - \t: tab
      - [\b]: backspace (only inside square brackets)
      - \a: bell
      - \r: carriage return
      - \f: form feed
      - \v: vertical tab
      - Euro €: \u20AC; British pound £: \u00A3; Yen ¥: \u00A5
      - Dollar sign $: \$ or \u0024 or \x24
      - \cX: an ASCII control character, e.g. \cC is Ctrl-C
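Several of these escapes are shown below as a sketch in Python's `re`. Support varies by engine. Python accepts `\t`, `\r`, `\n`, `[\b]`, `\xhh` and `\uXXXX`, but it rejects `\cX` as a bad escape, so that one is left out. The sample text is invented.

```python
import re

text = "Price:\t\u20ac10\r\nTotal: $24"

tabs = re.findall(r"\t", text)               # the single tab character
lines = re.split(r"\r?\n", text)             # handles Windows \r\n and Unix \n
euros = re.findall(r"\u20AC\d+", text)       # ['€10']
dollars = re.findall(r"\$\d+", text)         # ['$24']
dollars_hex = re.findall(r"\x24\d+", text)   # ['$24']: \x24 is a literal "$"
backspace = re.findall(r"[\b]", "a\bb")      # inside [], \b means backspace
```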
  12. Negation: [^...] lists the characters to exclude, e.g. [^0-9A-F]; [^a-zA-Z0-9_] is the negation of \w (written \W)
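Here is a small Python check of the negation examples. One caveat: Python 3's `\w` is Unicode-aware for string patterns. As a result, `[^a-zA-Z0-9_]` and `\W` agree on ASCII text but differ on letters such as "é" unless the `re.ASCII` flag is used.

```python
import re

# [^0-9A-F]: any single character that is NOT a digit or an uppercase A-F.
non_hex = re.findall(r"[^0-9A-F]", "1A-FFz")   # ['-', 'z']

# [^a-zA-Z0-9_] and \W pick the same characters in ASCII text.
sample = "user_name@host-1"
by_class = re.findall(r"[^a-zA-Z0-9_]", sample)  # ['@', '-']
by_shorthand = re.findall(r"\W", sample)         # ['@', '-']

# With non-ASCII letters they differ: "é" counts as a word character for \w.
accented_class = re.findall(r"[^a-zA-Z0-9_]", "\u00e9")  # ['é']
accented_short = re.findall(r"\W", "\u00e9")             # []
```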
  13. Curly Brackets: repetition of the preceding item.
      - {n}: exactly n times
      - {n,}: at least n times, with no upper limit
      - {n,m}: between n and m times
  14. Quantifier Symbols: repetition shorthands.
      - ? matches zero or one time (same as {0,1})
      - * matches zero or more times (same as {0,})
      - + matches one or more times (same as {1,})
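The repetition rules from slides 13 and 14 can be sketched in Python. The first part filters a made-up word list with each curly-bracket form. The second part confirms that each quantifier symbol selects the same words as its curly-bracket equivalent. The helper `matching` is a hypothetical name used only for this illustration.

```python
import re

words = ["ac", "abc", "abbbc", "abbbbbc"]

def matching(pattern):
    """Return the words that the whole pattern matches (hypothetical helper)."""
    return [w for w in words if re.fullmatch(pattern, w)]

exactly_3 = matching(r"ab{3}c")      # ['abbbc']
at_least_1 = matching(r"ab{1,}c")    # ['abc', 'abbbc', 'abbbbbc']
between_1_3 = matching(r"ab{1,3}c")  # ['abc', 'abbbc']

# ? is {0,1}, * is {0,}, + is {1,}: each pair selects the same words.
pairs = [(r"ab?c", r"ab{0,1}c"), (r"ab*c", r"ab{0,}c"), (r"ab+c", r"ab{1,}c")]
for short, long in pairs:
    print(short, matching(short) == matching(long))  # True for each pair
```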
  15. Starting and Ending Pattern: define the string boundaries.
      - ^: start of the string (not inside [], where it means negation)
      - $: end of the string
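In Python, as in JavaScript and Perl, `^` and `$` refer to the whole string by default. They match at every line only when a multiline option is set. The sketch below uses an invented three-line log to show both modes.

```python
import re

log = "GET /a\nPOST /b\nGET /c"

# By default ^ and $ refer to the start and end of the whole string.
first_only = re.findall(r"^GET", log)                # ['GET']
last_only = re.findall(r"/\w$", log)                 # ['/c']
# With re.MULTILINE they also match at each line start and end.
every_line = re.findall(r"^GET", log, re.MULTILINE)  # ['GET', 'GET']
every_end = re.findall(r"/\w$", log, re.MULTILINE)   # ['/a', '/b', '/c']
```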
  16. Alternation: provides alternatives, e.g. (x|y|z), (www|ftp), www\.\w{1,}\.(net|com|org|edu)
  17. Alternation (x|y|z) vs. [xyz]: (x|y|z) chooses between whole strings, while [xyz] or [a-zA-Z0-9] matches exactly one character from a list. So (Regex|ReGex) is equivalent to Re[gG]ex.
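This Python sketch compares alternation with a character class on the slide 17 example, then combines alternations in the style of slide 16. The host names in the sample sentence are illustrative only.

```python
import re

names = ["Regex", "ReGex", "REGEX", "Rex"]
# Alternation compares whole alternatives; the class matches one character.
by_alternation = [n for n in names if re.fullmatch(r"(?:Regex|ReGex)", n)]
by_class = [n for n in names if re.fullmatch(r"Re[gG]ex", n)]
print(by_alternation, by_class)  # ['Regex', 'ReGex'] ['Regex', 'ReGex']

# Alternation for the prefix and the suffix, with escaped dots in between.
hosts = re.findall(r"\b(?:www|ftp)\.\w+\.(?:net|com|org|edu)\b",
                   "see www.cnn.com or ftp.gnu.org")
print(hosts)  # ['www.cnn.com', 'ftp.gnu.org']
```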
  18. . Any single character (by default, except a newline)
  19. [abc] A single character: a, b, or c
  20. [^abc] Any single character except a, b, or c
  21. [a-z] Any single character in the range a-z
  22. [a-zA-Z] Any single character in the range a-z or A-Z
  23. ^ Start of line
  24. $ End of line
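The character-class cheat-sheet entries can be tried on one short string in Python. The last two lines show that `.` skips a newline unless the `re.DOTALL` flag is set.

```python
import re

s = "a1B c"
any_char = re.findall(r".", s)         # ['a', '1', 'B', ' ', 'c']
a_b_or_c = re.findall(r"[abc]", s)     # ['a', 'c']
not_abc = re.findall(r"[^abc]", s)     # ['1', 'B', ' ']
lower = re.findall(r"[a-z]", s)        # ['a', 'c']
letters = re.findall(r"[a-zA-Z]", s)   # ['a', 'B', 'c']

# By default "." does not match a newline; re.DOTALL changes that.
no_newline = re.findall(r".", "x\ny")               # ['x', 'y']
with_newline = re.findall(r".", "x\ny", re.DOTALL)  # ['x', '\n', 'y']
```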
  25. \A Start of string
  26. \z End of string
  27. \s Any whitespace character
  28. \S Any non-whitespace character
  29. \d Any digit
  30. \D Any non-digit
  31. \w Any word character (letter, digit, underscore)
  32. \W Any non-word character
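The line anchors and the string anchors differ as follows in Python's `re`. Python spells the absolute end of string `\Z`, which behaves like the `\z` of Perl and PCRE: it never matches before a trailing newline, whereas `$` may. The sample text is invented.

```python
import re

text = "first\nsecond\n"

# In MULTILINE mode ^ matches at every line start, \A only at the string start.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)    # ['first', 'second']
string_start = re.findall(r"\A\w+", text, re.MULTILINE)  # ['first']

# $ may also match just before a final newline; Python's \Z never does.
dollar = bool(re.search(r"second$", text))     # True
absolute = bool(re.search(r"second\Z", text))  # False
```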
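The shorthand classes can be applied to one invented sentence in Python. This makes it easy to compare what each one keeps.

```python
import re

s = "Call 555-0199, ext_7!"
digits = re.findall(r"\d+", s)      # ['555', '0199', '7']
words = re.findall(r"\w+", s)       # ['Call', '555', '0199', 'ext_7']
non_word = re.findall(r"\W", s)     # [' ', '-', ',', ' ', '!']
spaces = re.findall(r"\s", s)       # [' ', ' ']
tokens = re.findall(r"\S+", s)      # ['Call', '555-0199,', 'ext_7!']
only_digits = re.sub(r"\D", "", s)  # '55501997': every non-digit removed
```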
  33. \b Word boundary (a zero-width position between a word and a non-word character, not a character itself)
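The Python sketch below shows the zero-width nature of `\b`. It restricts a search to whole words, and substituting it inserts text without consuming any characters.

```python
import re

text = "cat concat cat's scatter"
plain = re.findall(r"cat", text)            # 4 hits, inside "concat" and "scatter" too
whole_words = re.findall(r"\bcat\b", text)  # ['cat', 'cat']

# \b is zero-width: substituting it inserts text without consuming characters.
marked = re.sub(r"\b", "|", "hi you")       # '|hi| |you|'
```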
  34. (...) Capture everything enclosed
  35. (a|b) a or b
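Capture groups can be read back in Python with `match.group(n)`. Note that when a pattern contains groups, `findall` returns the groups instead of the whole match. The sample strings are invented.

```python
import re

m = re.search(r"(\d{1,2})(st|nd|rd|th)", "Tuesday, March 25th at 7pm")
whole = m.group(0)   # '25th': the entire match
day = m.group(1)     # '25': first capture group
suffix = m.group(2)  # 'th': second group, an (a|b)-style alternation

# With capture groups present, findall returns tuples of groups.
pairs = re.findall(r"(\d{1,2})(st|nd|rd|th)", "1st, 22nd and 3rd")
print(pairs)  # [('1', 'st'), ('22', 'nd'), ('3', 'rd')]
```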
  36. i Case-insensitive option
  37. x Extended option: ignore whitespace in the regex
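In Python, the `i` and `x` options correspond to the `re.IGNORECASE` and `re.VERBOSE` flags. The sketch lays out a small time pattern over several commented lines. Because unescaped whitespace is ignored in verbose mode, the optional space is written as `\s?`.

```python
import re

# re.IGNORECASE is the "i" option; re.VERBOSE is the "x" option, so unescaped
# whitespace and #-comments inside the pattern are ignored.
time_pattern = re.compile(r"""
    [1-9]    # hour digit
    \s?      # optional whitespace (a plain space here would be ignored)
    [ap]m    # am/pm; IGNORECASE also lets AM, Pm, etc. match
""", re.IGNORECASE | re.VERBOSE)

found = time_pattern.findall("7pm, 9 AM and 3Pm")
print(found)  # ['7pm', '9 AM', '3Pm']
```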
  38. (?<name><pattern>) Named capturing group
  39. (?:<pattern>) Non-capturing group
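Named and non-capturing groups look like this in Python. Python's `re` writes a named group as `(?P<name>...)`, whereas .NET, PCRE and modern JavaScript use `(?<name>...)`. The date string is illustrative.

```python
import re

# Named group via Python's (?P<name>...); (?:...) groups without capturing.
date = re.compile(r"(?P<day>\d{1,2})(?:st|nd|rd|th)")
m = date.search("Tuesday, March 25th")
day = m.group("day")   # '25'
captured = m.groups()  # ('25',): the (?:...) group is not captured
print(day, captured)
```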
  40. Look Ahead: checks whether the pattern is followed by another pattern.
      - (?=<pattern>): positive lookahead
      - (?!<pattern>): negative lookahead
      - Example: (?<city>\w+)[, ]+(?= NJ|PA|DE)
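The lookahead example can be run in Python after respelling the named group as `(?P<city>...)`. The lookahead only checks what follows, so the state codes are not part of the match. The sample list of cities is made up.

```python
import re

text = "Trenton, NJ; Philadelphia, PA; Albany, NY"
# The slide's pattern, with Python's (?P<name>...) group syntax.
city = re.compile(r"(?P<city>\w+)[, ]+(?= NJ|PA|DE)")
cities = [m.group("city") for m in city.finditer(text)]
print(cities)  # ['Trenton', 'Philadelphia']
```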
  41. Look Behind: checks whether the pattern is preceded by another pattern.
      - (?<=<pattern>): positive lookbehind
      - (?<!<pattern>): negative lookbehind
      - Example: (?<="state":)[ ].*(?<state>PA|Pennsylvania)
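The lookbehind example runs in Python too, with the named group respelled as `(?P<state>...)`. Python accepts only fixed-width lookbehind patterns, and `(?<="state":)` qualifies. The JSON-like record and the price string are invented for the demonstration.

```python
import re

record = '{"state": "PA", "city": "Philadelphia"}'
# Positive lookbehind: the match must start right after "state":
m = re.search(r'(?<="state":)[ ].*(?P<state>PA|Pennsylvania)', record)
state = m.group("state")  # 'PA'

# Negative lookbehind: numbers NOT preceded by a dollar sign.
untaxed = re.findall(r"(?<!\$)\b\d+", "$5 and 7")  # ['7']
```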
  42. EXAMPLES
  43. ^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$ : the password must be 6 to 20 characters long and must not contain any of < > & ' " % ; - + ( ) or whitespace.
  44. Let's Dig into the Pattern (English rule: regex pattern):
      - Beginning of the string: ^
      - Start of a NEGATIVE LOOKAHEAD: (?!
      - Any characters except newline, repeated with a QUANTIFIER: .*
      - Start of a NON-CAPTURING group: (?:
      - A single CHARACTER with ALTERNATION: <|
      - More single CHARACTERS with ALTERNATION: >| &| '| "| %| ;| -| \+| \(| \)| \s
      - Close the group and the lookahead: ))
      - Any character, repeated within boundaries: .{6,20}
      - End of the string: $
      - Full pattern: ^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$
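This is a minimal Python sketch of the slide 43 rule. It wraps the pattern in a hypothetical helper, `is_valid_password`, and tries it on a few made-up candidates. The negative lookahead scans the whole input first, so a single forbidden character anywhere rejects it before the length check runs.

```python
import re

# Slide 43's pattern: no forbidden character anywhere, then 6 to 20 characters.
PASSWORD = re.compile(r"""^(?!.*(?:<|>|&|'|"|%|;|-|\+|\(|\)|\s)).{6,20}$""")

def is_valid_password(candidate):
    """Return True if candidate satisfies the slide 43 rule (hypothetical helper)."""
    return PASSWORD.match(candidate) is not None

for candidate in ["secret1", "abc", "pass word", "p+ssword", "x" * 21]:
    print(repr(candidate), is_valid_password(candidate))
```

The alternation of single characters could equally be written as one character class, `[<>&'"%;\-+()\s]`, which is shorter to read and accepts the same inputs.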
  45. Unix shell: find all "GET" requests to "nike" in all .csv files with ack '(?<="GET")[,]"/nike.*'
      ~/Downloads $ ls *.csv | wc -l
      109
      ~/Downloads $ ack '(?<="GET")[,]"/nike.*' | wc -l
      88
      ~/Downloads $ cat web_3000:25.csv | grep '/nike.*'
      "GET","/arama/nike",7,0,140,665,101,3797,196168,0.09
      "GET","/kampanya/arama/nike",8,0,270,678,229,2641,164205,0.11
      "GET","/nike/295/morhipo-ozel",2,0,81,88,81,95,121609,0.03
      "GET","/nike/markalar/503/32026/marka?fh=discount_rate_catalog01]
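What follows is a rough Python counterpart of the ack pipeline. It assumes the goal is the number of matching lines across the .csv files of a single folder. The function name `count_nike_gets` is hypothetical, and unlike ack it does not search subfolders. The pattern requires "/nike" to start the path right after the "GET" column. That is why a line such as "GET","/arama/nike" is found by the grep '/nike.*' but not by this pattern.

```python
import re
from pathlib import Path

# The slide's ack pattern: a "/nike..." path right after the "GET" column.
NIKE_GET = re.compile(r'(?<="GET")[,]"/nike.*')

def count_nike_gets(folder):
    """Count lines in folder/*.csv that match NIKE_GET (not recursive)."""
    total = 0
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if NIKE_GET.search(line))
    return total
```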
  46. BDD - Cucumber
  47. ^/(Questions|Sorular|پرسش)/*$ ("Questions" in English and Turkish, "question" in Persian)

Thanks

References:
[1] https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
[2] https://en.wikipedia.org/wiki/Regular_expression
[3] Joe Booth, Regular Expressions Succinctly, Syncfusion
[4] http://www.slideshare.net/adamlowe/regex-cards-powerpoint-format
[5] https://regex101.com
