Regular Expression Crash Course


These slides were used to explain regular expressions on my channel in Urdu/Hindi. If you are interested, kindly watch that video along with these slides.

  1. Regular Expression in Action: a brief overview of regular expression building blocks and tools, with a practical example.
  3. The Regex Coach is a graphical application for Windows that can be used to experiment with regular expressions interactively.
     ◦ Sublime Text is a text editor that supports find and replace using regular expressions.
     ◦ A web-based regular expression tester.
  4. The most basic regular expression consists of a literal, which behaves just like string matching. For example:
     ◦ cat will match cat in About cats and dogs.
     Special characters known as metacharacters need to be escaped with a backslash (\) in regular expressions if they are used as part of a literal:
     ◦ dogs\. will match dogs. in About cats and dogs.
     The metacharacters are:
     ◦ [ \ ^ $ . | ? * + ( ) {
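The difference between an escaped and an unescaped dot can be checked with a minimal sketch. It assumes Python's built-in re module, which the slides do not use; the sample strings other than "About cats and dogs." are made up for illustration.

```python
import re

text = "About cats and dogs."

# A literal pattern behaves like a plain substring search.
literal = re.search(r"cat", text)

# Unescaped, the dot is a metacharacter and matches any character.
unescaped = re.search(r"dogs.", "About dogs!")

# Escaped, the dot only matches a literal period.
escaped = re.search(r"dogs\.", text)
escaped_miss = re.search(r"dogs\.", "About dogs!")

# re.escape builds an escaped literal from arbitrary text.
auto = re.escape("1+1=2?")

print(literal.group(), literal.start())  # cat 6
print(unescaped.group())                 # dogs!
print(escaped.group())                   # dogs.
print(escaped_miss)                      # None
```

re.escape is handy when the literal comes from user input and may contain metacharacters.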
  5. With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. For example:
     ◦ gr[ae]y will match both grey and gray.
     Ranges can be specified using a dash. For example:
     ◦ [0-9] will match any single digit from 0 to 9.
     ◦ [0-9a-fA-F] will match any single hexadecimal digit.
     A caret after the opening square bracket negates the character class. The result is that the character class will match any character that is not in the character class. For example:
     ◦ [^0-9] will match any single character except a digit.
     ◦ q[^u] will not match Iraq, because [^u] still has to match a character after the q, but it will match the q and the following space in Iraq is a country.
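The character class examples can be tried out with a short sketch, again assuming Python's re module; the test strings "grey gray groy", "0x1Fz" and "a1b2" are invented for illustration.

```python
import re

# One character out of several: both spellings match, "groy" does not.
grey = re.findall(r"gr[ae]y", "grey gray groy")

# Ranges: any single hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "0x1Fz")

# Negation: any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# A negated class must still consume a character after the q.
iraq = re.search(r"q[^u]", "Iraq")
iraq_sentence = re.search(r"q[^u]", "Iraq is a country")

print(grey)                          # ['grey', 'gray']
print(hex_digits)                    # ['0', '1', 'F']
print(non_digits)                    # ['a', 'b']
print(iraq)                          # None
print(repr(iraq_sentence.group()))   # 'q '
```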
  6. Most metacharacters work fine without escaping inside character classes (only ], \, ^ and - keep a special meaning there). For example:
     ◦ [+*] is a valid expression and matches either * or +.
     There are some predefined character classes known as shorthand character classes:
     ◦ \w stands for [A-Za-z0-9_]
     ◦ \s stands for [ \t\r\n]
     ◦ \d stands for [0-9]
     If a character class is repeated by using the ?, * or + operators, the entire character class will be repeated, and not just the character that it matched. For example:
     ◦ [0-9]+ can match 837 as well as 222.
     ◦ ([0-9])\1+ will match 222 but not 837.
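A rough sketch of the shorthand classes and of class repetition, assuming Python's re module. One caveat that is specific to Python and not part of the slides: for ordinary (str) patterns, \w, \d and \s are Unicode-aware, and the re.ASCII flag restricts them to the ASCII sets listed above. All sample strings are made up.

```python
import re

# Inside a character class, + and * are ordinary characters.
operators = re.findall(r"[+*]", "2+3*4")

# Shorthand classes.
words = re.findall(r"\w+", "snake_case, x2")
digit_runs = re.findall(r"\d+", "837 and 222")
fields = re.split(r"\s+", "a \t b\r\nc")

# Python-specific: \w is Unicode-aware unless re.ASCII is given.
unicode_word = re.findall(r"\w+", "caf\u00e9")
ascii_word = re.findall(r"\w+", "caf\u00e9", re.ASCII)

# Repeating the class repeats the class, not the matched character.
any_digits_837 = re.fullmatch(r"[0-9]+", "837")
# Repeating a backreference repeats the exact character that was matched.
same_digit_222 = re.fullmatch(r"([0-9])\1+", "222")
same_digit_837 = re.fullmatch(r"([0-9])\1+", "837")

print(operators)       # ['+', '*']
print(words)           # ['snake_case', 'x2']
print(digit_runs)      # ['837', '222']
print(fields)          # ['a', 'b', 'c']
print(ascii_word)      # ['caf']
print(same_digit_837)  # None
```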
  7. The famous dot “.” operator matches any single character (by default, any character except a newline). For example:
     ◦ a.b will match abb, aab, a+b etc.
     ^ and $ are used to match the start and end of the string. For example:
     ◦ ^My.*\.$ will match anything starting with My and ending with a dot.
     The pipe operator is used to match a string against either its left or its right part. For example:
     ◦ (cat|dog) can match either cat or dog.
     Question:
     ◦ If the expression is Get|GetValue|Set|SetValue and the string is SetValue, what will this match and why?
     ◦ What if the expression becomes Get(Value)?|Set(Value)? instead?
     * (equivalent to {0,}) and + (equivalent to {1,}) are used to control repetitions.
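The alternation question can be explored with a small sketch. It assumes Python's re module, which, like most backtracking engines, tries alternatives from left to right and stops at the first one that succeeds; other engine types may behave differently. The test sentences are made up.

```python
import re

# The dot matches any single character except a newline.
dot = re.findall(r"a.b", "abb aab a+b a\nb")

# Anchors: must start with "My" and end with a literal dot.
anchored_hit = re.search(r"^My.*\.$", "My name is Sheraz.")
anchored_miss = re.search(r"^My.*\.$", "Oh My.")

# Alternation is tried left to right: "Set" succeeds before
# "SetValue" is ever attempted.
first = re.search(r"Get|GetValue|Set|SetValue", "SetValue").group()

# With a greedy optional group, the longer match is found.
second = re.search(r"Get(Value)?|Set(Value)?", "SetValue").group()

# {0,} behaves like *, and {1,} behaves like +.
star = re.fullmatch(r"ab{0,}", "a") is not None
plus = re.fullmatch(r"ab{1,}", "a") is not None

print(dot)            # ['abb', 'aab', 'a+b']
print(anchored_miss)  # None
print(first)          # Set
print(second)         # SetValue
print(star, plus)     # True False
```

Ordering the longer alternatives first (SetValue|Set) is another way to get the longer match.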
  8. Round brackets, besides grouping part of a regular expression together, also create a "backreference". A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. For example:
     ◦ ([0-9])\1+ will match 222 but not 837.
     If backreferences are not required, you can optimize the regular expression by using a non-capturing group: Set(?:Value)?
     Backreferences can be used in the expression itself or in the replacement text. For example:
     ◦ <([A-Za-z][A-Za-z0-9]*)>.*</\1> will match a pair of matching opening and closing tags.
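A minimal sketch of backreferences in the pattern and in replacement text, assuming Python's re module. The HTML snippets and the user@example string are invented for illustration.

```python
import re

# \1 must repeat exactly what group 1 captured (the tag name).
tag = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>.*</\1>")
matched = tag.search("<b>bold</b>")
mismatched = tag.search("<b>bold</i>")

# Backreferences in replacement text: swap the two captured parts.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

# A capturing group stores its text; a non-capturing group does not.
capturing = re.search(r"Set(Value)?", "SetValue").groups()
non_capturing = re.search(r"Set(?:Value)?", "SetValue").groups()

print(matched.group(1))  # b
print(mismatched)        # None
print(swapped)           # example at user
print(capturing)         # ('Value',)
print(non_capturing)     # ()
```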
  9. /i makes the regex match case-insensitively.
     ◦ [A-Z] will match A and a with this modifier.
     /s enables "single-line mode". In this mode, the dot matches newlines as well.
     ◦ .* will match all of sheraz\r\nattari with this modifier.
     /m enables "multi-line mode". In this mode, the caret and dollar match before and after newlines in the subject string.
     ◦ ^.*$ will match only the first line of sheraz\r\nattari with this modifier; without it, ^.*$ cannot match there, because the dot does not cross the line break.
     /x enables "free-spacing mode". In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment.
     ◦ #sheraz\r\n\r\n.* will match only sheraz with this modifier: #sheraz is read as a comment up to the line break, so only .* remains as the pattern.
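The four modifiers can be sketched as follows. This assumes Python's re module, where the modifiers are passed as flags (re.I, re.S, re.M, re.X) rather than written as /i, /s, /m, /x. The sample text uses a plain \n line break instead of \r\n to keep the output simple, and the phone-number pattern is made up.

```python
import re

text = "sheraz\nattari"

# /i : case-insensitive matching.
case_insensitive = re.findall(r"[A-Z]", "aB", re.I)

# /s : the dot also matches newlines.
default_dot = re.match(r".*", text).group()
dotall = re.match(r".*", text, re.S).group()

# /m : ^ and $ also match at line breaks inside the string.
single_line = re.findall(r"^\w+$", text)
multi_line = re.findall(r"^\w+$", text, re.M)

# /x : whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
    \d{3}   # three digits
    -       # a literal dash
    \d{4}   # four digits
""", re.X)
phone = verbose.search("call 555-1234").group()

print(case_insensitive)  # ['a', 'B']
print(default_dot)       # sheraz
print(repr(dotall))      # 'sheraz\nattari'
print(single_line)       # []
print(multi_line)        # ['sheraz', 'attari']
print(phone)             # 555-1234
```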
  10. A conditional is a special construct that will first evaluate a lookaround, and then execute one sub-regex if the lookaround succeeds, and another sub-regex if the lookaround fails. A lookaround checks what comes before or after the current position without making it part of the match.
     An example of positive lookahead is:
     ◦ q(?=uv*) will match the q in quvvvv and in qu.
     An example of negative lookahead is:
     ◦ q(?!uv*) will match a q that is not followed by u, so it matches neither the q in qu nor the q in quv.
     An example of positive lookbehind is:
     ◦ (?<=b)a will match an a preceded by b, as in ba.
     An example of negative lookbehind is:
     ◦ (?<!b)a will match an a not preceded by b, as in ca and da.
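The four lookaround examples can be checked with a short sketch, assuming Python's re module. Note that Python's re only supports conditionals that test whether a group matched, not conditionals on a lookaround, so only the lookarounds themselves are shown. The test strings are made up.

```python
import re

sample = "quvvvv qu qa"

# Positive lookahead: q followed by u (and any number of v).
pos_ahead = [m.start() for m in re.finditer(r"q(?=uv*)", sample)]

# Negative lookahead: q not followed by u.
neg_ahead = [m.start() for m in re.finditer(r"q(?!uv*)", sample)]

letters = "ba ca da"

# Positive lookbehind: a preceded by b.
pos_behind = [m.start() for m in re.finditer(r"(?<=b)a", letters)]

# Negative lookbehind: a not preceded by b.
neg_behind = [m.start() for m in re.finditer(r"(?<!b)a", letters)]

# Lookarounds are zero-width: only the q is part of the match.
only_q = re.search(r"q(?=u)", "quit").group()

print(pos_ahead)   # [0, 7]
print(neg_ahead)   # [10]
print(pos_behind)  # [1]
print(neg_behind)  # [4, 7]
print(only_q)      # q
```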
  11. Quick reference:
     abc…       Letters
     123…       Digits
     \d         Any digit
     \D         Any non-digit character
     .          Any character
     \.         Period
     [abc]      Only a, b, or c
     [^abc]     Not a, b, nor c
     [a-z]      Characters a to z
     [0-9]      Numbers 0 to 9
     \w         Any alphanumeric character
     \W         Any non-alphanumeric character
     {m}        m repetitions
     {m,n}      m to n repetitions
     *          Zero or more repetitions
     +          One or more repetitions
     ?          Optional character
     \s         Any whitespace
     \S         Any non-whitespace character
     ^…$        Starts and ends
     (…)        Capture group
     (a(bc))    Capture sub-group
     (.*)       Capture all
     (abc|def)  Matches abc or def
  Most of the content is taken from