Regular Expressions In Action


Explains the building blocks of regular expressions and their usage with easy-to-understand examples.

Published in: Technology


  1. Regular Expression in Action
     Brief overview of regular expression building blocks and tools, with a practical example
     Muhammad Sheraz Siddiqi
  2. This Presentation…
     What are Regular Expressions
     Tools to learn
     Literal characters and special characters
     Building blocks of Regular Expressions
     Grouping and Backreferences
     Unicode characters in regular expressions
     Regex Matching Modes
     Lookarounds
     Parse a log file…
  3. What are Regular Expressions?
     Regular expressions provide a concise and flexible means of matching strings of text, such as particular characters, words, or patterns of characters.
  4. Tools to learn?
     The Regex Coach is a graphical application for Windows that can be used to experiment with regular expressions interactively.
     Notepad++ is a text editor that supports find and replace using regular expressions.
     Web-based regular expression tester.
  5. Literal and Special characters
     The most basic regular expression consists of a literal, which behaves just like plain string matching. For example:
     cat will match cat in "About cats and dogs."
     Special characters, known as metacharacters, need to be escaped with a backslash (\) when they are used as part of a literal:
     dogs\. will match dogs. in "About cats and dogs."
     The metacharacters are:
     [ \ ^ $ . | ? * + ( ) {
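The difference between a literal and an escaped metacharacter can be tried out with a short sketch. Python's standard `re` module is used here as an assumption, since the slides do not name a programming language; the test strings other than "About cats and dogs." are invented for illustration.

```python
import re

text = "About cats and dogs."

# A literal pattern behaves like a plain substring search.
literal = re.search(r"cat", text)

# The escaped dot matches only a real "." character.
escaped_in_text = re.search(r"dogs\.", text)

# An unescaped dot is a metacharacter and matches any character,
# so "dogs." also matches "dogsX"; the escaped form does not.
unescaped = re.search(r"dogs.", "hotdogsX")
escaped = re.search(r"dogs\.", "hotdogsX")

# re.escape builds a literal pattern from arbitrary text.
price = re.search(re.escape("$9.99 (sale)"), "Now $9.99 (sale) only")

print(literal.group())          # cat
print(escaped_in_text.group())  # dogs.
print(unescaped.group())        # dogsX
print(escaped)                  # None
print(price.group())            # $9.99 (sale)
```

When the literal text comes from user input, `re.escape` is usually safer than escaping metacharacters by hand.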
  6. Character Classes and Shorthands
     With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. For example:
     gr[ae]y will match both grey and gray.
     Ranges can be specified using a dash. For example:
     [0-9] will match any digit from 0 to 9.
     [0-9a-fA-F] will match any single hexadecimal digit.
     A caret right after the opening square bracket negates the character class, so that it matches any character that is not in the class. For example:
     [^0-9] will match any character that is not a digit.
     q[^u] will not match Iraq, because a negated class must still match a character, but it will match the q and the following space in "Iraq is a country".
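The character class examples from this slide can be checked with a minimal sketch, again assuming Python's `re` module. The sample strings other than "Iraq" and "Iraq is a country" are made up for the demonstration.

```python
import re

# One character out of a set: both spellings match, "groy" does not.
spellings = re.findall(r"gr[ae]y", "grey or gray, never groy")

# A range inside a class: only hexadecimal digits are picked up.
hex_digits = re.findall(r"[0-9a-fA-F]", "0x1F, g")

# A negated class still has to consume one character.
at_end = re.search(r"q[^u]", "Iraq")
in_sentence = re.search(r"q[^u]", "Iraq is a country")

print(spellings)            # ['grey', 'gray']
print(hex_digits)           # ['0', '1', 'F']
print(at_end)               # None
print(repr(in_sentence.group()))  # 'q '
```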
  7. Character Classes and Shorthands
     Most metacharacters work without escaping inside character classes; in most regex flavors only ], \, ^ and - keep a special meaning there. For example:
     [+*] is a valid expression and matches either * or +.
     There are some predefined character classes, known as shorthand character classes:
     \w stands for [A-Za-z0-9_]
     \s stands for [ \t\r\n]
     \d stands for [0-9]
     If a character class is repeated with the ?, * or + operators, the entire character class is repeated, and not just the character that it matched. For example:
     [0-9]+ can match 837 as well as 222.
     ([0-9])\1+ will match 222 but not 837.
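The contrast between repeating a class and repeating what the class matched is easy to see in a small sketch, assuming Python's `re` module. Note one flavor-specific detail: for Python string patterns the shorthands are Unicode-aware by default, so `re.ASCII` is passed below to get the narrower definitions given on the slide.

```python
import re

# Metacharacters such as + and * are literal inside a class.
plus_or_star = re.findall(r"[+*]", "a+b*c")

# Shorthand classes, restricted to the ASCII definitions from the slide.
words = re.findall(r"\w+", "user_42 logged in", flags=re.ASCII)
numbers = re.findall(r"\d+", "837 and 222", flags=re.ASCII)

# [0-9]+ repeats the class: any digits will do.
any_digits = re.fullmatch(r"[0-9]+", "837")

# ([0-9])\1+ repeats the captured digit: all digits must be the same.
same_digit_222 = re.fullmatch(r"([0-9])\1+", "222")
same_digit_837 = re.fullmatch(r"([0-9])\1+", "837")

print(plus_or_star)                # ['+', '*']
print(words)                       # ['user_42', 'logged', 'in']
print(numbers)                     # ['837', '222']
print(any_digits is not None)      # True
print(same_digit_222 is not None)  # True
print(same_digit_837)              # None
```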
  8. Building blocks of Regular Exp.
     The famous dot "." operator matches any single character, except a line break by default. For example:
     a.b will match abb, aab, a+b etc.
     ^ and $ are used to match the start and end of the subject string. For example:
     ^My.*\.$ will match anything starting with My and ending with a dot.
     The pipe operator matches a string against either its left or its right part. For example:
     (cat|dog) can match either cat or dog.
     Question:
     If the expression is Get|GetValue|Set|SetValue and the string is SetValue, what will this match and why?
     What if the expression becomes Get(Value)?|Set(Value)? instead?
     * (equivalent to {0,}) and + (equivalent to {1,}) are used to control repetitions.
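The question on this slide can be explored by running both expressions. The sketch below assumes Python's `re` module, a backtracking engine that tries alternatives from left to right; engines that always return the longest alternative (POSIX-style) would answer the first case differently. The sample sentences are invented.

```python
import re

# The dot needs exactly one character between a and b.
dot = [bool(re.fullmatch(r"a.b", s)) for s in ("abb", "aab", "a+b", "ab")]

# Anchors tie the pattern to the start and end of the subject.
anchored_yes = re.search(r"^My.*\.$", "My name is Sheraz.")
anchored_no = re.search(r"^My.*\.$", "Oh My.")

# Ordered alternation: Set is tried before SetValue and succeeds first.
first = re.search(r"Get|GetValue|Set|SetValue", "SetValue").group()

# The optional group is greedy, so (Value)? is consumed when present.
second = re.search(r"Get(Value)?|Set(Value)?", "SetValue").group()

# * and {0,} are two spellings of the same quantifier.
star = re.findall(r"ab*", "a ab abb")
braces = re.findall(r"ab{0,}", "a ab abb")

print(dot)     # [True, True, True, False]
print(first)   # Set
print(second)  # SetValue
print(star == braces)  # True
```

Reordering the alternatives from longest to shortest, or using the optional-group form, are the usual ways to make such an expression return the full SetValue.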
  9. Grouping and Backreferences
     Round brackets, besides grouping part of a regular expression together, also create a "backreference". A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. For example:
     ([0-9])\1+ will match 222 but not 837.
     If a backreference is not required, you can optimize the expression by using a non-capturing group: Set(?:Value)?
     Backreferences can be used in the expression itself or in the replacement text. For example:
     <([A-Za-z][A-Za-z0-9]*)>.*</\1> will match a pair of matching opening and closing tags.
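A short sketch, assuming Python's `re` module, shows a backreference inside the pattern, a backreference in replacement text, and a non-capturing group. The HTML fragments and the "cats dogs" string are illustrative inputs, not taken from the slides.

```python
import re

# \1 must repeat whatever tag name the first group captured.
tag = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>.*</\1>")
balanced = tag.search("<b>bold</b>")
mismatched = tag.search("<b>bold</i>")

# Backreferences in the replacement text: swap two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "cats dogs")

# A non-capturing group still groups, but stores nothing.
non_capturing = re.search(r"Set(?:Value)?", "SetValue")

print(balanced.group(1))       # b
print(mismatched)              # None
print(swapped)                 # dogs cats
print(non_capturing.group())   # SetValue
print(non_capturing.groups())  # ()
```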
  10. Unicode characters in Regular Exp.
      Unicode characters can be written as \uxxxx in regular expressions. For example:
      عطاری can be matched in an expression as: \u0639\u0637\u0627\u0631\u06cc
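As a minimal sketch of the \uxxxx notation, the following assumes Python's `re` module, which accepts `\uXXXX` escapes inside string patterns. The surrounding sample sentence is invented; the code points are the ones listed on the slide.

```python
import re

# The raw string hands the \uXXXX escapes to the regex engine itself.
pattern = re.compile(r"\u0639\u0637\u0627\u0631\u06cc")

# The same word built from Python string escapes, used as the subject.
word = "\u0639\u0637\u0627\u0631\u06cc"
subject = "name: " + word + " (sample)"

found = pattern.search(subject)

print(found is not None)      # True
print(found.group() == word)  # True
print(found.start())          # 6
```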
  11. Regular Exp. Matching Modes
      /i makes the regex match case-insensitively.
      [A-Z] will match A and a with this modifier.
      /s enables "single-line mode". In this mode, the dot matches newlines as well.
      .* will match all of sheraz\r\nattari with this modifier.
      /m enables "multi-line mode". In this mode, the caret and dollar match before and after newlines in the subject string.
      .* will still match only sheraz in sheraz\r\nattari with this modifier, because /m changes the anchors and not the dot (some flavors also let the dot match the \r).
      /x enables "free-spacing mode". In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment.
      #sheraz\r\n\r\n.* will match only sheraz with this modifier, because #sheraz is read as a comment and the line breaks are ignored, which leaves .* as the effective pattern.
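The four modes map onto flags in most engines. The sketch below assumes Python's `re` module, where /i, /s, /m and /x correspond to `re.IGNORECASE`, `re.DOTALL`, `re.MULTILINE` and `re.VERBOSE`. One flavor detail to keep in mind: Python's dot excludes only `\n`, so the `\r` stays part of the first-line match here.

```python
import re

text = "sheraz\r\nattari"

# /i: case-insensitive matching.
insensitive = re.findall(r"[A-Z]", "Aa", flags=re.IGNORECASE)

# Without /s the dot stops at \n; with /s it runs through the line break.
default_dot = re.match(r".*", text).group()
dotall = re.match(r".*", text, flags=re.DOTALL).group()

# /m: ^ also matches right after each newline.
one_line = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, flags=re.MULTILINE)

# /x: whitespace is ignored and # starts a comment,
# so the effective pattern below is just .*
free_spacing = re.compile(r"""
    # sheraz: this whole line is a comment
    .*        # the only real token
""", re.VERBOSE).match(text).group()

print(insensitive)         # ['A', 'a']
print(repr(default_dot))   # 'sheraz\r'
print(dotall == text)      # True
print(one_line)            # ['sheraz']
print(multi)               # ['sheraz', 'attari']
print(repr(free_spacing))  # 'sheraz\r'
```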
  12. Lookarounds with Conditions…
      A conditional is a special construct that first evaluates a lookaround, and then executes one sub-regex if the lookaround succeeds and another sub-regex if it fails. A lookaround checks the text before or after the current position without making it part of the match.
      Example of a positive lookahead:
      q(?=uv*) will match the q in quvvvv and qu.
      Example of a negative lookahead:
      q(?!uv*) will match a q that is not followed by u (and therefore not by uv either).
      Example of a positive lookbehind:
      (?<=b)a will match an a preceded by b, as in ba.
      Example of a negative lookbehind:
      (?<!b)a will match an a not preceded by b, as in ca and da.
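The four lookaround examples can be verified by looking at where each pattern matches. This is a small sketch assuming Python's `re` module; the two sample strings are made up so that every case appears once.

```python
import re

def starts(pattern, subject):
    """Return the start index of every match of pattern in subject."""
    return [m.start() for m in re.finditer(pattern, subject)]

q_text = "qu qa Iraq"   # q at index 0, 3 and 9
a_text = "ba ca da"     # a at index 1, 4 and 7

followed_by_u = starts(r"q(?=uv*)", q_text)
not_followed_by_u = starts(r"q(?!uv*)", q_text)
after_b = starts(r"(?<=b)a", a_text)
not_after_b = starts(r"(?<!b)a", a_text)

# Lookarounds are zero-width: only the q itself is returned.
matched_text = re.search(r"q(?=uv*)", "quvvvv").group()

print(followed_by_u)      # [0]
print(not_followed_by_u)  # [3, 9]
print(after_b)            # [1]
print(not_after_b)        # [4, 7]
print(matched_text)       # q
```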
  13. Parse a log file…
      Example 1: I have an access log file (access.log) from a Helix DNA server. I want to calculate how many times each content item is accessed and update the download and listen counts of each item in the database.
      Exp: ^(.*)asxgen/Data/Naat/Download(.*)/(\d+)\.(mp3|rm)(.*)$
      Replace: UPDATE DB.TBL set col=col + COUNT where id=\3;
      Example 2: I have an application-generated log file (applog.txt) from a web application. I want to fetch the required information from the relevant rows. To remove the irrelevant rows:
      Exp: ^(?!((.*)ID:\s(.*)\sStatus:\s(.*))).*$
      Replace: Empty string
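The two log examples can also be scripted instead of being run as editor find-and-replace operations. The sketch below is one possible approach in Python, not the workflow from the slides: the log lines are hypothetical (the real Helix and application log formats are not shown in the deck), and counting hits per id with a `Counter` is an assumption about how the COUNT placeholder would be filled in. `DB.TBL` and `col` are the placeholders from the slide.

```python
import re
from collections import Counter

# Patterns from the slide, with the backslashes in place.
ACCESS = re.compile(r"^(.*)asxgen/Data/Naat/Download(.*)/(\d+)\.(mp3|rm)(.*)$")
IRRELEVANT = re.compile(r"^(?!((.*)ID:\s(.*)\sStatus:\s(.*))).*$")

# Hypothetical log lines, invented for this sketch.
access_lines = [
    '10.0.0.1 - "GET /asxgen/Data/Naat/Download/Audio/17.mp3 HTTP/1.0" 200',
    '10.0.0.2 - "GET /asxgen/Data/Naat/Download/Audio/17.mp3 HTTP/1.0" 200',
    '10.0.0.3 - "GET /asxgen/Data/Naat/Download/Audio/42.rm HTTP/1.0" 200',
    '10.0.0.4 - "GET /favicon.ico HTTP/1.0" 404',
]
app_lines = [
    "INFO request done ID: 7 Status: OK",
    "DEBUG cache warmed",
    "WARN request failed ID: 9 Status: ERROR",
]

# Example 1: group 3 is the content id; count the hits per id.
hits = Counter()
for line in access_lines:
    m = ACCESS.match(line)
    if m:
        hits[m.group(3)] += 1

statements = [
    f"UPDATE DB.TBL set col=col + {count} where id={content_id};"
    for content_id, count in sorted(hits.items())
]

# Example 2: the pattern matches rows WITHOUT "ID: ... Status: ...",
# so the rows worth keeping are the ones it does not match.
relevant = [line for line in app_lines if not IRRELEVANT.match(line)]

for statement in statements:
    print(statement)
print(relevant)
```

In real use the ids should be passed to the database as bound parameters rather than formatted into SQL text.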
  14. Questions Please…..
      Thank you for being here…
      Most of the content is taken from: