Regular Expressions for Beginners
Srikanth Modegunta

Regex startup

This presentation serves as a starter guide for regex learners.

Published in: Technology
  1. Regular Expressions for Beginners, by Srikanth Modegunta
  2. Introduction
     − Also referred to as Regex or RegExp.
     − Used to match patterns in text.
         − Ex: “maven” and “maeven” can both be matched with the regex “mae?ven”.
     − Regular expressions are processed by a piece of software called a “regular expression engine”.
     − Most languages support regex.
         − Ex: Perl, Java, C#, etc.
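The “mae?ven” example can be tried directly. The sketch below uses Python's `re` module, which is an assumption made for illustration (the slides name Perl, Java and C#); the third test word is made up to show a non-match.

```python
import re

# "e?" makes the letter e optional, so both spellings are accepted.
pattern = re.compile(r"mae?ven")

# fullmatch() requires the whole word to match the pattern.
candidates = ["maven", "maeven", "maeeven"]
matches = [word for word in candidates if pattern.fullmatch(word)]

print(matches)  # ['maven', 'maeven']
```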
  3. Introduction (Contd..)
     − Used wherever text processing is required.
     − Simple XML or HTML processing can use regex, since it relies on pattern matching.
         − We will see how to match an XML or HTML tag.
     − Automation of tasks.
         − Ex: if a mail subject contains “<operation> <some task name> <command>”, then start processing the task.
     − Text editors updating the comments on functions automatically (replacing a pattern with some text).
         − Ex: replace “sub subroutine(parameters){<statements>}” with “/* this is a sample subroutine */ sub subroutine(parameters){<statements>}”.
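The comment-insertion case can be sketched as a search-and-replace. This is only an illustration in Python's `re` module (an assumed choice of language); the subroutine text `sub greet(name){print name;}` and the pattern itself are hypothetical, not taken from the slides.

```python
import re

# Hypothetical one-line subroutine in the style shown on the slide.
source = "sub greet(name){print name;}"

# Matches: "sub", a name, a parameter list, and a brace-delimited body.
subroutine = re.compile(r"sub\s+\w+\([^)]*\)\{.*?\}")

# \g<0> in the replacement stands for the whole matched text,
# so the comment is placed in front of the unchanged subroutine.
commented = subroutine.sub(r"/* this is a sample subroutine */ \g<0>", source)

print(commented)
```

A real editor would need a more careful pattern, since `.*?\}` stops at the first closing brace and so does not handle nested blocks.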
  4. Meta Characters
     − The following are the meta characters:
       \ | ( ) [ { ^ $ * + ? .
  5. Meta Characters (Contd..)
     Character   Meaning
     *           0 or more
     +           1 or more
     ?           0 or 1 (optional)
     .           Any character except a newline
     ^           Start of line; inside a set, [^abc] means any character other than 'a', 'b' or 'c'
     $           End of line
     \A          Start of string
     \Z          End of string
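The difference between the line anchors (^ and $) and the string anchors (\A and \Z) shows up on multi-line text. The sketch below assumes Python's `re` module, where ^ and $ only work per line when the MULTILINE flag is set; other engines may differ in detail, and the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the text.
start_default = re.findall(r"^\w+", text)               # ['first']

# With MULTILINE, ^ matches at the start of every line.
start_per_line = re.findall(r"^\w+", text, re.MULTILINE)  # ['first', 'second']

# \A and \Z always refer to the whole string, even with MULTILINE.
start_of_string = re.findall(r"\A\w+", text, re.MULTILINE)  # ['first']
end_of_string = re.findall(r"\w+\Z", text, re.MULTILINE)    # ['line']

# "." stops at the newline, so each line is matched separately.
lines = re.findall(r".+", text)  # ['first line', 'second line']
```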
  6. Meta Characters (Contd..)
     Character   Meaning
     { }         Repeats a pattern a known number of times
                 Ex: a{2,5} matches 'a' repeated a minimum of 2 and a maximum of 5 times.
     |           'Or' between patterns
                 Ex: cat|dog|mouse
     ( )         Used to capture groups
     [ ]         Exactly one character from the set
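A short check of the counted repetition and the 'or' operator, again assuming Python's `re` module; the sample sentence is made up for illustration.

```python
import re

# a{2,5}: try runs of 1 to 6 'a's and keep the lengths that match in full.
accepted_lengths = [n for n in range(1, 7) if re.fullmatch(r"a{2,5}", "a" * n)]
print(accepted_lengths)  # [2, 3, 4, 5]

# cat|dog|mouse: any one of the three alternatives.
animals = re.findall(r"cat|dog|mouse", "a cat chased a mouse, not a dog")
print(animals)  # ['cat', 'mouse', 'dog']
```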
  7. Quantifiers
     − Quantifiers specify quantity.
         − Ex: in “ear” and “eaaaar”, the quantity of 'a' is 1 and 4 respectively.
     − If a pattern is repeated, we need quantifiers to match the repetition.
     − To match the case above we use the following regex:
         − ea+r means 'a' can occur 1 or more times.
  8. Quantifiers (Contd..)
     − *  0 or more times (greedy, or “hungry”, matching)
         − Ex: ca* matches c, ca, caa, caaa, etc.
         − Matches even if the character is absent, and otherwise takes as many consecutive 'a's as possible.
     − +  1 or more times (greedy)
         − Ex: ca+ matches ca, caa, caaa, etc.
     − {n}  Exactly n times
         − Ex: ca{4}r matches caaaar.
     − {m,}  At least m times, with no upper limit (greedy)
         − Ex: ca{2,}r matches only if 'a' repeats 2 or more times.
     − {m,n}  At least m times and at most n times (greedy)
         − Ex: ca{2,3}r matches when 'a' repeats a minimum of 2 and a maximum of 3 times.
     − Greedy (hungry) matching means the pattern matches the maximum possible text.
         − Ex: for ca{0,4}, the text “caaaa” is matched in full, i.e. all 4 'a's are matched.
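The greedy quantifiers on this slide can be checked one by one. This is a minimal sketch assuming Python's `re` module; the `matched` helper is a hypothetical convenience, not part of any library.

```python
import re

def matched(pattern: str, text: str):
    """Return the text matched at the start of `text`, or None if no match."""
    m = re.match(pattern, text)
    return m.group() if m else None

star_many = matched(r"ca*", "caaaa")     # 'caaaa' (takes every 'a')
star_none = matched(r"ca*", "c")         # 'c'     (zero 'a's is allowed)
plus_none = matched(r"ca+", "c")         # None    (needs at least one 'a')
exact = matched(r"ca{4}r", "caaaar")     # 'caaaar'
too_few = matched(r"ca{2,}r", "car")     # None    (only one 'a')
enough = matched(r"ca{2,}r", "caar")     # 'caar'  (exactly two is enough)
bounded = matched(r"ca{0,4}", "caaaa")   # 'caaaa' (greedy takes all four)
```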
  9. Quantifiers (Contd..)
     − *?  Lazy matching: matches 0 or more times but stops at the first possible match.
         − Ex: if the text is “caaaaaa”, then “ca*?” matches only “c”.
     − +?  Lazy matching: matches 1 or more times but stops at the first possible match.
         − Ex: if the text is “caaaaaa”, then “ca+?” matches only “ca”.
     − ??  Lazy matching: matches 0 or 1 times but stops at the first possible match.
         − Ex: if the text is “ca”, then “ca??” matches only “c”.
     − {min,}?  {n}?  {min,max}?  Lazy forms of the counted quantifiers.
     − Lazy matching means the pattern matches the minimum possible text.
         − Ex: for ca{0,4}?, the text “caaaa” matches only “c”.
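The lazy forms can be compared with their greedy counterparts. The sketch assumes Python's `re` module; the `<b>bold</b>` text is a made-up example showing where the difference usually matters in practice.

```python
import re

lazy_star = re.match(r"ca*?", "caaaaaa").group()     # 'c'
lazy_plus = re.match(r"ca+?", "caaaaaa").group()     # 'ca'
lazy_optional = re.match(r"ca??", "ca").group()      # 'c'
lazy_counted = re.match(r"ca{0,4}?", "caaaa").group()  # 'c'

# A lazy quantifier still grows when the rest of the pattern needs it.
lazy_forced = re.match(r"ca*?r", "caaar").group()    # 'caaar'

# Greedy ".+" runs to the last '>', lazy ".+?" stops at the first one.
greedy_tags = re.findall(r"<.+>", "<b>bold</b>")     # ['<b>bold</b>']
lazy_tags = re.findall(r"<.+?>", "<b>bold</b>")      # ['<b>', '</b>']
```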
  10. Character Sets
      − A set matches one character from the set of characters.
      − [abcd] is the same as [a-d].
      − [a-di-l] is the same as [abcdijkl].
      − [^abcd] matches any character other than a, b, c, d.
      − Quantifiers can be applied to character sets.
          − [a-z]+ matches the string 'hello' in 'hello1234E'.
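The character-set rules can be verified with a few calls. This sketch assumes Python's `re` module; the alphabet strings are made up for illustration.

```python
import re

# A quantified set: the first run of lowercase letters.
first_word = re.search(r"[a-z]+", "hello1234E").group()  # 'hello'

# Two ranges in one set: a-d and i-l.
in_ranges = re.findall(r"[a-di-l]", "abcdefghijklm")

# A negated set: everything except a, b, c, d.
not_abcd = re.findall(r"[^abcd]", "abcdef")  # ['e', 'f']

print(first_word, in_ranges, not_abcd)
```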
  11. Characters for Matching
      Common character class shorthands:
      Character class    Shorthand
      [a-zA-Z0-9_]       \w
      [0-9]              \d
      [ \t\n\r]          \s
      [^a-zA-Z0-9_]      \W
      [^0-9]             \D
      [^ \t\n\r]         \S
      − \b  Word boundary
      − \B  Anywhere other than a word boundary
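The shorthands and the word-boundary assertions can be tried on a small sample. The sketch assumes Python's `re` module, where \w, \d and \s are Unicode-aware for text strings by default, so they are slightly broader than the ASCII sets in the table unless the ASCII flag is used. The sample text is made up.

```python
import re

text = "cat concat cat_1 42"

words = re.findall(r"\w+", text)    # ['cat', 'concat', 'cat_1', '42']
numbers = re.findall(r"\d+", text)  # ['1', '42']
pieces = re.split(r"\s+", text)     # same four items as `words`

# \bcat\b: "cat" as a whole word only (not in "concat" or "cat_1").
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", text)]  # [0]

# \Bcat: "cat" that does not begin at a word boundary (inside "concat").
inner_starts = [m.start() for m in re.finditer(r"\Bcat", text)]  # [7]
```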
  12. Simple Matching
      − modegunta.srikanth@gmail.com
          − The mail id should not start with a number or special symbol.
          − The mail id can start with _.
          − The mail id can have '.' in the middle.
          − It should end with @domain.com.
      − Pattern:
          − [a-zA-Z_][a-zA-Z_.]+@\w+\.(com|co\.in)
          − Meta characters must be escaped in the pattern to match them as normal characters (a '.' inside a set is already literal).
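The mail-id pattern, with the dots escaped, can be exercised as below. This is a sketch assuming Python's `re` module; the function name `is_valid_mail_id` and every address except the one on the slide are hypothetical. The second pattern leaves the dots unescaped to show why escaping matters.

```python
import re

# Pattern from the slide, with "." escaped outside the character set.
STRICT = re.compile(r"[a-zA-Z_][a-zA-Z_.]+@\w+\.(com|co\.in)")

# Same pattern with unescaped dots: "." then matches any character.
LOOSE = re.compile(r"[a-zA-Z_][a-zA-Z_.]+@\w+.(com|co.in)")

def is_valid_mail_id(address: str, pattern=STRICT) -> bool:
    """True when the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None

print(is_valid_mail_id("modegunta.srikanth@gmail.com"))  # True
print(is_valid_mail_id("_user@example.co.in"))           # True
print(is_valid_mail_id("1user@gmail.com"))               # False
print(is_valid_mail_id("user@gmailXcom"))                # False
print(is_valid_mail_id("user@gmailXcom", LOOSE))         # True
```

This pattern follows the slide's simplified rules only; real mail addresses allow digits and many other forms that it rejects.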
  13. Modifiers
      Modifier   Meaning
      i          Case-insensitive matching
      g          Global matching (in Perl)
      m          Multiline matching
      s          Dot-all ('.' also matches \n)
      x          Extended regex pattern (pretty format; ref: Perl)
      e          Used when replacing: evaluate the replacement as an expression (ref: Perl)
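Python's `re` module (an assumed choice here; the slide refers to Perl) expresses most of these modifiers as flags. It has no g or e modifier: `findall` and `sub` already work globally, and passing a function to `sub` plays a role similar to Perl's e. The sample strings are made up.

```python
import re

# i: case-insensitive matching.
any_case = re.findall(r"regex", "Regex REGEX regex", re.IGNORECASE)

# s: without DOTALL "." will not cross a newline.
dot_default = re.search(r"a.b", "a\nb")             # None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)      # matches

# m: ^ and $ apply to every line.
number_lines = re.findall(r"^\d+$", "10\n20\nx", re.MULTILINE)  # ['10', '20']

# x: whitespace and comments inside the pattern are ignored.
year_month = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
parts = year_month.fullmatch("2024-05").groups()    # ('2024', '05')

# Similar to e: compute each replacement with a function.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```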
  14. Grouping
      − Groups are captured using parentheses.
          − (<pattern>)
          − Saves the text matched by the group into a backreference (covered later).
      − Groups capture part of the text in the matching pattern.
          − Ex: take the simple XML element <root>test</root>
          − <(\w+)>.*?</\1>
          − Here \1 is a backreference to group 1.
      − Java exposes captured groups through the “group(int)” method of the “java.util.regex.Matcher” class.
  15. Grouping Example
      − If the command is
          − /sbin/service <service-name> <command>
      − then the pattern is
          − ([^\s]+)\s+([\w-]+)\s+(start|stop|status)
          − Group 0 = the whole matched text
          − Group 1 = “/sbin/service”
          − Group 2 = <service-name>
          − Group 3 = <command>
          − The command can be start, stop or status.
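The group numbering can be seen by running the pattern on a concrete command line. This is a sketch assuming Python's `re` module, whose `group(n)` is comparable to the Java method mentioned earlier; the service name `httpd` is a hypothetical example. The set is written as `[\w-]` because \w already includes the underscore and Python rejects `\w-_` as a range.

```python
import re

COMMAND = re.compile(r"([^\s]+)\s+([\w-]+)\s+(start|stop|status)")

m = COMMAND.fullmatch("/sbin/service httpd start")

print(m.group(0))  # /sbin/service httpd start
print(m.group(1))  # /sbin/service
print(m.group(2))  # httpd
print(m.group(3))  # start

# An unknown command does not match at all.
rejected = COMMAND.fullmatch("/sbin/service httpd reload")
```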
  16. Back References
      − A backreference stores the part of the string matched by the part of the regular expression inside the parentheses.
      − If a string occurs multiple times in the input, we can use a backreference to identify the repeat.
      − Ex: an XML/HTML start-tag should have a matching end-tag.
      − If we capture the start-tag name in the first group, we can write the end-tag name as a backreference (\1).
  17. Back references example
      − For example, take the XML element
          − <root id="E12">test</root>
      − <([\w-]+)\s*([^<>]+)?>\w+</\1> matches the XML element.
          − Group 0: <root id="E12">test</root>
          − Group 1: root
          − Group 2: id="E12"
          − \1 in the regex pattern is the backreference to group 1.
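The backreference can be checked against a matching and a mismatched element. The sketch assumes Python's `re` module; the mismatched element `<root>test</item>` is a made-up counterexample, and the tag-name set is written as `[\w-]` since Python rejects `\w-_` as a range.

```python
import re

ELEMENT = re.compile(r"<([\w-]+)\s*([^<>]+)?>\w+</\1>")

m = ELEMENT.fullmatch('<root id="E12">test</root>')
print(m.group(0))  # <root id="E12">test</root>
print(m.group(1))  # root
print(m.group(2))  # id="E12"

# \1 must repeat whatever group 1 captured, so a different end-tag fails.
mismatch = ELEMENT.fullmatch("<root>test</item>")
print(mismatch)  # None
```

A pattern like this only covers flat elements with plain text content; nested markup generally calls for a real XML parser.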
  18. No grouping with parenthesis
      − If a parenthesized pattern does not need to be captured as a group:
          − Use ?: at the start of the group, i.e. (?:<pattern>).
          − (text1|text2|text3) matches any one of text1, text2 and text3, and captures it as a group.
          − (?:text1|text2|text3) matches the same text but does not create a group.
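The effect of ?: on group numbering can be seen by comparing the two forms. This sketch assumes Python's `re` module; the `-(\d+)` suffix and the input `text2-42` are hypothetical additions so that there is a second group to number.

```python
import re

capturing = re.search(r"(text1|text2|text3)-(\d+)", "text2-42")
print(capturing.groups())  # ('text2', '42')

# With ?: the alternation still matches, but the number becomes group 1.
non_capturing = re.search(r"(?:text1|text2|text3)-(\d+)", "text2-42")
print(non_capturing.groups())  # ('42',)
```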
  19. Look ahead and Look behind
      − Positive look-ahead
          − \w+(?=:) does not select all words; it selects only words that come immediately before ':'.
      − Negative look-ahead
          − \w+(?!:) selects word characters that are not immediately followed by ':'. Because of backtracking it still matches “ke” in “key:”, so use \b\w+\b(?!:) to skip whole words that come before ':'.
      − In a look-ahead, once the main pattern has matched, the regex engine looks ahead for the filtering pattern without consuming it.
      − Positive look-behind
          − (?<=a)b selects a 'b' that follows 'a'.
      − Negative look-behind
          − (?<!a)b selects a 'b' that doesn't follow 'a'.
      − In a look-behind, the regex engine looks behind the current position for the filtering pattern without consuming it.
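The four look-around forms can be compared on small inputs. This is a sketch assuming Python's `re` module (which requires look-behind patterns to have a fixed width); the sample strings are made up.

```python
import re

text = "name: value, key: other"

# Positive look-ahead: words directly before ':' (the ':' is not included).
before_colon = re.findall(r"\w+(?=:)", text)  # ['name', 'key']

# Negative look-ahead alone backtracks into the word ("nam", "ke").
naive_negative = re.findall(r"\w+(?!:)", text)

# Adding word boundaries keeps only whole words not followed by ':'.
not_before_colon = re.findall(r"\b\w+\b(?!:)", text)  # ['value', 'other']

# Look-behind: positions of 'b' with and without a preceding 'a'.
sample = "ab cb b"
after_a = [m.start() for m in re.finditer(r"(?<=a)b", sample)]      # [1]
not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", sample)]  # [4, 6]
```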
  20. References:
      1) http://www.regular-expressions.info/tutorial.html
      2) Thinking in Java, 4th Edition, Chapter: Strings, page 392
  21. Thank You