Java Regular Expression PART I

Regular expressions (regex) are powerful and convenient for string manipulation, i.e. matching and validating, extracting and capturing, and modifying and substituting. This presentation covers regular expressions with real-world examples and demos.
All in all, regular expressions are worth learning!

Published in: Education, Technology

  1. Java Regular Expression PART I
     Abdul Rahman Sherzad
     https://www.facebook.com/Oxus20
     oxus20@gmail.com
     » String Manipulation
     » Matching / Validating
     » Extracting / Capturing
     » Modifying / Substitution
  2. Agenda
     » What is Regular Expression
     » Regular Expression Syntax
       ˃ Character Classes
       ˃ Quantifiers
       ˃ Meta Characters
     » Basic Expression Example
     » Basic Grouping Example
     » Matching / Validating
     » Extracting / Capturing
     » Modifying / Substitution
  3. What are Regular Expressions?
     » Regular Expressions are a language of string patterns built into most modern programming languages, including Perl, PHP, .NET, and Java (1.4 onward).
     » A regular expression defines a search pattern for strings.
     » Regular expressions can be used to search, edit and manipulate text.
     » The abbreviation for Regular Expression is Regex.
  4. Regular Expression Syntax
     » Regular Expressions, by definition, are string patterns that describe text.
     » These descriptions can then be used to search, validate, extract and replace text.
     » The basic language constructs include
       ˃ Character Classes
       ˃ Quantifiers
       ˃ Meta Characters
  5. Character Classes
     Character Class   Explanation and Alternatives
     .                 Match any character (may or may not match line terminators)
     \d                Match a digit; alternative for [0-9]
     \D                Match a non-digit character; alternative for [^0-9]
     \s                Match a whitespace character; alternative for [ \t\n\x0B\f\r]
     \S                Match a non-whitespace character; alternative for [^\s]
     \w                Match a word character; alternative for [a-zA-Z_0-9]
     \W                Match a non-word character; alternative for [^\w]
     NOTE: in Java, you will need to "double escape" these backslashes "\", i.e. "\d" should be "\\d".
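To make the table and the double-escape note concrete, here is a minimal sketch. The class name `CharacterClassSketch`, the helper `fullMatch`, and the sample inputs are assumptions made for illustration; they are not part of the original slides.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: class name, helper name and inputs are hypothetical.
final class CharacterClassSketch {

    // Returns true when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // In Java source, the string literal "\\d" is the two-character regex \d.
        System.out.println(fullMatch("\\d", "7"));    // true
        System.out.println(fullMatch("[0-9]", "7"));  // true (same meaning as \d)
        System.out.println(fullMatch("\\D", "7"));    // false
        System.out.println(fullMatch("\\s", "\t"));   // true (a tab is whitespace)
        System.out.println(fullMatch("\\S", "x"));    // true
        System.out.println(fullMatch("\\w", "_"));    // true (underscore is a word character)
        System.out.println(fullMatch("\\W", "-"));    // true
    }
}
```

Writing a single backslash, as in `"\d"`, does not compile in Java, because `\d` is not a valid string escape sequence.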
  6. Quantifiers
     Quantifier   Explanation and Alternatives
     *            Match zero or more times; alternative for {0,}
     +            Match one or more times; alternative for {1,}
     ?            Match zero or one time; alternative for {0,1}
     {n}          Match exactly n times
     {n,}         Match at least n times
     {n,m}        Match at least n but not more than m times
     Quantifiers specify how many times the preceding part of a pattern should match or repeat.
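The quantifier table can be checked with a short sketch like the following. The class name `QuantifierSketch`, the helper `whole`, and the inputs are assumptions chosen for illustration.

```java
// Illustrative sketch only: class name, helper name and inputs are hypothetical.
final class QuantifierSketch {

    // Returns true when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("a*", ""));          // true:  zero or more
        System.out.println(whole("a+", ""));          // false: needs at least one
        System.out.println(whole("a+", "aaa"));       // true
        System.out.println(whole("a?", "aa"));        // false: at most one
        System.out.println(whole("a{3}", "aaa"));     // true:  exactly three
        System.out.println(whole("a{3,}", "aa"));     // false: at least three
        System.out.println(whole("a{2,4}", "aaaa"));  // true:  between two and four
        System.out.println(whole("a{2,4}", "aaaaa")); // false: more than four
    }
}
```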
  7. Meta Characters
     Meta Character   Explanation
     \                Escape the next meta-character (it becomes a normal / literal character)
     ^                Match the beginning of the line
     .                Match any character (except newline)
     $                Match the end of the line (or before newline at the end)
     |                Alternation ('or' statement)
     ()               Grouping
     []               Custom character class
     Meta-characters are used to group, divide, and perform special operations in patterns.
     NOTE: in Java, ^ and $ match at the beginning and end of the whole input by default; they match at each line only when Pattern.MULTILINE is set.
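A small sketch of the meta-characters in action, assuming the hypothetical class name `MetaCharacterSketch` and helper `contains` (neither appears in the slides). It uses `Matcher.find()` for the anchor checks, since `String.matches()` always tests the whole string.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: class name, helper name and inputs are hypothetical.
final class MetaCharacterSketch {

    // Returns true when the regex matches anywhere inside the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // An unescaped '.' matches any character; an escaped one matches only a dot.
        System.out.println("3x14".matches("3.14"));          // true
        System.out.println("3x14".matches("3\\.14"));        // false
        System.out.println("3.14".matches("3\\.14"));        // true

        // Anchors pin the match to the start or end of the input.
        System.out.println(contains("^Java", "Java regex"));   // true
        System.out.println(contains("^Java", "I study Java")); // false
        System.out.println(contains("Java$", "I study Java")); // true

        // Alternation and a custom character class.
        System.out.println("cat".matches("cat|dog"));        // true
        System.out.println("b".matches("[abc]"));            // true

        // Pattern.quote escapes every meta-character in a literal string.
        System.out.println("a.b".matches(Pattern.quote("a.b"))); // true
        System.out.println("axb".matches(Pattern.quote("a.b"))); // false
    }
}
```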
  8. Basic Expression: Example I
     » Every plain string (one without meta-characters) is a Regular Expression.
     » For example, the string "I study English" is a regular expression that will match exactly the string "I study English" and nothing else.
     » What if we want to match other subjects that we study? We can replace the word English with a character class expression that will match any subject. Example on next slide …
  9. Basic Expression: Example II
     "I study \\w+"
     » The pattern "I study \\w+" uses both a character class and a quantifier.
     » The character class "\\w" says match a word character.
     » The quantifier "+" says match one or more.
     » The pattern "I study \\w+" will match any word in place of "English", e.g. "I study Programming", "I study Math", "I study Database", etc.
  10. Example II Demo
      public class RegexBasicExampleII {
          public static void main(String[] args) {
              System.out.println("I study English".matches("I study \\w+"));     // true
              System.out.println("I study Programming".matches("I study \\w+")); // true
              System.out.println("I study JAVA".matches("I study \\w+"));        // true
              System.out.println("I study: JAVA".matches("I study \\w+"));       // false
          }
      }
  11. Example II Demo (Alternative)
      public class RegexBasicExampleII {
          public static void main(String[] args) {
              System.out.println("I study English".matches("I study [a-zA-Z_0-9]+"));     // true
              System.out.println("I study Programming".matches("I study [a-zA-Z_0-9]+")); // true
              System.out.println("I study JAVA".matches("I study [a-zA-Z_0-9]+"));        // true
              System.out.println("I study: JAVA".matches("I study [a-zA-Z_0-9]+"));       // false
          }
      }
  12. Basic Expression: Example III
      » But the pattern "I study \\w+" will not match "I study: English", because the pattern expects a space after "study" and finds the ":" character instead, so the match fails.
      » If we want the expression to handle this situation, we need to make a small change as follows:
      » "I study:? \\w+"
      » Now the pattern "I study:? \\w+" will match "I study Programming" and also "I study: Programming".
  13. Example III Demo
      public class RegexBasicExampleIII {
          public static void main(String[] args) {
              System.out.println("I study English".matches("I study:? \\w+"));     // true
              System.out.println("I study Programming".matches("I study:? \\w+")); // true
              System.out.println("I study JAVA".matches("I study:? \\w+"));        // true
              System.out.println("I study: JAVA".matches("I study:? \\w+"));       // true
          }
      }
  14. Basic Expression: Example IV
      » The pattern "I study \\w+" will also match neither "i study English" nor "I Study English", because matching is case-sensitive by default: the lowercase "i" is not equal to the uppercase "I", so the match fails.
      » If we want the expression to ignore case, we need to make a small change as follows:
      » "(?i)I study \\w+"
      » Now the pattern "(?i)I study \\w+" will match both "I STUDY JAVA" and "i StUdY JAVA".
  15. Example IV Demo
      public class RegexBasicExampleIV {
          public static void main(String[] args) {
              System.out.println("I study English".matches("(?i)I study \\w+")); // true
              System.out.println("i STUDY English".matches("(?i)I study \\w+")); // true
              System.out.println("I study JAVA".matches("(?i)I study \\w+"));    // true
              System.out.println("i StUdY JAVA".matches("(?i)I study \\w+"));    // true
          }
      }
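The slides switch off case sensitivity with the inline flag `(?i)`. As an alternative sketch (not taken from the slides), the same effect can be obtained by passing `Pattern.CASE_INSENSITIVE` to `Pattern.compile`. The class name `CaseInsensitiveFlagSketch` and the helper `isStudySentence` are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: class and helper names are hypothetical.
final class CaseInsensitiveFlagSketch {

    // Compiled once and reused; the flag replaces the inline "(?i)" prefix.
    private static final Pattern STUDY =
            Pattern.compile("I study \\w+", Pattern.CASE_INSENSITIVE);

    // Returns true when the whole input matches, ignoring letter case.
    static boolean isStudySentence(String input) {
        return STUDY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStudySentence("I study English")); // true
        System.out.println(isStudySentence("i StUdY JAVA"));    // true
        System.out.println(isStudySentence("I study: JAVA"));   // false (the colon is not allowed)
    }
}
```

Compiling the pattern once also avoids recompiling it on every call, which `String.matches` does internally.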
  16. Regular Expression Basic Grouping
      » An important feature of Regular Expressions is the ability to group sections of a pattern and provide alternate matches.
      » The following two meta-characters are core parts of flexible Regular Expressions
        ˃ |   Alternation ('or' statement)
        ˃ ()  Grouping
      » Suppose we know exactly which subjects we are studying, and we want to match only those subjects and nothing else. The pattern is:
      » "I study (Java|English|Programming|Math|Islamic|HTML)"
  17. Regular Expression Basic Grouping
      » "I study (Java|English|Programming|Math|Islamic|HTML)"
      » The new expression will match the beginning of the string, "I study", and then any one of the subjects in the group, which are separated by the alternation character "|". Any one of the following would be a match:
        ˃ Java
        ˃ English
        ˃ Programming
        ˃ Math
        ˃ Islamic
        ˃ HTML
  18. Basic Grouping Demo I (Case Sensitive)
      public class BasicGroupingDemoI {
          public static void main(String[] args) {
              String pattern = "I study (Java|English|Programming|Math|Islamic|HTML)";
              System.out.println("I study English".matches(pattern));     // true
              System.out.println("I study Programming".matches(pattern)); // true
              System.out.println("I study Islamic".matches(pattern));     // true
              // english with lowercase letter "e" is not in our group
              System.out.println("I study english".matches(pattern));     // false
              // CSS is not in our group
              System.out.println("I study CSS".matches(pattern));         // false
          }
      }
  19. Basic Grouping Demo I (Case Insensitive)
      public class BasicGroupingDemoI {
          public static void main(String[] args) {
              String pattern = "(?i)I study (Java|English|Programming|Math|Islamic|HTML)";
              System.out.println("I study English".matches(pattern));     // true
              System.out.println("I study Programming".matches(pattern)); // true
              System.out.println("I study Islamic".matches(pattern));     // true
              System.out.println("I study english".matches(pattern));     // true
              // CSS is not in our group
              System.out.println("I study CSS".matches(pattern));         // false
          }
      }
  20. Matching / Validating
      » Regular Expressions make it possible to find all instances of text that match a certain pattern, and to return a Boolean value indicating whether the pattern was found.
      » This can be used to validate user input such as
        ˃ Phone Numbers
        ˃ Social Security Numbers (SSN)
        ˃ Email Addresses
        ˃ Web Form Input Data
        ˃ and much more.
      » Suppose the goal is to validate an SSN: if the whole string matches the SSN pattern, then the string is an SSN.
  21. SSN Match and Validation
      public class SSNMatchAndValidate {
          public static void main(String[] args) {
              String pattern = "^(\\d{3}-?\\d{2}-?\\d{4})$";
              String input[] = new String[5];
              input[0] = "123-45-6789";
              input[1] = "9876-5-4321";
              input[2] = "987-650-4321";
              input[3] = "987-65-4321 ";
              input[4] = "321-54-9876";
              for (int i = 0; i < input.length; i++) {
                  if (input[i].matches(pattern)) {
                      System.out.println("Found correct SSN: " + input[i]);
                  }
              }
          }
      }
      OUTPUT:
      Found correct SSN: 123-45-6789
      Found correct SSN: 321-54-9876
  22. SSN Match and Validation Detail
      "^(\\d{3}-?\\d{2}-?\\d{4})$"   // 123-45-6789
      Regular Expression   Meaning
      ^                    match the beginning of the line
      ()                   group everything within the parentheses as group 1
      \\d{3}               match exactly 3 digits
      -?                   optionally match a dash
      \\d{2}               match exactly 2 digits
      -?                   optionally match a dash
      \\d{4}               match exactly 4 digits
      $                    match the end of the line
  23. SSN Match and Validation (Alternative)
      public class SSNMatchAndValidateII {
          public static void main(String[] args) {
              String pattern = "^([0-9]{3}-?[0-9]{2}-?[0-9]{4})$";
              String input[] = new String[5];
              input[0] = "123-45-6789";
              input[1] = "9876-5-4321";
              input[2] = "987-650-4321";
              input[3] = "987-65-4321 ";
              input[4] = "321-54-9876";
              for (int i = 0; i < input.length; i++) {
                  if (input[i].matches(pattern)) {
                      System.out.println("Found correct SSN: " + input[i]);
                  }
              }
          }
      }
      OUTPUT:
      Found correct SSN: 123-45-6789
      Found correct SSN: 321-54-9876
  24. SSN Match and Validation Detail
      "^([0-9]{3}-?[0-9]{2}-?[0-9]{4})$"   // 123-45-6789
      Regular Expression   Meaning
      ^                    match the beginning of the line
      ()                   group everything within the parentheses as group 1
      [0-9]{3}             match exactly 3 digits
      -?                   optionally match a dash
      [0-9]{2}             match exactly 2 digits
      -?                   optionally match a dash
      [0-9]{4}             match exactly 4 digits
      $                    match the end of the line
  25. Extracting / Capturing
      » Capturing groups are an extremely useful feature of Regular Expression matching: they let us ask the Matcher which part of the string matched a particular part of the regular expression.
      » Suppose you have a large, complex body of text (containing an unknown number of numbers) and you would like to extract all the numbers.
      » The next slide demonstrates the example.
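To show what querying the Matcher for a capturing group looks like, here is a minimal sketch that splits an SSN-shaped string into its three digit runs. The class name `CaptureGroupSketch`, the helper `parts`, and the choice of three separate groups are assumptions for illustration; the slides themselves only use a single group around the whole SSN.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: class name, helper name and grouping are hypothetical.
final class CaptureGroupSketch {

    // Three capturing groups, numbered 1..3 by the order of their opening parentheses.
    private static final Pattern SSN =
            Pattern.compile("(\\d{3})-?(\\d{2})-?(\\d{4})");

    // Returns the three captured digit runs, or null when the input is not SSN-shaped.
    static String[] parts(String input) {
        Matcher m = SSN.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] p = parts("123-45-6789");
        System.out.println(p[0]); // 123
        System.out.println(p[1]); // 45
        System.out.println(p[2]); // 6789
        System.out.println(parts("12-345-6789") == null); // true
    }
}
```

`m.group(0)`, or simply `m.group()`, returns the entire match; groups 1 and up return the text matched by each parenthesized part.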
  26. Extracting / Capturing Numbers
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class ExtractingNumbers {
          public static void main(String[] args) {
              String text = "Abdul Rahman Sherzad with university ID of 20120 is trying to "
                      + "demonstrate the power of Regular Expression for OXUS20 members.";
              Pattern p = Pattern.compile("\\d+");
              Matcher m = p.matcher(text);
              while (m.find()) {
                  System.out.println(m.group());
              }
          }
      }
      OUTPUT:
      20120
      20
  27. Extracting / Capturing Explanation
      » Import the needed classes
          import java.util.regex.Matcher;
          import java.util.regex.Pattern;
      » First, you must compile the pattern
          Pattern p = Pattern.compile("\\d+");
      » Next, create a matcher for a target text by sending a message to your pattern
          Matcher m = p.matcher(text);
      » NOTES
        ˃ Neither Pattern nor Matcher has a public constructor;
          + use the static method Pattern.compile(String regex) to create Pattern instances
          + use the instance method matcher(CharSequence input) of a Pattern to create Matcher instances.
        ˃ The matcher contains information about both the pattern and the target text.
  28. Extracting / Capturing Explanation
      » m.find()
        ˃ returns true if the pattern matches any part of the text string.
        ˃ If called again, m.find() starts searching from where the last match ended.
        ˃ m.find() returns true once for each match in the string; after that, it returns false.
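The way successive `m.find()` calls advance through the text can be made visible by printing each match's position with `Matcher.start()` and `Matcher.end()`. This is a sketch under assumed names (`FindLoopSketch`, `describeMatches`) and with a shortened sample text that is not from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: class name, helper name and sample text are hypothetical.
final class FindLoopSketch {

    // Collects each match as "text@start-end", where end is exclusive.
    static List<String> describeMatches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        // Each find() call resumes after the previous match and returns false when none is left.
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : describeMatches("\\d+", "ID 20120 for OXUS20")) {
            System.out.println(s);
        }
        // prints:
        // 20120@3-8
        // 20@17-19
    }
}
```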
  29. Extract / Capture Emails
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class ExtractEmails {
          public static void main(String[] args) {
              String text = "Abdul Rahman Sherzad absherzad@gmail.com on OXUS20 oxus20@gmail.com";
              String pattern = "[A-Za-z0-9-_]+(\\.[A-Za-z0-9-_]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
              Pattern p = Pattern.compile(pattern);
              Matcher m = p.matcher(text);
              while (m.find()) {
                  System.out.println(m.group());
              }
          }
      }
      OUTPUT:
      absherzad@gmail.com
      oxus20@gmail.com
  30. Modifying / Substitution
      » Values in a String can be replaced with new values.
      » For example, you could replace all instances of the word 'StudentID=', followed by an ID, with a mask to hide the original ID.
      » This can be a useful method of filtering sensitive information.
      » The next slide demonstrates the example.
  31. Mask Sensitive Information
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class Substitutions {
          public static void main(String[] args) {
              String text = "Three student with StudentID=20120, StudentID=20121 and finally StudentID=20122.";
              Pattern p = Pattern.compile("(StudentID=)([0-9]+)");
              Matcher m = p.matcher(text);
              StringBuffer result = new StringBuffer();
              while (m.find()) {
                  System.out.println("Masking: " + m.group(2));
                  m.appendReplacement(result, m.group(1) + "***masked***");
              }
              m.appendTail(result);
              System.out.println(result);
          }
      }
  32. Mask Sensitive Information (OUTPUT)
      » Masking: 20120
      » Masking: 20121
      » Masking: 20122
      » Three student with StudentID=***masked***, StudentID=***masked*** and finally StudentID=***masked***.
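When the per-match logging is not needed, the same masking can be written more compactly with `String.replaceAll`, using `$1` in the replacement to refer back to the first capturing group. This is an alternative sketch, not the approach shown on the slides; the class name `ReplaceAllSketch` and the helper `mask` are hypothetical.

```java
// Illustrative sketch only: class and helper names are hypothetical.
final class ReplaceAllSketch {

    // Keeps the "StudentID=" prefix (group 1) and replaces the digits that follow it.
    static String mask(String text) {
        return text.replaceAll("(StudentID=)[0-9]+", "$1***masked***");
    }

    public static void main(String[] args) {
        System.out.println(mask("StudentID=20120 and StudentID=20121"));
        // prints: StudentID=***masked*** and StudentID=***masked***
    }
}
```

The explicit `appendReplacement` / `appendTail` loop remains the better fit when each match needs extra processing, such as logging the masked value.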
  33. Conclusion
      » Regular Expressions are not easy to use at first
        ˃ They are a bunch of punctuation, not words.
        ˃ It takes practice to learn to put them together correctly.
      » Regular Expressions form a sub-language
        ˃ It has a different syntax than Java.
        ˃ It requires new thought patterns.
        ˃ You can't use Regular Expressions directly in Java; you have to create Patterns and Matchers first, or use the matches method of the String class.
      » Regular Expressions are powerful and convenient to use for string manipulation
        ˃ They are worth learning!
  34. END