Regular expressions

  1. /Regular Expressions/ in Java
  2. Credits
     • The Java Tutorials: Regular Expressions
     • docs.oracle.com/javase/tutorial/essential/regex/
  3. Regex
     • Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set.
     • They can be used to search, edit, or manipulate text and data.
     • They are created with a specific syntax.
  4. Regex in Java
     • Regex syntax in Java is similar to Perl's.
     • The java.util.regex package primarily consists of three classes: Pattern, Matcher, and PatternSyntaxException.
  5. Pattern & PatternSyntaxException
     • You can think of a Pattern as the regular expression wrapper object.
     • You get a Pattern by calling:
       – Pattern.compile("RegularExpressionString");
     • If "RegularExpressionString" is not a valid regex, compile throws a PatternSyntaxException.
  6. Matcher
     • You can think of a Matcher as the search result object.
     • You get a Matcher by calling:
       – myPattern.matcher("StringToBeSearched");
     • You use it by calling:
       – myMatcher.find()
     • Then call any number of methods on myMatcher (such as start(), end(), and group()) to see attributes of the result.
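Putting the Pattern and Matcher slides together, a minimal sketch of the usual compile, matcher, find() loop might look like this. The class name FindDemo, the helper findAll, and the "text@start-end" result format are illustrative choices, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Hypothetical helper: returns one "text@start-end" entry per match.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // throws PatternSyntaxException if regex is invalid
        Matcher matcher = pattern.matcher(input);   // the "search result" object
        List<String> results = new ArrayList<>();
        while (matcher.find()) {                    // each call advances to the next match
            // group(), start() and end() describe the current match
            results.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo", "foofoo")); // [foo@0-3, foo@3-6]
    }
}
```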
  7. Regex Test Harness
     • The tutorial provides a test harness that uses the Console class. It doesn’t work in most IDEs, where System.console() returns null.
     • So I rewrote it to use Basic I/O.
  8. It’s time for… CODE DEMO
  9. Regex
     • Test harness output example.
     • The user's input follows each prompt.
       Enter your regex: foo
       Enter input string to search: foofoo
       Found "foo" at index 0, ending at index 3.
       Found "foo" at index 3, ending at index 6.
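The rewritten harness itself is not included in this transcript. The following is a minimal sketch of one way to build it with Basic I/O (a BufferedReader over System.in), assuming the output format shown in the example above; RegexTestHarness and report are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexTestHarness {
    // Hypothetical helper: builds the report lines for one regex/input pair.
    static List<String> report(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            lines.add(String.format("Found \"%s\" at index %d, ending at index %d.",
                    matcher.group(), matcher.start(), matcher.end()));
        }
        if (lines.isEmpty()) {
            lines.add("No match found.");
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Reads from standard input, so it also works where System.console() is null.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("Enter your regex: ");
            String regex = in.readLine();
            if (regex == null) break;                 // end of input
            System.out.print("Enter input string to search: ");
            String input = in.readLine();
            if (input == null) break;
            try {
                for (String line : report(regex, input)) {
                    System.out.println(line);
                }
            } catch (PatternSyntaxException e) {
                System.out.println(e.getMessage());   // invalid regex: report it and keep going
            }
        }
    }
}
```

Splitting the matching logic into report keeps the I/O loop thin and lets the output format be checked without typing at a prompt.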
  10. Indexing
  11. Metacharacters
      • <([{\^-=$!|]})?*+.>
      • Precede a metacharacter with a backslash (‘\’) to treat it as an ordinary character.
      • Or use \Q and \E to begin and end a literal quote.
  12. Metacharacters
      Enter your regex: cat.
      Enter input string to search: cats
      Found "cats" at index 0, ending at index 4.
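A short sketch of the two escaping options from the metacharacter slides. Note that in Java source the regex backslash must itself be escaped, so the regex cat\. is written "cat\\.". The class QuoteDemo and the wrapper matchesWhole are hypothetical names.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Pattern.matches is true only if the WHOLE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // '.' is a metacharacter: it matches any character, so "cat." matches "cats".
        System.out.println(matchesWhole("cat.", "cats"));        // true
        // A backslash makes the dot literal.
        System.out.println(matchesWhole("cat\\.", "cats"));      // false
        System.out.println(matchesWhole("cat\\.", "cat."));      // true
        // \Q ... \E quotes everything in between.
        System.out.println(matchesWhole("\\Qcat.\\E", "cat."));  // true
        System.out.println(matchesWhole("\\Qcat.\\E", "cats"));  // false
    }
}
```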
  13. Character Classes
      Construct        Description
      [abc]            a, b, or c (simple class)
      [^abc]           Any character except a, b, or c (negation)
      [a-zA-Z]         a through z, or A through Z, inclusive (range)
      [a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
      [a-z&&[def]]     d, e, or f (intersection)
      [a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
      [a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
  14. Character Class
      Enter your regex: [bcr]at
      Enter input string to search: rat
      Found "rat" at index 0, ending at index 3.
      Enter input string to search: cat
      Found "cat" at index 0, ending at index 3.
  15. Character Class: Negation
      Enter your regex: [^bcr]at
      Enter input string to search: rat
      No match found.
      Enter input string to search: hat
      Found "hat" at index 0, ending at index 3.
  16. Character Class: Range
      Enter your regex: foo[1-5]
      Enter input string to search: foo5
      Found "foo5" at index 0, ending at index 4.
      Enter input string to search: foo6
      No match found.
  17. Character Class: Union
      Enter your regex: [0-4[6-8]]
      Enter input string to search: 0
      Found "0" at index 0, ending at index 1.
      Enter input string to search: 5
      No match found.
      Enter input string to search: 6
      Found "6" at index 0, ending at index 1.
  18. Character Class: Intersection
      Enter your regex: [0-9&&[345]]
      Enter input string to search: 5
      Found "5" at index 0, ending at index 1.
      Enter input string to search: 2
      No match found.
  19. Character Class: Subtraction
      Enter your regex: [0-9&&[^345]]
      Enter input string to search: 5
      No match found.
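The character-class demos above test one character at a time. A compact way to compare the constructs is to list every digit a class accepts; this is a minimal sketch, and CharClassDemo and members are hypothetical names.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns the characters from 'candidates' that the single-character class matches.
    static String members(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        System.out.println(members("[1-5]", digits));          // range:        12345
        System.out.println(members("[^345]", digits));         // negation:     0126789
        System.out.println(members("[0-4[6-8]]", digits));     // union:        01234678
        System.out.println(members("[0-9&&[345]]", digits));   // intersection: 345
        System.out.println(members("[0-9&&[^345]]", digits));  // subtraction:  0126789
    }
}
```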
  20. Predefined Character Classes
      Construct   Description
      .           Any character (may or may not match line terminators)
      \d          A digit: [0-9]
      \D          A non-digit: [^0-9]
      \s          A whitespace character: [ \t\n\x0B\f\r]
      \S          A non-whitespace character: [^\s]
      \w          A word character: [a-zA-Z_0-9]
      \W          A non-word character: [^\w]
  21. Predefined Character Classes (cont.)
      • To summarize:
        – \d matches all digits
        – \s matches whitespace characters
        – \w matches word characters
      • Whereas a capital letter is the opposite:
        – \D matches non-digits
        – \S matches non-whitespace characters
        – \W matches non-word characters
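Applying each predefined class to the same text makes the lowercase/uppercase pairs easy to compare. This sketch uses String.replaceAll, which takes a regex; remember that "\\d" in Java source is the regex \d. PredefinedClassDemo and demo are hypothetical names.

```java
class PredefinedClassDemo {
    // Applies several predefined classes to the same text.
    static String[] demo(String text) {
        return new String[] {
            text.replaceAll("\\d", "#"),  // every digit replaced by '#'
            text.replaceAll("\\D", ""),   // every non-digit removed
            text.replaceAll("\\s", "_"),  // every whitespace character replaced by '_'
            text.replaceAll("\\W", "")    // every non-word character removed
        };
    }

    public static void main(String[] args) {
        for (String line : demo("Room 42, floor 3!")) {
            System.out.println(line);
        }
        // Room ##, floor #!
        // 423
        // Room_42,_floor_3!
        // Room42floor3
    }
}
```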
  22. Quantifiers
      Greedy   Reluctant  Possessive  Meaning
      X?       X??        X?+         X, once or not at all
      X*       X*?        X*+         X, zero or more times
      X+       X+?        X++         X, one or more times
      X{n}     X{n}?      X{n}+       X, exactly n times
      X{n,}    X{n,}?     X{n,}+      X, at least n times
      X{n,m}   X{n,m}?    X{n,m}+     X, at least n but not more than m times
  23. Ignore Greedy, Reluctant, and Possessive for now.
  24. Zero-Length Match
      • The regexes ‘a?’ and ‘a*’ each allow for zero occurrences of the letter a, so they can also match an empty string.
      Enter your regex: a*
      Enter input string to search: aa
      Found "aa" at index 0, ending at index 2.
      Found "" at index 2, ending at index 2.
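A small sketch that prints every match, including empty ones, so zero-length matches become visible. The brackets around each match are just a display choice; ZeroLengthDemo and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Collects "[text] start-end" for every match; an empty match shows up as "[]".
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add("[" + m.group() + "] " + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("a*", "aa"));  // [[aa] 0-2, [] 2-2]
        System.out.println(matches("a?", "aa"));  // [[a] 0-1, [a] 1-2, [] 2-2]
        System.out.println(matches("a*", ""));    // [[] 0-0]
    }
}
```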
  25. Quantifiers: Exact
      Enter your regex: a{3}
      Enter input string to search: aa
      No match found.
      Enter input string to search: aaaa
      Found "aaa" at index 0, ending at index 3.
  26. Quantifiers: At Least, No Greater
      Enter your regex: a{3,}
      Enter input string to search: aaaaaaaaa
      Found "aaaaaaaaa" at index 0, ending at index 9.
      Enter your regex: a{3,6}
      Enter input string to search: aaaaaaaaa
      Found "aaaaaa" at index 0, ending at index 6.
      Found "aaa" at index 6, ending at index 9.
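The bounded-quantifier demos can be reproduced with a find() loop. This is a minimal sketch; BoundedQuantifierDemo and matches are hypothetical names, and "text start-end" is an illustrative format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedQuantifierDemo {
    // Collects "text start-end" for every match of regex in input.
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " " + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String nine = "aaaaaaaaa";
        System.out.println(matches("a{3}", "aa"));     // []  (too short)
        System.out.println(matches("a{3}", "aaaa"));   // [aaa 0-3]
        System.out.println(matches("a{3,}", nine));    // [aaaaaaaaa 0-9]
        System.out.println(matches("a{3,6}", nine));   // [aaaaaa 0-6, aaa 6-9]
    }
}
```

Since {3,6} is greedy, the first match takes the maximum of six characters, leaving three for the second match.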
  27. Quantifiers
      • "abc+"
        – Means "a, followed by b, followed by (c one or more times)".
        – "abcc" = match!, "abbc" = no match
      • "[abc]+"
        – Means "(a, b, or c) one or more times".
        – "bba" = match!
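The difference lies in what the + applies to: the single preceding token. A quick check with whole-input matching, as a sketch; QuantifierScopeDemo and matchesWhole are hypothetical names.

```java
import java.util.regex.Pattern;

class QuantifierScopeDemo {
    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // In abc+ the + repeats only the letter c.
        System.out.println(matchesWhole("abc+", "abcc"));   // true
        System.out.println(matchesWhole("abc+", "abbc"));   // false
        // In [abc]+ the + repeats "any one of a, b, or c".
        System.out.println(matchesWhole("[abc]+", "bba"));  // true
    }
}
```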
  28. Greedy, Reluctant, and Possessive
      • Greedy – The matcher first consumes the whole input, then backs off one character at a time from the end until a match is found.
      • Reluctant – The matcher starts by consuming nothing, then adds one character at a time from the beginning until a match is found.
      • Possessive – The matcher consumes the whole input once and never backs off; no retries are made.
  29. Greedy
      Enter your regex: .*foo
      Enter input string to search: xfooxxxxxxfoo
      Found "xfooxxxxxxfoo" at index 0, ending at index 13.
  30. Reluctant
      Enter your regex: .*?foo
      Enter input string to search: xfooxxxxxxfoo
      Found "xfoo" at index 0, ending at index 4.
      Found "xxxxxxfoo" at index 4, ending at index 13.
  31. Possessive
      Enter your regex: .*+foo
      Enter input string to search: xfooxxxxxxfoo
      No match found.
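Running the three quantifier flavors against the same input reproduces the greedy, reluctant, and possessive demos side by side. A minimal sketch; GreedinessDemo and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Collects "text start-end" for every match of regex in input.
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " " + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(matches(".*foo", input));   // greedy:     [xfooxxxxxxfoo 0-13]
        System.out.println(matches(".*?foo", input));  // reluctant:  [xfoo 0-4, xxxxxxfoo 4-13]
        System.out.println(matches(".*+foo", input));  // possessive: []  (.*+ never gives back "foo")
    }
}
```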
  32. Capturing Group
      • Capturing groups are a way to treat multiple characters as a single unit.
      • They are created by placing the characters to be grouped inside a set of parentheses.
      • "(dog)"
        – Means a single group containing the letters "d", "o", and "g".
  33. Capturing Group w/ Quantifiers
      • (abc)+
        – Means "abc" one or more times.
  34. Capturing Groups: Numbering
      • ((A)(B(C)))
        1. ((A)(B(C)))
        2. (A)
        3. (B(C))
        4. (C)
      • The index is based on the opening parentheses, counted from left to right.
  35. Capturing Groups: Numbering Usage
      • Some Matcher methods accept a group number as a parameter:
        – int start(int group)
        – int end(int group)
        – String group(int group)
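Using those three methods on ((A)(B(C))) confirms the numbering from the previous slide. This sketch also relies on group 0, which always refers to the whole match and is not included in groupCount(). GroupNumberingDemo and describeGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Lists "number: text start-end" for group 0 through groupCount().
    static List<String> describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.matches()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                out.add(g + ": " + m.group(g) + " " + m.start(g) + "-" + m.end(g));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        describeGroups("((A)(B(C)))", "ABC").forEach(System.out::println);
        // 0: ABC 0-3
        // 1: ABC 0-3
        // 2: A 0-1
        // 3: BC 1-3
        // 4: C 2-3
    }
}
```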
  36. Capturing Groups: Backreferences
      • The section of input matching the capturing group is saved for recall via a backreference.
      • Specify a backreference with a backslash (‘\’) followed by the group number.
      • ‘(\d\d)’ – can be recalled with the expression ‘\1’.
  37. Capturing Groups: Backreferences
      Enter your regex: (\d\d)\1
      Enter input string to search: 1212
      Found "1212" at index 0, ending at index 4.
      Enter input string to search: 1234
      No match found.
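In Java source the backreference \1 is written "\\1". The sketch below reproduces the (\d\d)\1 demo and adds one extra illustration of the same idea, (\w)\1 for doubled letters, which is not from the slides. BackreferenceDemo, isRepeatedPair, and doubledLetters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Two digits followed by the same two digits again.
    static boolean isRepeatedPair(String s) {
        return Pattern.matches("(\\d\\d)\\1", s);
    }

    // Every place where a word character is immediately repeated.
    static List<String> doubledLetters(String s) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedPair("1212"));          // true
        System.out.println(isRepeatedPair("1234"));          // false
        System.out.println(doubledLetters("bookkeeper"));    // [oo, kk, ee]
    }
}
```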
  38. Boundary Matchers
      Boundary Construct   Description
      ^                    The beginning of a line
      $                    The end of a line
      \b                   A word boundary
      \B                   A non-word boundary
      \A                   The beginning of the input
      \G                   The end of the previous match
      \Z                   The end of the input but for the final terminator, if any
      \z                   The end of the input
  39. Boundary Matchers
      Enter your regex: ^dog$
      Enter input string to search: dog
      Found "dog" at index 0, ending at index 3.
      Enter your regex: ^dog\w*
      Enter input string to search: dogblahblah
      Found "dogblahblah" at index 0, ending at index 11.
  40. Boundary Matchers (cont.)
      Enter your regex: \bdog\b
      Enter input string to search: The doggie plays in the yard.
      No match found.
      Enter your regex: \Gdog
      Enter input string to search: dog dog
      Found "dog" at index 0, ending at index 3.
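The boundary-matcher demos can be replayed with a find() loop. Two extra cases not on the slides are included for contrast: \bdog\b on a sentence that does contain the word "dog", and plain dog (without \G) on "dog dog". BoundaryDemo and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects "text start-end" for every match of regex in input.
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " " + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("^dog$", "dog"));                                // [dog 0-3]
        System.out.println(matches("^dog\\w*", "dogblahblah"));                     // [dogblahblah 0-11]
        System.out.println(matches("\\bdog\\b", "The doggie plays in the yard."));  // []
        System.out.println(matches("\\bdog\\b", "The dog plays in the yard."));     // [dog 4-7]
        // \G only matches where the previous match ended, so the second "dog" is skipped.
        System.out.println(matches("\\Gdog", "dog dog"));                           // [dog 0-3]
        System.out.println(matches("dog", "dog dog"));                              // [dog 0-3, dog 4-7]
    }
}
```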
  41. Pattern Class (cont.)
      • There are a number of flags that can be passed to the ‘compile’ method.
      • Embedded flag expressions, such as (?i), are regex constructs that enable the same options from inside the pattern itself.
      • Check out the ‘matches’, ‘split’, and ‘quote’ methods as well.
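A sketch touching each item on this slide: a compile flag, its embedded equivalent, and the static matches, split, and quote methods. PatternClassDemo and its helper methods are hypothetical names; Pattern.CASE_INSENSITIVE, Pattern.matches, Pattern.split, and Pattern.quote are standard java.util.regex API.

```java
import java.util.regex.Pattern;

class PatternClassDemo {
    // Compile flag passed to Pattern.compile.
    static boolean caseInsensitiveFlag(String input) {
        return Pattern.compile("dog", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Embedded flag expression: (?i) turns on the same option inside the regex.
    static boolean embeddedFlag(String input) {
        return Pattern.matches("(?i)dog", input);
    }

    // split: break the input around every comma, ignoring surrounding whitespace.
    static String[] splitOnCommas(String input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    // quote: wrap text in \Q...\E so every character is matched literally.
    static boolean matchesLiterally(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(caseInsensitiveFlag("DoG"));                  // true
        System.out.println(embeddedFlag("DOG"));                         // true
        System.out.println(String.join("|", splitOnCommas("a , b,c")));  // a|b|c
        System.out.println(Pattern.quote("1+1=2"));                      // \Q1+1=2\E
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));           // false: '+' acts as a quantifier
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));          // true
    }
}
```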
  42. Matcher Class (cont.)
      • The Matcher class can examine input in several ways:
        – Index methods give the position of matches
        – Study methods give boolean answers about whether the input matches
        – Replacement methods let you edit input
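One example from each group of Matcher methods, as a minimal sketch: start()/end() as index methods, lookingAt()/matches() as study methods, and replaceAll plus the appendReplacement/appendTail pair as replacement methods. MatcherMethodsDemo and its helpers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Index methods: start and end of the first match, or null if there is none.
    static int[] firstMatchBounds(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    // Study methods: lookingAt() needs a match at the start; matches() needs the whole input.
    static boolean[] study(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return new boolean[] {m.lookingAt(), m.matches()};
    }

    // Replacement methods: rebuild the input, upper-casing each match.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' or '\' in the text from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb);  // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] b = firstMatchBounds("cat", "one cat");
        System.out.println(b[0] + "-" + b[1]);                                      // 4-7
        boolean[] s = study("foo", "fooooooooooooooooo");
        System.out.println(s[0] + " " + s[1]);                                      // true false
        System.out.println(Pattern.compile("cat").matcher("one cat two cats").replaceAll("dog")); // one dog two dogs
        System.out.println(upperCaseMatches("cat", "one cat two cats"));            // one CAT two CATs
    }
}
```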
  43. PatternSyntaxException (cont.)
      • You get a little more than just an error message from the PatternSyntaxException.
      • Check out the following methods:
        – public String getDescription()
        – public int getIndex()
        – public String getPattern()
        – public String getMessage()
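A sketch that triggers a PatternSyntaxException with a malformed embedded flag ("?i)foo", missing its opening parenthesis) and reads back the four methods listed above. SyntaxErrorDemo and syntaxError are hypothetical names. getIndex() returns -1 when the position is unknown; for this pattern the dangling '?' is at index 0.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns the exception thrown for an invalid regex, or null if it compiles.
    static PatternSyntaxException syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = syntaxError("?i)foo");
        if (e != null) {
            System.out.println("Pattern:     " + e.getPattern());      // the regex that failed
            System.out.println("Description: " + e.getDescription());  // the error text alone
            System.out.println("Index:       " + e.getIndex());        // error position, or -1 if unknown
            System.out.println("Message:     " + e.getMessage());      // multi-line: description, pattern, caret
        }
    }
}
```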
  44. The End$
