Regular expressions
 

Usage Rights

© All Rights Reserved


Presentation Transcript

    • /Regular Expressions/ In Java
    • Credits
      • The Java Tutorials: Regular Expressions
      • docs.oracle.com/javase/tutorial/essential/regex/
    • Regex
      • Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set.
      • They can be used to search, edit, or manipulate text and data.
      • They are created with a specific syntax.
    • Regex in Java
      • Java's regex syntax is similar to Perl's.
      • The java.util.regex package primarily consists of three classes: Pattern, Matcher, and PatternSyntaxException.
    • Pattern & PatternSyntaxException
      • You can think of a Pattern as the wrapper object for a compiled regular expression.
      • You get a Pattern by calling:
        – Pattern.compile("RegularExpressionString");
      • If "RegularExpressionString" is not a valid regex, compile throws a PatternSyntaxException.
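A minimal sketch of compiling a regex and handling the exception for an invalid one. The class name CompileDemo and the helper tryCompile (which returns null on failure) are hypothetical choices for this illustration; "[abc" is used only because an unclosed character class is not valid regex syntax.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CompileDemo {
    // Hypothetical helper: returns the compiled Pattern,
    // or null if the regex string is not valid.
    static Pattern tryCompile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            System.out.println("Invalid regex \"" + regex + "\": " + e.getDescription());
            return null;
        }
    }

    public static void main(String[] args) {
        Pattern valid = tryCompile("foo");     // compiles normally
        Pattern invalid = tryCompile("[abc");  // unclosed character class
        System.out.println(valid != null);     // prints true
        System.out.println(invalid == null);   // prints true
    }
}
```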
    • Matcher
      • You can think of a Matcher as the search result object.
      • You get a Matcher by calling:
        – myPattern.matcher("StringToBeSearched");
      • You use it by calling:
        – myMatcher.find()
      • Each successful call to find() moves to the next match; you can then call methods such as group(), start(), and end() on myMatcher to inspect that match.
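The find() loop described above might look like this sketch (the class name MatcherDemo and the helper countMatches are hypothetical). Note that end() returns the index just past the last matched character, which is why a match of "foo" starting at index 0 ends at index 3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Hypothetical helper: counts how many times find() succeeds.
    static int countMatches(String regex, String input) {
        Matcher myMatcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (myMatcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern myPattern = Pattern.compile("foo");
        Matcher myMatcher = myPattern.matcher("foofoo");
        // Each successful find() advances to the next match;
        // group(), start(), and end() then describe that match.
        while (myMatcher.find()) {
            System.out.println(myMatcher.group() + " " + myMatcher.start() + "-" + myMatcher.end());
        }
        System.out.println(countMatches("foo", "foofoo")); // prints 2
    }
}
```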
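    • Regex Test Harness
      • The tutorial's test harness reads input through the Console class, but System.console() typically returns null when the program is launched from an IDE, so the harness fails there.
      • So I rewrote it to use basic I/O (standard input and output).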
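The presenter's rewritten harness is not part of this transcript, so the following is only a sketch of one way to do it, assuming BufferedReader over System.in. The class name RegexTestHarness and the helper describe are illustrative choices; the message format copies the output shown in the demo slides.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTestHarness {
    // Builds the report lines for one regex and one input string.
    static List<String> describe(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            lines.add("Found \"" + matcher.group() + "\" at index " + matcher.start()
                    + ", ending at index " + matcher.end() + ".");
        }
        if (lines.isEmpty()) {
            lines.add("No match found.");
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Plain standard input instead of System.console(), which is
        // typically null when the program is started from an IDE.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("Enter your regex: ");
            String regex = in.readLine();
            if (regex == null) {
                break; // end of input
            }
            System.out.print("Enter input string to search: ");
            String input = in.readLine();
            if (input == null) {
                break;
            }
            try {
                for (String line : describe(regex, input)) {
                    System.out.println(line);
                }
            } catch (PatternSyntaxException e) {
                // Report an invalid regex and keep the loop running.
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Reading lines from System.in works both in a terminal and in an IDE's run console, and the loop ends cleanly when input runs out.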
    • It’s time for…CODE DEMO
    • Regex
      • Test harness output example.
      • User input follows each "Enter ..." prompt.
        Enter your regex: foo
        Enter input string to search: foofoo
        Found "foo" at index 0, ending at index 3.
        Found "foo" at index 3, ending at index 6.
    • Indexing
    • Metacharacters
      • <([{\^-=$!|]})?*+.>
      • Precede a metacharacter with a backslash (\) to treat it as an ordinary character.
      • Or use \Q and \E to begin and end a literal quote.
    • Metacharacters
        Enter your regex: cat.
        Enter input string to search: cats
        Found "cats" at index 0, ending at index 4.
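A short sketch contrasting the metacharacter "." with its escaped forms (EscapeDemo is a hypothetical class name). Pattern.matches requires the whole input to match, which is slightly stricter than the harness's find(), but it gives the same answers for these inputs.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // "." is a metacharacter that matches any single character.
        System.out.println(Pattern.matches("cat.", "cats"));        // true
        // A backslash makes "." literal; in a Java string literal the
        // backslash itself must be written twice.
        System.out.println(Pattern.matches("cat\\.", "cats"));      // false
        System.out.println(Pattern.matches("cat\\.", "cat."));      // true
        // \Q...\E quotes everything between the two markers.
        System.out.println(Pattern.matches("\\Qcat.\\E", "cats"));  // false
        System.out.println(Pattern.matches("\\Qcat.\\E", "cat."));  // true
    }
}
```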
    • Character Classes
        Construct        Description
        [abc]            a, b, or c (simple class)
        [^abc]           Any character except a, b, or c (negation)
        [a-zA-Z]         a through z, or A through Z, inclusive (range)
        [a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
        [a-z&&[def]]     d, e, or f (intersection)
        [a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
        [a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
    • Character Class
        Enter your regex: [bcr]at
        Enter input string to search: rat
        Found "rat" at index 0, ending at index 3.
        Enter input string to search: cat
        Found "cat" at index 0, ending at index 3.
    • Character Class: Negation
        Enter your regex: [^bcr]at
        Enter input string to search: rat
        No match found.
        Enter input string to search: hat
        Found "hat" at index 0, ending at index 3.
    • Character Class: Range
        Enter your regex: foo[1-5]
        Enter input string to search: foo5
        Found "foo5" at index 0, ending at index 4.
        Enter input string to search: foo6
        No match found.
    • Character Class: Union
        Enter your regex: [0-4[6-8]]
        Enter input string to search: 0
        Found "0" at index 0, ending at index 1.
        Enter input string to search: 5
        No match found.
        Enter input string to search: 6
        Found "6" at index 0, ending at index 1.
    • Character Class: Intersection
        Enter your regex: [0-9&&[345]]
        Enter input string to search: 5
        Found "5" at index 0, ending at index 1.
        Enter input string to search: 2
        No match found.
    • Character Class: Subtraction
        Enter your regex: [0-9&&[^345]]
        Enter input string to search: 5
        No match found.
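To see a whole class at once rather than one input at a time, the sketch below tests each digit 0-9 against a character class and reports which ones it accepts. CharClassDemo and acceptedDigits are hypothetical names; the classes are the ones from the slides plus a simple negation.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns, in order, the digits 0-9 that the given character class accepts.
    static String acceptedDigits(String charClass) {
        StringBuilder accepted = new StringBuilder();
        for (char c = '0'; c <= '9'; c++) {
            // Pattern.matches succeeds only if the whole one-character string matches.
            if (Pattern.matches(charClass, String.valueOf(c))) {
                accepted.append(c);
            }
        }
        return accepted.toString();
    }

    public static void main(String[] args) {
        String[] classes = {"[0-4[6-8]]", "[0-9&&[345]]", "[0-9&&[^345]]", "[^0-4]"};
        for (String cc : classes) {
            System.out.println(cc + " accepts " + acceptedDigits(cc));
        }
    }
}
```

Union accepts 01234678, intersection accepts only 345, and subtraction accepts 0126789, which agrees with the harness results above.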
    • Predefined Character Classes
        Construct   Description
        .           Any character (may or may not match line terminators)
        \d          A digit: [0-9]
        \D          A non-digit: [^0-9]
        \s          A whitespace character: [ \t\n\x0B\f\r]
        \S          A non-whitespace character: [^\s]
        \w          A word character: [a-zA-Z_0-9]
        \W          A non-word character: [^\w]
    • Predefined Character Classes (cont.)
      • To summarize:
        – \d matches digits
        – \s matches whitespace characters
        – \w matches word characters
      • The uppercase version is the opposite:
        – \D matches non-digits
        – \S matches non-whitespace characters
        – \W matches non-word characters
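One practical detail the slides leave implicit: in Java source code, each regex backslash must be doubled inside a string literal, so \d is written "\\d". A minimal sketch (PredefinedClassDemo and findAll are hypothetical names; the sample sentence is made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // Collects the text of every match, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Order 66 shipped to room B12";
        // The regex \d+ is written "\\d+": the string literal needs its own
        // backslash escape before the regex engine ever sees \d.
        System.out.println(findAll("\\d+", text));        // [66, 12]
        System.out.println(findAll("\\w+", text));        // [Order, 66, shipped, to, room, B12]
        System.out.println(findAll("\\s", text).size());  // 5
    }
}
```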
    • Quantifiers
        Greedy    Reluctant   Possessive   Meaning
        X?        X??         X?+          X, once or not at all
        X*        X*?         X*+          X, zero or more times
        X+        X+?         X++          X, one or more times
        X{n}      X{n}?       X{n}+        X, exactly n times
        X{n,}     X{n,}?      X{n,}+       X, at least n times
        X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times
    • Ignore the greedy, reluctant, and possessive columns for now.
    • Zero-Length Match
      • The regexes a? and a* each allow zero occurrences of the letter a, so they can match the empty string.
        Enter your regex: a*
        Enter input string to search: aa
        Found "aa" at index 0, ending at index 2.
        Found "" at index 2, ending at index 2.
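A sketch that prints each match with its indices so the empty matches stay visible (ZeroLengthDemo and listMatches are hypothetical names). After an empty match at the end of the input, find() stops rather than looping forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthDemo {
    // Lists every match as "text"@start-end so empty matches are easy to spot.
    static List<String> listMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add("\"" + m.group() + "\"@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(listMatches("a*", "aa")); // ["aa"@0-2, ""@2-2]
        System.out.println(listMatches("a?", "aa")); // ["a"@0-1, "a"@1-2, ""@2-2]
        System.out.println(listMatches("a*", ""));   // [""@0-0]
    }
}
```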
    • Quantifiers: Exact
        Enter your regex: a{3}
        Enter input string to search: aa
        No match found.
        Enter input string to search: aaaa
        Found "aaa" at index 0, ending at index 3.
    • Quantifiers: At Least, No Greater
        Enter your regex: a{3,}
        Enter input string to search: aaaaaaaaa
        Found "aaaaaaaaa" at index 0, ending at index 9.
        Enter your regex: a{3,6}
        Enter input string to search: aaaaaaaaa
        Found "aaaaaa" at index 0, ending at index 6.
        Found "aaa" at index 6, ending at index 9.
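The exact and bounded quantifiers from the last two slides can be checked in one sketch that records each match as start-end (QuantifierRangeDemo and spans are hypothetical names; String.repeat assumes Java 11 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierRangeDemo {
    // Records each match as "start-end".
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String nineAs = "a".repeat(9);
        System.out.println(spans("a{3}", "aa"));     // []
        System.out.println(spans("a{3}", "aaaa"));   // [0-3]
        System.out.println(spans("a{3,}", nineAs));  // [0-9]
        System.out.println(spans("a{3,6}", nineAs)); // [0-6, 6-9]
    }
}
```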
    • Quantifiers
      • "abc+"
        – Means "a, followed by b, followed by (c one or more times)".
        – "abcc" = match!, "abbc" = no match
      • "[abc]+"
        – Means "(a, b, or c) one or more times".
        – "bba" = match!
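A quick check of the two patterns above, showing what a quantifier attaches to (PlusDemo is a hypothetical class name). Pattern.matches tests the entire input.

```java
import java.util.regex.Pattern;

public class PlusDemo {
    public static void main(String[] args) {
        // "+" applies only to the item directly before it: here, the letter c.
        System.out.println(Pattern.matches("abc+", "abcc"));   // true
        System.out.println(Pattern.matches("abc+", "abbc"));   // false
        // After a character class, "+" repeats the whole class.
        System.out.println(Pattern.matches("[abc]+", "bba"));  // true
        System.out.println(Pattern.matches("[abc]+", "bbd"));  // false
    }
}
```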
    • Greedy, Reluctant, and Possessive
      • Greedy
        – Consumes the whole input first, then backs off one character at a time from the end until the rest of the pattern can match.
      • Reluctant
        – Starts by consuming nothing, then takes one more character at a time from the beginning until the rest of the pattern can match.
      • Possessive
        – Consumes the whole input and never backs off, so no retries are made.
    • Greedy
        Enter your regex: .*foo
        Enter input string to search: xfooxxxxxxfoo
        Found "xfooxxxxxxfoo" at index 0, ending at index 13.
    • Reluctant
        Enter your regex: .*?foo
        Enter input string to search: xfooxxxxxxfoo
        Found "xfoo" at index 0, ending at index 4.
        Found "xxxxxxfoo" at index 4, ending at index 13.
    • Possessive
        Enter your regex: .*+foo
        Enter input string to search: xfooxxxxxxfoo
        No match found.
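The three demos can be reproduced side by side with a sketch like this one (QuantifierKindsDemo and findAll are hypothetical names). The possessive ".*+" swallows the whole input and never gives any back, so "foo" can never follow it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierKindsDemo {
    // Collects the text of every match, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        String[] regexes = {".*foo", ".*?foo", ".*+foo"}; // greedy, reluctant, possessive
        for (String regex : regexes) {
            System.out.println(regex + " -> " + findAll(regex, input));
        }
        // .*foo -> [xfooxxxxxxfoo]
        // .*?foo -> [xfoo, xxxxxxfoo]
        // .*+foo -> []
    }
}
```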
    • Capturing Group
      • Capturing groups are a way to treat multiple characters as a single unit.
      • They are created by placing the characters to be grouped inside a set of parentheses.
      • "(dog)"
        – Means a single group containing the letters "d", "o", and "g".
    • Capturing Group w/ Quantifiers
      • (abc)+
        – Means "abc" one or more times
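A sketch contrasting abc+ with (abc)+ (GroupQuantifierDemo and lastRepetition are hypothetical names). When a quantified group repeats, Java's group(1) holds only the text of its last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupQuantifierDemo {
    // Hypothetical helper: group 1 of a full match, or null if the input does not match.
    static String lastRepetition(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Without parentheses, "+" repeats only the c.
        System.out.println(Pattern.matches("abc+", "abcabc"));   // false
        // With parentheses, "+" repeats the whole group "abc".
        System.out.println(Pattern.matches("(abc)+", "abcabc")); // true
        System.out.println(Pattern.matches("(abc)+", "abcc"));   // false
        System.out.println(lastRepetition("(abc)+", "abcabc")); // abc
    }
}
```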
    • Capturing Groups: Numbering
      • ((A)(B(C)))
        1. ((A)(B(C)))
        2. (A)
        3. (B(C))
        4. (C)
      • Groups are numbered by counting their opening parentheses from left to right.
    • Capturing Groups: Numbering Usage
      • Some Matcher methods accept a group number as a parameter:
        – int start(int group)
        – int end(int group)
        – String group(int group)
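The numbering and the group-aware methods can be checked together in this sketch (GroupNumberingDemo and describeGroups are hypothetical names). Group 0 always stands for the entire match, and groupCount() does not include it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {
    // For a full match, lists each group as "number: text [start, end)".
    static List<String> describeGroups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                out.add(g + ": " + m.group(g) + " [" + m.start(g) + ", " + m.end(g) + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeGroups("((A)(B(C)))", "ABC")) {
            System.out.println(line);
        }
        // 0: ABC [0, 3)
        // 1: ABC [0, 3)
        // 2: A [0, 1)
        // 3: BC [1, 3)
        // 4: C [2, 3)
    }
}
```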
    • Capturing Groups: Backreferences
      • The section of input matching a capturing group is saved for recall via a backreference.
      • Specify a backreference with a backslash (\) followed by the group number.
      • (\d\d)
        – Can be recalled with the expression \1.
    • Capturing Groups: Backreferences
        Enter your regex: (\d\d)\1
        Enter input string to search: 1212
        Found "1212" at index 0, ending at index 4.
        Enter input string to search: 1234
        No match found.
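The same backreference written as Java source, where every backslash is doubled (BackrefDemo and isRepeatedPair are hypothetical names).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefDemo {
    // (\d\d) captures two digits; \1 then requires those same two digits again.
    static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    static boolean isRepeatedPair(String input) {
        return REPEATED_PAIR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedPair("1212")); // true
        System.out.println(isRepeatedPair("1234")); // false
        Matcher m = REPEATED_PAIR.matcher("1212");
        if (m.matches()) {
            System.out.println("group 1 = " + m.group(1)); // group 1 = 12
        }
    }
}
```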
    • Boundary Matchers
        Boundary Construct   Description
        ^                    The beginning of a line
        $                    The end of a line
        \b                   A word boundary
        \B                   A non-word boundary
        \A                   The beginning of the input
        \G                   The end of the previous match
        \Z                   The end of the input but for the final terminator, if any
        \z                   The end of the input
    • Boundary Matchers
        Enter your regex: ^dog$
        Enter input string to search: dog
        Found "dog" at index 0, ending at index 3.
        Enter your regex: ^dog\w*
        Enter input string to search: dogblahblah
        Found "dogblahblah" at index 0, ending at index 11.
    • Boundary Matchers (cont.)
        Enter your regex: \bdog\b
        Enter input string to search: The doggie plays in the yard.
        No match found.
        Enter your regex: \Gdog
        Enter input string to search: dog dog
        Found "dog" at index 0, ending at index 3.
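A sketch that counts matches for the boundary examples above, plus two extra inputs for comparison (BoundaryDemo and countMatches are hypothetical names). \G only matches where the previous match ended, so in "dog dog" the second "dog" is skipped, while in "dogdog" both are found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Counts how many times find() succeeds.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("^dog$", "dog"));                               // 1
        System.out.println(countMatches("^dog\\w*", "dogblahblah"));                    // 1
        System.out.println(countMatches("\\bdog\\b", "The doggie plays in the yard.")); // 0
        System.out.println(countMatches("\\bdog\\b", "The dog plays in the yard."));    // 1
        System.out.println(countMatches("\\Gdog", "dog dog"));                          // 1
        System.out.println(countMatches("\\Gdog", "dogdog"));                           // 2
    }
}
```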
    • Pattern Class (cont.)
      • A number of flags, such as Pattern.CASE_INSENSITIVE, can be passed to the compile method.
      • Embedded flag expressions, such as (?i), enable the same options from inside the regex string itself.
      • Check out the matches, split, and quote methods as well.
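A brief tour of the features named on this slide: a compile flag, its embedded equivalent, and the static matches, split, and quote methods. PatternMethodsDemo is a hypothetical class name, and the inputs are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternMethodsDemo {
    public static void main(String[] args) {
        // A flag passed to compile...
        Pattern flagged = Pattern.compile("dog", Pattern.CASE_INSENSITIVE);
        System.out.println(flagged.matcher("DoG").matches());   // true
        // ...or the equivalent embedded flag expression inside the regex.
        System.out.println(Pattern.matches("(?i)dog", "DoG"));   // true

        // split breaks the input around each match of the pattern.
        String[] parts = Pattern.compile("\\d").split("one9two4three");
        System.out.println(Arrays.toString(parts));              // [one, two, three]

        // quote turns a string into a regex that matches it literally.
        String literal = Pattern.quote("1+1=2");
        System.out.println(Pattern.matches(literal, "1+1=2"));   // true
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));   // false: "+" acts as a quantifier
    }
}
```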
    • Matcher Class (cont.)
      • The Matcher class can slice input a multitude of ways:
        – Index methods, such as start and end, give the position of matches
        – Study methods, such as lookingAt, find, and matches, give boolean results to queries
        – Replacement methods, such as replaceFirst and replaceAll, let you edit input
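One call from each of the three method families, as a sketch (MatcherMethodsDemo is a hypothetical class name). lookingAt() only needs the pattern to match at the start of the input, matches() needs the whole input, and the replace methods return an edited copy rather than changing the original string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {
    public static void main(String[] args) {
        Pattern foo = Pattern.compile("foo");

        // Study methods.
        Matcher study = foo.matcher("fooooooo");
        System.out.println(study.lookingAt()); // true
        System.out.println(study.matches());   // false

        // Index methods, after a successful find().
        Matcher finder = foo.matcher("xxfooxx");
        if (finder.find()) {
            System.out.println(finder.start() + "-" + finder.end()); // 2-5
        }

        // Replacement methods; each one resets the matcher before it runs.
        Matcher cat = Pattern.compile("cat").matcher("one cat two cats");
        System.out.println(cat.replaceFirst("dog")); // one dog two cats
        System.out.println(cat.replaceAll("dog"));   // one dog two dogs
    }
}
```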
    • PatternSyntaxException (cont.)
      • You get a little more than just an error message from a PatternSyntaxException.
      • Check out the following methods:
        – public String getDescription()
        – public int getIndex()
        – public String getPattern()
        – public String getMessage()
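A sketch that triggers a syntax error and prints each of the four accessors. SyntaxErrorDemo and syntaxError are hypothetical names. "?i)foo" is invalid because its leading "?" has nothing to quantify; getMessage() combines the description, the index, and the pattern into one multi-line message.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {
    // Hypothetical helper: returns the exception for an invalid regex, or null if it compiles.
    static PatternSyntaxException syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = syntaxError("?i)foo");
        if (e != null) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index:       " + e.getIndex());
            System.out.println("Pattern:     " + e.getPattern());
            System.out.println("Message:\n" + e.getMessage());
        }
    }
}
```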
    • The End$