Regular expressions
  • Feedback: move the flags slide to before the walkthrough. How about an ending slide?
  • Kleene, Stephen C. (1956). "Representation of Events in Nerve Nets and Finite Automata"
  • Character literals
  • Multiple character literals
  • The wildcard
  • What if you actually want a period? Escape it! The same goes for any special character.
  • You can combine wildcards with literals
  • If you put something in brackets, it means ‘any one of the characters in the brackets’
  • ..and you can do ranges by adding a hyphen
  • Note that big ranges like A-z use ASCII ordering, so you’ll probably be including some things you don’t expect (the characters [ \ ] ^ _ ` sit between Z and a).
  • ..or multiple ranges
  • Brackets with a leading caret negate the match.. so here, select anything except capital letters.
  • Common character classes have shortcuts, like ‘\d’ for digits.
  • ‘\s’ for whitespace (including tabs, spaces and newlines)
  • ‘\w’ for WORD characters (a-z, A-Z, 0-9 and _)! You can remember them by thinking of your favorite shoe marketplace, DSW!
  • You can invert any of those by using the upper case version.. so here we are selecting non-word characters
  • Quantifiers are a special syntax that dictates how many of a thing to match
  • Question mark means zero or one of the preceding character. Here, read this as ‘me’ followed by an optional ‘n’
  • Asterisk, aka glob, aka the ‘Kleene Star’, means zero or more of the preceding character. It is ridiculously powerful and hungry, like Sinistar. Here we are matching everything between quotes. But you can see a problem: we’re also blowing past two other quotes on the way to the last one. This is called a greedy glob.
  • We can limit the greed of the match by adding a question mark.. it will now stop matching as soon as the condition is met.
  • BUT, we can also write it this way. TMTOWTDI (“There’s more than one way to do it”)
  • + means one or more of the preceding character. Here we are matching t followed by one or more word characters, followed by a vowel. Note that we do not match ‘to’
  • And here you can see the difference between + and *
  • To specify an exact number, you can put the number in curly braces! Here we are selecting all double vowels.
  • You can specify a minimum and maximum.. here we are looking for 3 or 4 e’s to zone in on the perfect level of excitement.
  • If you leave off one of the ends, it works as ‘at least’ or ‘at most’. Here we are looking for squ followed by a minimum of 4 e’s
  • Anchors constrain your match to a certain part of the string!
  • Caret means at the beginning of the string. Don’t confuse it with the caret inside brackets, which means negate. Here we are selecting the first ‘word’ in the string.
  • Dollar sign means at the end of the string, and goes on the other end.
  • \b is the best one EVER. It means a word boundary: the position where a word character meets a non-word character (a space or punctuation) or the start or end of the string. Here we’re matching all words that start with ‘a’, but not an ‘a’ in the middle of a word. TODO: How to do this without word boundaries?
  • You can use it on both ends of your regex. Here we’re finding all words that are exactly three characters long.
  • If you put something in parentheses, you’re putting it into a match group and storing the parenthesized content in a numbered slot for later
  • Why do you care about storing them? You can reference them again later in code. (TODO: Change to tag and use javascripts. Use rubular on a workspace.)
  • If you use a pipe character, you can provide several options in a match group
  • If you want to group things without creating match groups, you can use a special syntax with a questionable colon (?:). TODO: Provide example of nested matchgroup spaghetti and why you would want to use this.
  • Chances are if you know Perl, you shouldn’t be here..
  • One gotcha with Java is that it has no native regular expression delimiter, so patterns are ordinary string literals and you have to double escape things (every backslash is written twice).

Presentation Transcript

  • Regular @ !#?@ ! Expressions 101
  • KNOW YOUR ENEMY
  • rubular.com
  • Part I Character Literals and Wildcards
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /m/ http://tinyurl.com/rx101-2
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /me/ http://tinyurl.com/rx101-4
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /./ http://tinyurl.com/rx101-5
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /\./ http://tinyurl.com/rx101-8
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /.e/ http://tinyurl.com/rx101-10
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /[aeiou]/ http://tinyurl.com/rx101-11
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /[a-f]/ http://tinyurl.com/rx101-12
  • /[A-z]/?
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /[a-fw-z]/ http://tinyurl.com/rx101-13
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /[^A-Z]/ http://tinyurl.com/rx101-14
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /\d/ http://tinyurl.com/rx101-15
  • "Now_is_the_time_for_all_good_men_to_come_to_the_aid_of_their_country."_was_proposed_as_a_typing_drill_by_a_teacher_named_Charles_E._Weller_aka_"Chase"_in_1918. /\s/ http://tinyurl.com/rx101-16
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /\w/ http://tinyurl.com/rx101-17
  • "Now_is_the_time_for_all_good_men_to_come_to_the_aid_of_their_country."_was_proposed_as_a_typing_drill_by_a_teacher_named_Charles_E._Weller_aka_"Chase"_in_1918. /\W/ http://tinyurl.com/rx101-18
  • Activity! Write a regular expression to match the telephone number in this text: For a good time call Stanley: 555-1212 - you will not be disappoint. http://tinyurl.com/rx101-a1
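The Part I building blocks can be tried outside rubular too. The following is a minimal sketch using Python's standard `re` module (Python is an assumption here; the deck itself uses Ruby, Perl, JavaScript and Java). The phone pattern is one possible answer to the activity, not necessarily the presenter's.

```python
import re

text = "For a good time call Stanley: 555-1212 - you will not be disappoint."

# Part I tools only: one \d per digit and a literal hyphen.
phone = re.search(r"\d\d\d-\d\d\d\d", text).group()
print(phone)

# A bracket class matches any single character listed inside it.
vowels = re.findall(r"[aeiou]", "Stanley")
print(vowels)

# [A-z] follows ASCII order, so it also covers [ \ ] ^ _ and `.
surprise = re.findall(r"[A-z]", "a_[Z")
print(surprise)

# An escaped period matches only a literal period, not any character.
periods = re.findall(r"\.", text)
print(len(periods))
```

Without quantifiers the digit shorthand has to be repeated once per digit, which is what Part II cleans up.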
  • Part II Quantifiers
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /men?/ http://tinyurl.com/rx101-19
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /".*"/ http://tinyurl.com/rx101-20
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /".*?"/ http://tinyurl.com/rx101-21
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. “TMTOWTDI” http://tinyurl.com/rx101-22 /"[^"]*"/
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /t\w+[aeiou]/ http://tinyurl.com/rx101-23
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /t\w*[aeiou]/ http://tinyurl.com/rx101-24
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /[aeiou]{2}/ http://tinyurl.com/rx101-25
  • squee squeee squeeee squeeeee squeeeeee! /sque{3,4}/ http://tinyurl.com/rx101-26
  • squee squeee squeeee squeeeee squeeeeee! /sque{4,}/ http://tinyurl.com/rx101-27
  • squee squeee squeeee squeeeee squeeeeee! /sque{,2}/ http://tinyurl.com/rx101-28
  • “TMTOWTDI” /men?/ = /men{0,1}/
  • “TMTOWTDI” /.*/ = /.{0,}/
  • “TMTOWTDI” /.+/ = /.{1,}/
  • Activity! Rewrite the telephone number regular expression to use quantifiers. For a good time call Stanley: 555-1212 - you will not be disappoint. http://tinyurl.com/rx101-a1 Extra credit - extract the email address. For a good time email stanley@aol.com you will not be disappoint. http://tinyurl.com/rx101-a2
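The quantifier slides can be checked with a short script. This is a sketch in Python's `re` module (an assumed choice of language), and the last two patterns are only one possible answer to the activity; the email pattern in particular is deliberately loose and would not cope with real-world addresses.

```python
import re

text = ('"Now is the time for all good men to come to the aid of their '
        'country." was proposed as a typing drill by a teacher named '
        'Charles E. Weller aka "Chase" in 1918.')

greedy = re.findall(r'".*"', text)      # runs on to the LAST quote
lazy = re.findall(r'".*?"', text)       # stops at the next quote
negated = re.findall(r'"[^"]*"', text)  # same matches, written another way

squees = "squee squeee squeeee squeeeee squeeeeee!"
three_or_four = re.findall(r"sque{3,4}", squees)
four_or_more = re.findall(r"sque{4,}", squees)

# Activity sketches (assumed answers, not the presenter's):
phone = re.findall(r"\d{3}-\d{4}", "call Stanley: 555-1212 - you")
email = re.findall(r"\w+@\w+\.\w+", "email stanley@aol.com you")

print(len(greedy), lazy, three_or_four, four_or_more, phone, email)
```

Note that `{3,4}` still matches inside the longer squees: it takes four e's and leaves the rest unmatched.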
  • Part III Anchors
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /^[^\s]+/ http://tinyurl.com/rx101-29
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /[^\s]+$/ http://tinyurl.com/rx101-30
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /\ba\w*/ http://tinyurl.com/rx101-31
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /\b\w{3}\b/ http://tinyurl.com/rx101-32
  • Activity! Go back and fix this example to find whole words that start with ‘t’ and end in a vowel. http://tinyurl.com/rx101-24
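As a rough check of the anchor slides, here is a sketch in Python's `re` module (an assumed language choice). The final pattern is one possible answer to the activity, under the assumption that "whole words" means a word boundary on both ends.

```python
import re

text = ('"Now is the time for all good men to come to the aid of their '
        'country." was proposed as a typing drill by a teacher named '
        'Charles E. Weller aka "Chase" in 1918.')

first = re.search(r"^[^\s]+", text).group()  # non-space run at the start
last = re.search(r"[^\s]+$", text).group()   # non-space run at the end

a_words = re.findall(r"\ba\w*", text)         # words starting with 'a'
three_letter = re.findall(r"\b\w{3}\b", text)  # exactly three characters

# Without boundaries the match can stop partway through a word.
loose = re.findall(r"t\w*[aeiou]", text)
# With \b on both ends only whole words survive.
whole = re.findall(r"\bt\w*[aeiou]\b", text)

print(first, last, a_words, three_letter, loose, whole)
```

The unanchored version picks up fragments such as `thei` from "their", which is the problem the activity asks you to fix.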
  • Part IV Match Groups
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. 1. Now is the time for all good men to come to the aid of their country. 2. Chase /"(.*?)"/ http://tinyurl.com/rx101-33
  • Replacement
    str = '"Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918.'
    puts str.gsub(/"(.*?)"/, '<i>\1</i>')
    Output:
    <i>Now is the time for all good men to come to the aid of their country.</i> was proposed as a typing drill by a teacher named Charles E. Weller aka <i>Chase</i> in 1918.
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. 1. is 2. the 3. the 4. of /\b(the|is|of)\b/ http://tinyurl.com/rx101-34
  • "Now is the time for all good men to come to the aid of their country." was proposed as a typing drill by a teacher named Charles E. Weller aka "Chase" in 1918. /\b(?:the|is|of)\b/ http://tinyurl.com/rx101-35
  • Walkthrough! http://tinyurl.com/rx101-a3
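The difference between capturing and non-capturing groups is easiest to see by asking the match object what it stored. A minimal sketch in Python's `re` module (an assumed language; the deck's own replacement example is Ruby):

```python
import re

text = ('"Now is the time for all good men to come to the aid of their '
        'country." was proposed as a typing drill by a teacher named '
        'Charles E. Weller aka "Chase" in 1918.')

# With one group, findall returns what the group captured (no quotes).
quoted = re.findall(r'"(.*?)"', text)

# \1 in the replacement refers back to the first group.
italic = re.sub(r'"(.*?)"', r"<i>\1</i>", text)

# Alternation inside a group.
words = re.findall(r"\b(the|is|of)\b", text)

# Capturing stores the text in slot 1; (?:...) groups without storing.
capturing = re.search(r"\b(the|is|of)\b", text).groups()
non_capturing = re.search(r"\b(?:the|is|of)\b", text).groups()

print(quoted, words, capturing, non_capturing)
print(italic)
```

Both patterns match the same text; only the capturing one leaves something in a numbered slot.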
  • Part V Flags
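  • Perl / PCRE style: /cat/gi (the slashes are the delimiters, cat is the regex, gi are the flags)
  • Java style: Pattern.compile("cat", CASE_INSENSITIVE | DOTALL); (the quotes are Java string delimiters, cat is the regex, CASE_INSENSITIVE and DOTALL are the flags, and | is the glue between them)
  • Flag equivalents: i = CASE_INSENSITIVE, s = DOTALL, m = MULTILINE, g (no corresponding Pattern constant)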
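For comparison, the same three flags exist in Python's `re` module under similar names. This sketch is an illustration in an assumed third language, not part of the deck; like Java, Python has no `g` flag, and `findall` or `sub` simply process every match.

```python
import re

# i -> re.IGNORECASE
any_case = re.findall(r"cat", "Cat cat CAT", re.IGNORECASE)

# s -> re.DOTALL: lets . match a newline as well
plain_dot = re.search(r"a.b", "a\nb")
dotall = re.search(r"a.b", "a\nb", re.DOTALL)

# m -> re.MULTILINE: ^ and $ also match at line breaks
line_starts = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)

# Flags are combined with |, the same glue as on the Java slide.
both = re.findall(r"cat.dog", "CAT\nDOG", re.IGNORECASE | re.DOTALL)

print(any_case, plain_dot, dotall.group(), line_starts, both)
```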
  • Part VI How do I use my powers?
  • grep aka cheat at Letterpress!
  • Your Text Editor!
  • Perl
    $str = 'three cats chased five mice';
    $str =~ /(cats|mice)/;
    print "$1\n";
    $str =~ s/(cats|mice)/fuzzy $1/g;
    print "$str\n";
    Output:
    cats
    three fuzzy cats chased five fuzzy mice
  • JavaScript
    var str = 'three cats chased five mice';
    var matches = str.match(/(cats|mice)/);
    console.log( matches[1] );
    str = str.replace(/(cats|mice)/g, 'fuzzy $1'); // replace() returns a new string
    console.log( str );
    Output:
    cats
    three fuzzy cats chased five fuzzy mice
  • Ruby
    str = 'three cats chased five mice'
    matches = str.match(/(cats|mice)/)
    puts matches[1]
    str.gsub!(/(cats|mice)/, 'fuzzy \1')
    puts str
    Output:
    cats
    three fuzzy cats chased five fuzzy mice
  • Java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    String str = "three cats chased five mice";
    Pattern p = Pattern.compile("(cats|mice)");
    Matcher m = p.matcher( str );
    m.find(); // group() is only valid after a successful find()
    System.out.println( m.group(1) );
    String newStr = str.replaceAll("(cats|mice)", "fuzzy $1");
    System.out.println( newStr ); // print the new string, not the original
    Output:
    cats
    three fuzzy cats chased five fuzzy mice
  • Java
    String str = "three cats chased five mice";
    Pattern p = Pattern.compile("\b\w{4}\b");
    Matcher m = p.matcher( str );
    System.out.println( m.group(1) );
    No. In a Java string literal "\b" is a backspace and "\w" is not a valid escape, so each backslash must be doubled.
  • Java
    String str = "three cats chased five mice";
    Pattern p = Pattern.compile("\\b\\w{4}\\b");
    Matcher m = p.matcher( str );
    m.find(); // advance to the first match before reading it
    System.out.println( m.group() ); // the pattern has no capture group, so read the whole match
  • http://www.regexplanet.com/advanced/java/index.html
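The double-escaping gotcha is not unique to Java. As a sketch in Python (an assumed language for illustration), an ordinary string literal turns `\b` into a backspace before the regex engine ever sees it, while doubling the backslashes or using a raw string passes the pattern through intact.

```python
import re

s = "three cats chased five mice"

# In a normal string literal "\b" is a backspace character, so this
# pattern looks for backspace + "cats" + backspace and finds nothing.
wrong = re.findall("\bcats\b", s)

# Doubling each backslash delivers a real \b to the regex engine.
doubled = re.findall("\\b\\w{4}\\b", s)

# A raw string does the same without the doubling.
raw = re.findall(r"\b\w{4}\b", s)

print(wrong, doubled, raw)
```

Raw strings are the closest thing Python has to the /.../ regex delimiters in Perl, Ruby and JavaScript.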